An Attention-Aided Deep Learning Framework For Massive MIMO Channel Estimation

1) The document proposes an attention-aided deep learning framework for channel estimation in massive MIMO systems. 2) An attention mechanism is introduced to improve estimation accuracy for highly separable channels with narrow angular spread by realizing a "divide-and-conquer" strategy. 3) The attention mechanism segments the angular space into different regions and estimates channels for each region separately, improving performance over conventional techniques.

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 21, NO. 3, MARCH 2022 1823

An Attention-Aided Deep Learning Framework for Massive MIMO Channel Estimation

Jiabao Gao, Student Member, IEEE, Mu Hu, Graduate Student Member, IEEE, Caijun Zhong, Senior Member, IEEE, Geoffrey Ye Li, Fellow, IEEE, and Zhaoyang Zhang, Senior Member, IEEE

Abstract: Channel estimation is one of the key issues in practical massive multiple-input multiple-output (MIMO) systems. Compared with conventional estimation algorithms, deep learning (DL) based ones have exhibited great potential in terms of performance and complexity. In this paper, an attention mechanism, exploiting the channel distribution characteristics, is proposed to improve the estimation accuracy of highly separable channels with narrow angular spread by realizing the “divide-and-conquer” policy. Specifically, we introduce a novel attention-aided DL channel estimation framework for conventional massive MIMO systems and devise an embedding method to effectively integrate the attention mechanism into the fully connected neural network for the hybrid analog-digital (HAD) architecture. Simulation results show that in both scenarios, the channel estimation performance is significantly improved with the aid of attention at the cost of small complexity overhead. Furthermore, strong robustness under different system and channel parameters can be achieved by the proposed approach, which further strengthens its practical value. We also investigate the distributions of learned attention maps to reveal the role of attention, which endows the proposed approach with a certain degree of interpretability.

Index Terms: Massive MIMO, channel estimation, deep learning, attention mechanism, hybrid analog-digital, divide-and-conquer.

Manuscript received February 4, 2021; revised May 16, 2021; accepted August 19, 2021. Date of publication August 31, 2021; date of current version March 10, 2022. This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFB1801104, in part by Zhejiang Provincial Natural Science Foundation of China under Grant LD21F010001, and in part by the National Natural Science Foundation of China under Grant 61922071. The associate editor coordinating the review of this article and approving it for publication was T. Q. S. Quek. (Corresponding author: Caijun Zhong.)
Jiabao Gao, Mu Hu, Caijun Zhong, and Zhaoyang Zhang are with the Institute of Information and Communication Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).
Geoffrey Ye Li is with the Department of Electrical and Electronic Engineering, Faculty of Engineering, Imperial College London, London SW7 2BX, U.K. (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2021.3107452.
Digital Object Identifier 10.1109/TWC.2021.3107452
1536-1276 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

MASSIVE multiple-input multiple-output (MIMO) is a key enabling technology for future wireless communication systems due to its high spectral and energy efficiency [1], [2]. However, the realization of various theoretical gains of massive MIMO is critically dependent on the quality of channel state information (CSI). Because of the large number of antennas and users, the CSI acquisition has long been a major challenge in practical massive MIMO systems.

In prior works, least square (LS) and minimal mean-squared error (MMSE) [3] are the two most commonly used estimators for channel estimation. The LS is relatively simple and easy to implement while its performance is unsatisfactory. On the other hand, MMSE can refine the LS estimation if an accurate channel correlation matrix (CCM) is available. However, the complexity of MMSE estimation is much higher than that of LS estimation due to the matrix inversion operation. Moreover, to reduce the hardware and energy cost, the hybrid analog-digital (HAD) architecture is usually adopted in practical massive MIMO systems, where the multi-antenna array is connected to only a limited number of radio-frequency (RF) chains through phase shifters in the analog domain [4]–[6]. With HAD, channel estimation becomes even more difficult since the received signals at the BS are only a few linear combinations of the original signals. If LS is used, multiple estimations are required since only part of the antennas’ channels can be estimated at once due to the limited number of RF chains. To avoid the dramatically increased overhead of LS, the slowly changing directions of arrival of channel paths are obtained first in the preamble stage in [7], then only channel gains of each path are re-estimated in a long period. Another alternative is to exploit the channel sparsity and estimate all antennas’ channels at once using the compressed sensing (CS) based methods, such as orthogonal matching pursuit [8] and sparse Bayesian learning [9]. In [10], [11], several improved CS algorithms have been developed through embedding the structural characteristics of channel sparsity, which can achieve better estimation performance without extra pilot overhead. Nevertheless, CS algorithms require high computational complexity and perform poorly for channels with low sparsity. Therefore, it is highly desirable to develop channel estimators with less requirement for prior information and a better performance-complexity trade-off.

Inspired by the great performance and the low complexity during online prediction, deep learning (DL) has been applied to many wireless communication problems [12], [13], such as spectrum sensing [14], resource management [15]–[18], beamforming [19], [20], signal detection [21]–[23], and channel estimation [24]–[32]. By exploiting the structural characteristics of the modulated signals, the customized deep neural network (DNN) in [14] significantly outperforms energy detection in spectrum sensing. In [15], a DNN has been proposed for resource management, which can achieve
comparable performance as the iterative optimization algorithm. An unsupervised learning-based beamforming network has been developed for intelligent reconfigurable surface aided massive MIMO systems in [19]. In [21], channel estimation and signal detection in orthogonal frequency division multiplexing systems have been performed jointly by a DNN. Then, a model-driven based approach is further proposed in [22] to exploit the advantages of both conventional algorithms and DNN. In [23], rather than directly using a black-box DNN, the conventional orthogonal approximate message passing algorithm (OAMP) is unfolded for the detection network.

There are mainly two categories of approaches for DL-based massive MIMO channel estimation. In the first category, “deep unfolding” methods unfold various iterative optimization algorithms and enhance their estimation performance by inserting learnable parameters. In [24], the AMP algorithm is unfolded into a cascaded neural network for millimeter wave channel estimation, where the denoiser is learned by a DNN. Thanks to the power of DL, the proposed method can outperform a series of conventional denoising-AMP based algorithms. In [25], the iterative shrinkage thresholding algorithm is unfolded to solve sparse linear inverse problems, where massive MIMO channel estimation is used as a case study. However, “unfolding” is only feasible for iterative algorithms with simple structures, and the computational complexity is also high. In the other category, DL is used to directly learn the mapping from available channel-related information to the CSI for performance improvement or complexity reduction. In [26], a DNN has been proposed to refine the coarse estimation in HAD massive MIMO systems, where the channel correlation in the frequency and time domains is exploited for further performance improvement. In [28], the estimation performance is further improved by jointly training the pilot signals and channel estimator with an autoencoder in downlink massive MIMO systems. In [29], a graph neural network has been used for massive MIMO channel tracking. Deep multimodal learning has been used for massive MIMO channel estimation and prediction in [30]. To reduce the complexity, the amplitudes of beamspace channels are predicted by a DNN and the dominant entries are estimated by LS in [31], thus avoiding the greedy search commonly adopted by CS algorithms. In [32], the uplink-to-downlink channel mapping in frequency-division duplex (FDD) systems is learned by a sparse complex valued network.

Nevertheless, current DL-based channel estimation methods have seldom exploited the characteristics of channel distribution. In practice, the BS is often located at a high altitude with few surrounding scatterers [33], so the angular spread of each user’s incident signal at the BS is narrow. Thus, the global distribution of channels corresponding to different users in the entire angular space can be viewed as the composition of many local distributions, where each local distribution represents channels within a small angular region. Due to narrow angular spread, a certain angular region contains much fewer channel cases than the entire angular space because of the limited angular range of channel paths, making the local distributions much simpler than the global distribution. Besides, different local distributions can be highly distinguishable from each other if the entire angular space is properly segmented into different angular regions. Under such a condition, the classic “divide-and-conquer” policy, which tackles a complex main problem by solving a series of its simplified sub-problems, is very suitable. Specifically, the estimation of channels in the entire angular space can be regarded as the main problem and the estimation of channels in different small angular regions can be regarded as different sub-problems. Motivated by this, in this paper, we propose a novel attention-aided DL-based channel estimation framework, where the “divide-and-conquer” policy is realized automatically through the dynamic adaptation of attention maps. The main contributions of this paper are summarized as follows:
• An attention-aided DL-based channel estimation framework is proposed for massive MIMO systems, which achieves better performance than its counterpart without attention in simulation. To the best of the authors’ knowledge, this is the first work that introduces the attention mechanism to DL-based channel estimation.¹
• We extend the above framework to the scenario with HAD and an embedding method is proposed to effectively integrate the attention mechanism into the fully connected neural network (FNN), which expands the application range of the proposed approach.
• We visually explain the “divide-and-conquer” policy reflected in the distributions of learned attention maps, which enhances the interpretability and rationality of the proposed approach.
• Based on our results, the performance gain of attention mainly comes from the narrow angular spread characteristic of channels. Therefore, the proposed approach can be extended to many other problems apart from channel estimation as long as the channel distribution has certain separability, such as multi-user beamforming, FDD downlink channel prediction, and so forth.

¹ Attention has already been used in some literature to aid DL-based communication systems, such as CSI compression [34], [35] and joint source and channel coding [36]. Nevertheless, the considered channel distribution in [34] does not possess strong separable property, and the proposed method in [36] requires extra side information. As for [35], the non-local neural network model is utilized to exploit the self-attention in the spatial dimension of channels.

The rest of this paper is organized as follows. Section II introduces the system model, channel model, and problem formulation. Section III presents the attention-aided DL-based channel estimation framework, which is extended to the HAD scenario in Section IV. Simulation results are demonstrated in Section V. Eventually, the paper is concluded in Section VI.

Here are some notations used subsequently. We use italic, bold-face lower-case and bold-face upper-case letters to denote scalar, vector, and matrix, respectively. $\mathbf{A}^T$ and $\mathbf{A}^H$ denote the transpose and Hermitian or complex conjugate transpose of matrix $\mathbf{A}$, respectively. $[\mathbf{A}]_{i,j}$ denotes the element at the $i$-th row and $j$-th column of matrix $\mathbf{A}$. $\|\mathbf{x}\|$ denotes the $\ell_2$ norm of vector $\mathbf{x}$, and $|a|$ denotes the amplitude of complex number $a$. $\mathbb{C}^{x \times y}$ denotes the $x \times y$ complex space. $\mathcal{CN}(\mu, \sigma^2)$ denotes the distribution of a circularly symmetric
complex Gaussian random variable with mean $\mu$ and variance $\sigma^2$. $\mathcal{U}[a, b]$ denotes the uniform distribution between $a$ and $b$.

II. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, system model and channel model are first introduced. Then, the conventional massive MIMO channel estimation problem is formulated.

A. System Model

Consider a single cell massive MIMO system, where the BS is equipped with an $N$-antenna uniform linear array (ULA) and $K$ single-antenna users are randomly distributed in the cell of the corresponding BS, as illustrated in Fig. 1.

Fig. 1. Massive MIMO system without HAD.

B. Channel Model

Following the same channel model as in [37], the uplink channel from user $k$ to the BS can be expressed as

$$\mathbf{h}_k = \frac{1}{N_p} \sum_{i=1}^{N_p} \alpha_{ki}\, \mathbf{a}(\theta_{ki}) \in \mathbb{C}^{N \times 1}, \tag{1}$$

where $N_p$ is the number of paths, $\alpha_{ki}$ and $\theta_{ki}$ are the complex gain and angle of arrival (AoA) at the BS of the $i$-th path from the $k$-th user, respectively. Without loss of generality, we consider half-wavelength antenna spacing in this paper, then the steering vector of the ULA can be written as $\mathbf{a}(\theta) = [1, e^{j\pi \sin(\theta)}, \cdots, e^{j\pi \sin(\theta)(N-1)}]^T$. Define the average AoA and the angular spread of user $k$’s channel paths as $\bar{\theta}_k$ and $\Delta\theta$, respectively, that is, $\theta_{ki}$ follows a uniform distribution $\mathcal{U}[\bar{\theta}_k - \Delta\theta, \bar{\theta}_k + \Delta\theta]$. As in [11], [37], the narrow angular spread assumption is adopted, i.e., $\Delta\theta \ll \pi$.

To better understand this channel characteristic, we convert the original channel to the angular domain by

$$\mathbf{x}_k = \mathbf{F} \mathbf{h}_k \in \mathbb{C}^{N \times 1}, \tag{2}$$

where $\mathbf{x}_k$ denotes the angular domain channel of user $k$, and $\mathbf{F} \in \mathbb{C}^{N \times N}$ is a shift-version discrete Fourier transform matrix [11], with the $n$-th row given by $\mathbf{f}_n = \frac{1}{\sqrt{N}} [1, e^{-j\pi \eta_n}, \cdots, e^{-j\pi \eta_n (N-1)}]$, for $\eta_n = \frac{-N+1}{N}, \frac{-N+3}{N}, \cdots, \frac{N-1}{N}$. Due to the narrow angular spread assumption, the angular domain channel exhibits the spatial-clustered sparsity structure [11]. Specifically, as shown in the right half of Fig. 1, $\mathbf{x}_k$ only has a few significant elements appearing in a cluster. If properly exploited, such sparsity structure can help to improve estimation performance and reduce estimation overhead.

C. Problem Formulation

During the uplink training, orthogonal pilot sequences are sent by different users. Denote the pilot sequence of the $k$-th user as $\mathbf{p}_k \in \mathbb{C}^{1 \times L_p}$, where $L_p \geq K$ is the length of pilot sequences. Notice that the channel during the pilot training phase is assumed to be unchanged [11] since $L_p$ is relatively small. Therefore, the superimposed received signal at the BS can be expressed as

$$\mathbf{Y} = \sum_{k=1}^{K} \mathbf{h}_k \mathbf{p}_k + \mathbf{N} \in \mathbb{C}^{N \times L_p}, \tag{3}$$

where $\mathbf{N} \sim \mathcal{CN}(0, \sigma^2) \in \mathbb{C}^{N \times L_p}$ is the zero-mean additive white Gaussian noise at the BS with variance $\sigma^2$. Without loss of generality, we fix the power of pilot sequences to unit and adjust the transmit signal-to-noise ratio (SNR) by changing the noise variance. Then, we have $\mathbf{p}_i \mathbf{p}_j^H = 0, \forall i \neq j$ and $\mathbf{p}_i \mathbf{p}_i^H = 1, \forall i$. Exploiting the orthogonality of the pilot sequences, the LS estimation of user $k$’s channel can be obtained as

$$\hat{\mathbf{h}}_k = \mathbf{Y} \mathbf{p}_k^H = \mathbf{h}_k + \tilde{\mathbf{n}}_k \in \mathbb{C}^{N \times 1}, \tag{4}$$

where $\tilde{\mathbf{n}}_k \triangleq \mathbf{N} \mathbf{p}_k^H$ is the effective noise for user $k$. For brevity, we will consider a specific user from now on and omit subscript $k$. Besides, we use $\hat{\mathbf{h}}_{\mathrm{LS}}$ to denote the LS estimation.
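The signal model in (1) to (4) can be sketched numerically as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' simulation code: the function names, the $\mathcal{CN}(0,1)$ path gains, the DFT-based pilots, and the parameter values (`N`, `K`, `Np`, a 2 degree spread, the noise level) are all hypothetical choices made only for this example.

```python
import numpy as np

def steering_vector(theta, n_ant):
    """ULA steering vector a(theta) for half-wavelength antenna spacing."""
    return np.exp(1j * np.pi * np.sin(theta) * np.arange(n_ant))

def shifted_dft(n_ant):
    """Shift-version DFT matrix F; row n uses eta_n = (-N + 1 + 2n) / N."""
    eta = (2 * np.arange(n_ant) - n_ant + 1) / n_ant
    return np.exp(-1j * np.pi * np.outer(eta, np.arange(n_ant))) / np.sqrt(n_ant)

def narrow_spread_channel(n_ant, n_paths, mean_aoa, spread, rng):
    """Channel as in (1): AoAs uniform in [mean_aoa - spread, mean_aoa + spread]."""
    theta = rng.uniform(mean_aoa - spread, mean_aoa + spread, size=n_paths)
    # Assumption: path gains are i.i.d. CN(0, 1); the source does not state this here.
    alpha = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    steering = np.stack([steering_vector(t, n_ant) for t in theta], axis=1)  # (N, Np)
    return steering @ alpha / n_paths

def ls_estimate(H, pilots, noise_std, rng):
    """Received signal (3) followed by LS estimation (4) for all users at once."""
    n_ant, L_p = H.shape[0], pilots.shape[1]
    noise = noise_std * (rng.standard_normal((n_ant, L_p))
                         + 1j * rng.standard_normal((n_ant, L_p))) / np.sqrt(2)
    Y = H @ pilots + noise          # column k of H is h_k, row k of pilots is p_k
    return Y @ pilots.conj().T      # column k is Y p_k^H

rng = np.random.default_rng(0)
N, K, Np = 64, 4, 20                # hypothetical sizes
mean_aoas = rng.uniform(-np.pi / 2, np.pi / 2, size=K)
H = np.stack([narrow_spread_channel(N, Np, a, np.deg2rad(2.0), rng)
              for a in mean_aoas], axis=1)
pilots = np.fft.fft(np.eye(K)) / np.sqrt(K)   # orthonormal rows, L_p = K
H_ls = ls_estimate(H, pilots, noise_std=0.1, rng=rng)
X_ls = shifted_dft(N) @ H_ls        # angular domain LS estimates, as in (2)
```

Plotting `np.abs(X_ls[:, k])` for any user `k` should show the energy concentrated in a small cluster of angular bins, which is the spatial-clustered sparsity that the text attributes to narrow angular spread.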

Therefore, the goal of channel estimation² is to find a function that maps $\hat{\mathbf{h}}_{\mathrm{LS}}$ to $\mathbf{h}$.

² Here we use the term “channel estimation” for consistency, actually “channel refinement” is more proper.

One of the conventional methods is the MMSE estimation, where the LS estimation is refined by the CCM. However, accurate CCM is hard to obtain in practice and the complexity of matrix inversion in MMSE estimation is very high, especially when the antenna number is large. In [38], DL-based methods have been proposed to refine the channel estimation. In this paper, we will develop an attention-aided DL framework for conventional massive MIMO channel estimation by exploiting the characteristics of channel distribution.

III. ATTENTION-AIDED DL FRAMEWORK FOR MASSIVE MIMO CHANNEL ESTIMATION

In this section, input and output processing, network structure design, and detailed network training method of the proposed framework are introduced.

A. Input and Output Processing

Since channel parameters can be canonically expressed in the angular domain, the input and output of the networks are all in the angular domain in the proposed framework. In simulation, we find that the more sparse angular domain input and output can lead to better channel estimation performance than the original ones. Once the angular domain channel estimation, $\hat{\mathbf{x}}$, is obtained, the original channel estimation can be readily recovered by $\hat{\mathbf{h}} = \mathbf{F}^H \hat{\mathbf{x}}$. Besides, the real and imaginary parts have to be separately processed since complex training is still not well supported by current DL libraries. To promote efficient training, we also perform standard normalization preprocessing on the input.

B. Attention-Aided Channel Estimation Network Structure

As shown in Fig. 2, convolutional neural network (CNN) is a suitable choice for the network structure to exploit the local correlation in the input data due to the spatial-clustered sparsity structure of the angular domain channel. In this paper, one-dimensional convolution (Conv1D) is used due to the shape of input data. The input of a Conv1D layer is organized as a $(F, C)$-dimensional feature matrix, where $C$ denotes the number of channels³ and $F$ denotes the number of features in each channel. Then, the convolution operation slides $C$ filters over the input feature matrix in certain strides to obtain the output feature matrix, which is also the input of the next layer. Specifically, each filter contains a $(L, C)$-dimensional trainable weight matrix and a scalar bias term, where $L$ denotes the filter size. When a filter is located in a certain position of the feature matrix, the cross-correlation between the corresponding chunk of the feature matrix and the weight matrix of the filter is computed and the bias is added to obtain the convolution output of the position [39]. In the proposed channel estimation network, $N_B$ convolutional blocks and an output Conv1D layer are used to refine the LS coarse channel estimation. As depicted in the dashed box, in each convolutional block, a batch normalization (BN) layer to prevent gradient explosion or vanishing [43] and a ReLU activation function are inserted after the Conv1D layer. Besides, the Conv1D layer in the first block has $F$ filters of size $L_I$ and the Conv1D layers in the next $N_B - 1$ blocks have $F$ filters of size $L_H$. The optimal values of $N_B$ and $F$ can be determined through simulation. Finally, the output Conv1D layer has 2 filters of size $L_O$, corresponding to the real and imaginary parts of the channel prediction, respectively. The stride is set to $S$ and all the Conv1D layers pad zeros to keep the dimension $N$ of the feature matrix unchanged.

³ Here channel is a term in CNN representing a dimension of feature matrix, not the communication channel.

Fig. 2. Structure of the channel estimation network.

To effectively exploit the distribution characteristics of channel, the attention mechanism⁴ is applied in the network structure design. In the original CNN, all the features are used for all data samples with equal importance. However, certain features can definitely be more important or informative than others to certain data samples in practice, especially for highly separable data like narrow angular spread channel. For instance, key features, which are only aimed at dealing with channel distribution in a specific angular region, might be useless or even disruptive for the estimation of channels in another region far apart. Therefore, the idea of feature importance reweighting can be used here to improve network performance.

⁴ Notice that, the term attention can refer to many related methods including [40]–[42]. In this paper, we use the classic “SENet” proposed in [40].

As is demonstrated in Fig. 3, the original feature matrix is multiplied by an attention map in a channel-wise manner to obtain the reweighted feature matrix in the attention module, where more important or informative features to the current data sample will be paid more “attention” to. For the learning process of the attention map, global average pooling is performed first on the original feature matrix, $\mathbf{Z}_O$, to embed the global information into a $(1, C)$-dimensional squeezed feature matrix, $\mathbf{z}$. Specifically, the $c$-th element of $\mathbf{z}$ is calculated by $z_c = \sum_{f=1}^{F} [\mathbf{Z}_O]_{f,c} / F$ [40]. Then, the $(1, C)$-dimensional attention map, $\mathbf{m}$, is predicted by a dedicated attention network based on $\mathbf{z}$. The attention network contains two fully connected (FC) layers. The first FC layer with $C/r$ neurons is followed by a ReLU activation, $f_{\mathrm{ReLU}}(x) = \max(0, x)$, where $r \geq 1$ denotes the reduction ratio. The second FC layer with $C$ neurons is followed by a Sigmoid activation, $f_{\mathrm{Sigmoid}}(x) = 1/(1 + e^{-x})$, which limits the elements of $\mathbf{m}$ between 0 and 1. As can be seen in Fig. 2, an attention module is inserted at the end of each convolutional block in the proposed channel estimation network. Besides, $r$ is set to 2 to balance performance and complexity and the FC layers in the attention network do not use bias to facilitate channel dependency modeling.

Fig. 3. Structure of the attention module.
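The forward pass of the attention module described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `se_attention`, the random weights, and the sizes `F`, `C` are assumptions for the example, whereas in the paper the two FC layers are trained jointly with the rest of the network.

```python
import numpy as np

def se_attention(Z, W1, W2):
    """Reweight a (F, C) feature matrix with a squeeze-and-excitation style map.

    Z  : (F, C) real feature matrix, F features per channel, C channels
    W1 : (C, C // r) weights of the first FC layer (no bias)
    W2 : (C // r, C) weights of the second FC layer (no bias)
    """
    z = Z.mean(axis=0)                          # global average pooling, shape (C,)
    hidden = np.maximum(0.0, z @ W1)            # first FC layer followed by ReLU
    m = 1.0 / (1.0 + np.exp(-(hidden @ W2)))    # second FC layer followed by Sigmoid
    return Z * m, m                             # channel-wise reweighting via broadcasting

rng = np.random.default_rng(0)
F, C, r = 64, 16, 2                             # hypothetical sizes; r = 2 as in the text
Z = rng.standard_normal((F, C))
W1 = 0.1 * rng.standard_normal((C, C // r))     # untrained weights, for illustration only
W2 = 0.1 * rng.standard_normal((C // r, C))
Z_reweighted, m = se_attention(Z, W1, W2)
```

Because `m` depends on the pooled statistics of the current sample, two inputs whose energy sits in different angular regions generally receive different attention maps, which is how the reweighting can adapt per sample.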

C. Network Training

To train the designed network, the mean-squared error (MSE) between the true angular domain channel, $\mathbf{x}$, and the predicted angular domain channel, $\hat{\mathbf{x}}$, is used as the loss function, which can be calculated by

$$\text{MSE Loss} = \frac{1}{n} \sum_{i=1}^{n} \|\hat{\mathbf{x}}_i - \mathbf{x}_i\|^2, \tag{5}$$

where subscript $i$ denotes the $i$-th data sample in a mini-batch and $n = 500$ is the size of the mini-batch. Xavier [44] is used as the weight initializer and Adam [45] is used as the weight optimizer. The initial learning rate is set to 0.001. To balance the training complexity and testing performance, we generate a total of 200,000 data samples according to the adopted channel and transmission models. Then, the generated dataset is split into training, validation, and testing sets with a ratio of 3:1:1. In order to accelerate loss convergence at the beginning and reduce loss oscillation near the end of training, the learning rate is set to decay by a factor of 10 if the validation loss does not decrease in 10 consecutive epochs. Besides, early stopping [46] with a patience of 25 epochs is applied to prevent overfitting and speed up the training process.

IV. EXTENSION TO THE HAD SCENARIO

In practice, the HAD architecture is often adopted in massive MIMO systems to save hardware and energy cost. Due to the effect of phase shifters in the analog domain in the HAD architecture, the problem formulation of channel estimation changes and the channel estimation network structure has to be customized correspondingly as well. In the HAD architecture, we assume there are only $M \ll N$ RF chains available at the BS, as illustrated in Fig. 4.

Fig. 4. Massive MIMO system with HAD.

A. Problem Reformulation With HAD

With HAD, the signals arriving at the antennas have to go through the phase shifters first before being received by the RF chains. So, the eventual received signal on the baseband can be expressed as

$$\mathbf{Y}_{\mathrm{HAD}} = \mathbf{W} \mathbf{Y} \in \mathbb{C}^{M \times L_p}, \tag{6}$$

where $\mathbf{W} \in \mathbb{C}^{M \times N}$ denotes the analog combining matrix. As the phase shifters only change the phase of signals, we have $|[\mathbf{W}]_{i,j}| = 1/\sqrt{N}, \forall i, j$ after normalization. We set $\mathbf{W}$ to a matrix whose rows are length-$N$ Zadoff-Chu sequences with different shifting steps as in [10]. Again, exploiting the orthogonality of the pilot sequences, the received signal corresponding to user $k$ can be obtained as

$$\mathbf{y}_k = \mathbf{Y}_{\mathrm{HAD}} \mathbf{p}_k^H = \mathbf{W} \mathbf{h}_k + \bar{\mathbf{n}}_k \in \mathbb{C}^{M \times 1}, \tag{7}$$

where $\bar{\mathbf{n}}_k \triangleq \mathbf{W} \tilde{\mathbf{n}}_k$, with $\tilde{\mathbf{n}}_k = \mathbf{N} \mathbf{p}_k^H$, is the effective noise for user $k$ with HAD. Considering a specific user and omitting the subscript $k$, the goal of channel estimation now becomes to find a function that maps $\mathbf{y}$ to $\mathbf{h}$.

Since the overhead of LS estimation increases dramatically due to the limited number of RF chains, CS algorithms are more often adopted to solve the channel estimation problem in HAD massive MIMO systems conventionally. However, the performance of CS algorithms is highly dependent on channel sparsity and the computational complexity is relatively high due to complex operations and a large number of iterations. Therefore, we extend the proposed framework to the HAD scenario and use DL to overcome these issues.

TABLE I. SIMULATION PARAMETERS

B. Attention-Aided Channel Estimation Network Structure Design With HAD

Different from the former scenario, in the problem of channel estimation with HAD, the input data becomes the received signal $\mathbf{y}$, where little local correlation exists due to the compression of matrix $\mathbf{W}$. Therefore, FNN should be used rather than CNN to achieve better performance. Although the attention mechanism has been originally proposed in the area of computer vision and is only compatible with CNN, its key idea, feature importance reweighting, is actually independent of network structure. Therefore, to exploit the benefit of the attention mechanism, we propose a simple but effective method to embed it into FNN.
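The HAD measurement model in (6) and (7), which produces the input $\mathbf{y}$ of the FNN, can be sketched as follows. This is a hedged illustration, not the construction of [10]: the Zadoff-Chu root index, the cyclic shift step of one sample per RF chain, the placeholder channel, and all function names are assumptions made for the example.

```python
import numpy as np

def zadoff_chu(length, root=1):
    """Zadoff-Chu sequence of the given length (standard even/odd-length forms)."""
    n = np.arange(length)
    if length % 2 == 0:
        return np.exp(-1j * np.pi * root * n**2 / length)
    return np.exp(-1j * np.pi * root * n * (n + 1) / length)

def combining_matrix(n_rf, n_ant, shift_step=1, root=1):
    """M x N analog combiner whose rows are cyclic shifts of one ZC sequence.

    Assumption: row m is the base sequence shifted by m * shift_step samples.
    Every entry has modulus 1 / sqrt(N), matching the phase shifter constraint.
    """
    base = zadoff_chu(n_ant, root)
    rows = [np.roll(base, m * shift_step) for m in range(n_rf)]
    return np.stack(rows) / np.sqrt(n_ant)

def had_observation(W, h, noise_std, rng):
    """Per-user observation as in (7): y = W h + W n_tilde."""
    n_ant = h.shape[0]
    n_tilde = noise_std * (rng.standard_normal(n_ant)
                           + 1j * rng.standard_normal(n_ant)) / np.sqrt(2)
    return W @ h + W @ n_tilde

rng = np.random.default_rng(0)
N, M = 64, 16                                   # hypothetical antenna / RF chain counts
W = combining_matrix(M, N)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # placeholder channel
y = had_observation(W, h, noise_std=0.1, rng=rng)           # shape (M,)
```

Since `y` has only `M` entries while `h` has `N`, the mapping from `y` back to `h` is underdetermined without prior structure, which is why the text turns to CS or a learned estimator in this scenario.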
As introduced earlier, the attention module is inserted after a feature matrix and the attention map is learned from the squeezed feature matrix obtained by global average pooling. FNN cannot directly use attention since all the neurons of the neighboring FC layers are fully connected and features of FC layers appear in the form of vectors instead of matrices. Therefore, as depicted in the dashed box in Fig. 5, we reshape the feature vector of a FC layer into a matrix first, like the feature matrix of a Conv1D layer. Then, with the matrix-shaped feature, the original attention mechanism can be normally applied. Finally, the reweighted feature vector can be obtained by flattening the reweighted feature matrix.

The detailed network design is illustrated in Fig. 5. The first FC layer consists of $F \times C$ neurons, which is followed by a ReLU activation and a BN layer. The feature vector is then reshaped into a $(F, C)$ feature matrix, where $C$ and $F$ can be regarded as the number of channels and the number of features of each channel, respectively. Based on the feature matrix, the original attention module is inserted to get the reweighted feature matrix, which is then flattened back to the reweighted feature vector. Eventually, an output FC layer with $2N$ neurons is used to obtain the real and imaginary parts of the angular domain channel prediction. We only use one hidden FC layer here since experiments indicate that more hidden FC layers are not helpful to further improve the performance but increase the complexity dramatically.

C. Complexity Analysis

In this subsection, the complexity of various algorithms is analyzed. Two metrics are used to measure the complexity, namely the required number of floating point operations (FLOPs) and the total number of parameters. For brevity, only multiplication is considered and one complex multiplication is counted as four real multiplications when computing FLOPs, and the weights and biases of BN layers are ignored and one complex parameter is counted as two real parameters when computing parameter number. When analyzing the complexity of neural networks, we ignore the offline training phase and focus on the online testing phase since the network training only needs to be executed once and the BS usually has sufficient computational ability in practice.

Using the notations in Section III, the FLOPs of the Conv1D layer and the $l$-th FC layer are $LFCC$ and $N_{l-1} N_l$, respectively, where $N_l$ denotes the number of neurons in the $l$-th FC layer. Without HAD, the overall FLOPs of CNN can be readily obtained as $(2 L_I F + 2 L_O F + L_H F^2 (N_B - 1)) N$ and the additional FLOPs of attention modules is $N_B F (N + F + 1)$. The FLOPs of MMSE estimation is $4(2N^3 + N^2)$. Besides, for both algorithms, the LS estimation has to be obtained first, which also requires $4 N L_p^2$ FLOPs. In the scenario with HAD, the FLOPs of the attention-aided FNN can be obtained as $FC(2M + 2N + 1) + C(C + 1)$. In this paper, structured variational Bayesian inference (S-VBI) is selected as the CS-based baseline algorithm, whose FLOPs is $I_E (\frac{2}{3} M^3 + (2M + 2) N^2)$ with $I_E$ denoting the number of iterations [11]. Again, for both algorithms, obtaining the received signal corresponding to a single user requires $4 M L_p^2$ FLOPs. Notice that in both scenarios, the FLOPs of DL-based algorithms only scale linearly with $N$ and $M$, which is an attractive practical advantage, especially in large scale systems. By contrast, the FLOPs of conventional algorithms are much higher and grow cubically with $N$ and $M$.

As for the total number of parameters, the Conv1D layer and the $l$-th FC layer contain $LCC$ and $N_{l-1} N_l$ parameters, respectively. Without HAD, CNN contains totally $2(L_I + L_O) F + L_H (N_B - 1) F^2$ parameters and the additional number of parameters of attention modules is $N_B F^2$. The CCM used in MMSE requires $2N^2$ parameters. In the scenario with HAD, attention-aided FNN contains totally $FC(2M + 2N) + C^2$ parameters, while S-VBI does not need any parameters.

V. SIMULATION RESULTS

In this section, extensive simulation results are presented to evaluate the performance of the proposed DL-based channel estimation framework in scenarios with and without HAD. MSE is adopted as the performance metric. Notice that, converting the channel to angular domain does not change the MSE since $\mathbf{F}$ is a unitary matrix. Some of the parameters used in simulation are summarized in Table I, unless otherwise specified. As for network hyper-parameters in the scenario without HAD, $L_I$, $L_H$, and $L_O$ are set to 7, 5, and 1, respectively, and $S$ is set to 1.

We compare the proposed algorithm with the following baseline algorithms. The structures of all DL-based baselines are carefully determined by cross validation out of fairness.

Without HAD: The following algorithms are selected as baselines:
• MMSE Single: Refine the LS estimation by the CCM, $\mathbf{R}_{hh} \triangleq E(\mathbf{h}\mathbf{h}^H) \in \mathbb{C}^{N \times N}$ as [3]
$$\hat{\mathbf{h}}_{\mathrm{MMSE}} = \mathbf{R}_{hh} (\mathbf{R}_{hh} + \mathbf{I}/\mathrm{SNR})^{-1} \hat{\mathbf{h}}_{\mathrm{LS}}. \tag{8}$$
• MMSE 3°: Split the entire angular space into many 3°-angular regions and estimate a dedicated CCM for each region with only channel samples whose average AoAs are in the region. During the testing process of
Fig. 5. Channel estimation network structure with HAD.

a channel sample, the angular region it belongs to will A. Impacts of Network Parameters
be estimated first5 and the corresponding CCM will be To determine the best network structures for two scenarios,
selected for channel refinement. Compared with using we investigate the impacts of key network parameters on
a single CCM for all channel samples, using multiple network performance. Without HAD, the structure of CNN
CCMs matching different angular regions can effectively is mainly determined by the number of convolutional blocks,
exploit the narrow angular spread characteristic of chan- NB , and the number of filters of each Conv1D layer, F .
nels and improve performance significantly. Actually, As illustrated in Fig. 6(a), attention can improve the per-
it can be regarded as the manual implementation of the formance of CNNs with various numbers of convolutional
“divide-and-conquer” policy, i.e., the channel samples are blocks and filters and the performance of a two-layer attention-
“divided” by their angular regions and “conquered” by aided CNN is even better than a four-layer CNN without
different corresponding CCMs. attention, which indicates the superiority of the attention
• FNN: The FNN structure consists of three FC layers mechanism. In general, the performance of networks is better
with 512, 1024, and 256 neurons, respectively, with one with stronger representation capability brought by more convo-
BN layer inserted between every two FC layers. The lutional blocks. However, with enough filters, the performance
activation function of the first two FC layers is ReLU improvement of attention-aided CNN is marginal if the number
while the last FC layer does not use activation. of filters keeps growing and it can even be harmful to CNN
• CNN Without Attention: The same CNN structure but without attention sometimes. Besides, deeper and wider CNNs
with all the attention modules removed. also have heavier computing and storage burdens. To strike a
With HAD: The following algorithms are selected as balance between performance and complexity, we choose to
baselines: use four convolutional blocks and 96 filters for each Conv1D
• Separate LS: A total of N/M estimates are executed. layer.
In each estimate, only M antennas are switched on by With HAD, the structure of the attention-aided FNN is
adjusting W , and their channels are obtained by LS mainly determined by the number of neurons of the hidden
estimation [7]. FC layer F × C and the way of reshaping in the attention
• S-VBI: One of the state-of-the-art CS-based algorithms embedding module. As in Fig. 6(b), the network performs best
designed for narrow angular spread channel estima- when F × C = 3072 and the performance will deteriorate with
tion in HAD massive MIMO systems, where the either too few or many neurons. Besides, as can be indicated
spatial-clustered channel sparsity is embedded to improve from the bowl shape of curves, a medium number of features
the estimation performance [11]. The source code is in each channel performs best when F × C is fixed. The
provided by the authors of [11]. reason is that the number of channels is too small and there
• FNN Without Attention: Adopt the same structure of FNN is not enough degrees of freedom for dynamic adjustment
as in the former scenario while the number of neurons of attention maps when F is too large, while each channel
reduces to 256, 512, and 256, respectively, with smaller does not contain enough features to effectively capture the
input dimension. global information [40] when F is too small. So, we choose
• CNN: The structure of CNN is also similar to the former to reshape the feature vector into 192 channels with 16 features
scenario, except that the output layer is changed from in each channel.
Conv1D to FC for dimension conversion.
• CNN Without Attention: The same CNN structure but B. Impacts of System Parameters
with all the attention modules removed.
In this subsection, the impacts of various system parameters
5 The angular region estimations of samples are assumed to be accurate for are investigated to validate the superiority and universality of
simplicity. the proposed approach.


Fig. 7. Impact of SNR in the two considered scenarios.
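As a rough illustration of the MMSE baselines compared in this figure, the sketch below applies the refinement in (8) with region-specific CCMs, which is the manual “divide-and-conquer” described for MMSE 3°. It is only a toy setup under stated assumptions: a half-wavelength uniform linear array, single-path channels, SNR in linear scale, and two hypothetical angular regions given as sine ranges. The names `sample_channels`, `sample_ccm`, and `mmse_refine` are illustrative and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16        # antennas (kept small for illustration)
SNR = 10.0    # linear scale, i.e. 10 dB (assumed convention)

def steering(sin_theta):
    # Response of a half-wavelength uniform linear array (assumed geometry).
    return np.exp(1j * np.pi * np.arange(N) * sin_theta)

def sample_channels(lo, hi, count):
    # Single-path channels whose AoA sine is uniform in [lo, hi); rows are samples.
    sines = rng.uniform(lo, hi, size=count)
    gains = (rng.standard_normal(count) + 1j * rng.standard_normal(count)) / np.sqrt(2)
    return gains[:, None] * np.stack([steering(s) for s in sines])

def sample_ccm(h):
    # Sample estimate of R_hh = E[h h^H] from the rows of h.
    return h.T @ h.conj() / h.shape[0]

def mmse_refine(h_ls, r_hh, snr):
    # Eq. (8) applied to every row: R (R + I/SNR)^{-1} h_LS, via a linear solve.
    return (r_hh @ np.linalg.solve(r_hh + np.eye(N) / snr, h_ls.T)).T

def mse(est, ref):
    return float(np.mean(np.abs(est - ref) ** 2))

# "Divide": one CCM per angular region (two hypothetical regions).
regions = {0: (-0.05, 0.0), 1: (0.50, 0.55)}
ccms = {k: sample_ccm(sample_channels(lo, hi, 2000)) for k, (lo, hi) in regions.items()}

# "Conquer": refine test samples from region 1 with the CCM of that region,
# assuming the region of each sample is known.
h = sample_channels(*regions[1], 200)
noise = (rng.standard_normal(h.shape) + 1j * rng.standard_normal(h.shape)) / np.sqrt(2 * SNR)
h_ls = h + noise

mse_ls = mse(h_ls, h)
mse_matched = mse(mmse_refine(h_ls, ccms[1], SNR), h)
mse_mismatched = mse(mmse_refine(h_ls, ccms[0], SNR), h)
print(mse_ls, mse_matched, mse_mismatched)
```

In this toy setting the matched CCM should give a lower MSE than the raw LS estimate, while a CCM taken from the wrong region should do worse than the matched one, which is the motivation for keeping one CCM per region.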

Fig. 6. Impact of network parameters in the two considered scenarios.
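To make the reshaping choice for the attention-aided FNN concrete, the following minimal sketch reshapes a hidden feature vector of length F × C = 3072 into 192 channels of 16 features, predicts one scale factor per channel, and flattens the result back. The squeeze step (per-channel mean), the two-layer FC predictor, the bottleneck width `R`, the absence of biases, and the row-major reshape order are assumptions in the spirit of squeeze-and-excitation [40], not details confirmed by the text; the weights are random placeholders.

```python
import numpy as np

F, C = 16, 192   # features per channel, number of channels (F * C = 3072)
R = 12           # hypothetical bottleneck width of the attention predictor

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_reweight(v, w1, w2):
    """Reshape a length F*C vector, scale each channel, and flatten it back."""
    x = v.reshape(C, F)                      # feature matrix, one row per channel
    squeezed = x.mean(axis=1)                # one summary value per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)  # FC layer followed by ReLU
    scale = sigmoid(w2 @ hidden)             # FC layer followed by Sigmoid: attention map
    return (x * scale[:, None]).reshape(-1), scale

rng = np.random.default_rng(0)
w1 = rng.standard_normal((R, C)) / np.sqrt(C)   # placeholder weights
w2 = rng.standard_normal((C, R)) / np.sqrt(R)
v = rng.standard_normal(F * C)

out, scale = attention_reweight(v, w1, w2)
print(out.shape, scale.shape)

# With an all-zero ReLU output and no bias, every scale factor is sigmoid(0) = 0.5.
_, flat_scale = attention_reweight(v, np.zeros_like(w1), w2)
```

The attention map has one entry per channel, so fewer channels (larger F) leave fewer scale factors to adjust, while fewer features per channel give the squeeze step less to summarize.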

1) Impact of SNR: As illustrated in Fig. 7(a), without HAD, all DL-based methods can refine the coarse LS estimate and improve the channel quality. The performance improvement of FNN decreases as the SNR increases while CNN outperforms LS significantly in various SNR regimes thanks to the exploitation of local correlation of input data. Then, with the aid of attention, the MSE of CNN further decreases moderately. Besides, the performance gain of the attention mechanism increases with SNR. When SNR is 0 dB, the MSE of CNN with attention is 89.55% of that of CNN without attention while this ratio decreases to 71.83% when SNR is 20 dB. The reason is that the narrow angular spread characteristic of the angular domain channel is more exposed and easier to exploit with less noise, thereby amplifying the benefits of attention. As for MMSE, the performance improvement of MMSE Single is marginal while MMSE 3° performs much better due to the exploitation of the narrow angular spread characteristic of the channel. Nevertheless, the proposed attention-aided CNN still slightly outperforms MMSE 3°, demonstrating its superiority.
From Fig. 7(b), when HAD is considered and attention is not used, FNN performs much better than CNN and outperforms separate LS except in high SNR regimes, but it is still clearly inferior to S-VBI. However, with the aid of attention, the performance of both CNN and FNN improves significantly. As can be observed, attention-aided CNN outperforms S-VBI except when SNR is higher than 15 dB while the attention-aided FNN is even better and outperforms S-VBI consistently in all SNR regimes. Besides, compared with Fig. 7(a), the performance gain of the attention mechanism is much more significant since the attention mechanism not only helps denoise but also plays an important role in reversing the effect of W in the HAD scenario. Specifically, when restoring the high-dimensional channel from the low-dimensional received signal, the performance deterioration can be effectively reduced if the approximate AoA range of channel paths is known. Thanks to the attention mechanism, such processing can be automatically realized by the dynamic adjustment of attention maps.
2) Impact of Angular Spread: As illustrated in Fig. 8, attention-aided CNN performs close to MMSE 3° and consistently outperforms LS significantly with various angular spreads. As angular spread increases, the performance of all algorithms degrades in both scenarios since the channel estimation problem becomes more complex with a less sparse angular domain channel. Besides, the performance gain of attention also decreases because the channel distribution is less separable, which makes it more difficult for the attention mechanism to realize the “divide-and-conquer” policy. In the scenario with HAD, the performance of attention-aided FNN is better than separate LS unless the angular spread is too large, while only $M/N$ of the resource overhead is required.
3) Impacts of Antenna Number and RF Chain Ratio: As can be observed from Fig. 9, in both scenarios, the performance of all algorithms improves as $N$ increases. Since the power leakage of the angular domain channel is inversely proportional to the antenna number [37], the increased channel sparsity caused by more antennas can simplify channel estimation. Without HAD, attention-aided CNN has close performance to MMSE 3° and the performance gain of attention can be amplified by a sparser channel. With HAD, the performance of all algorithms improves as the RF chain ratio $M/N$ increases since more information is kept during the sensing phase. Besides, attention-aided FNN outperforms S-VBI consistently with various $M$ and $N$, and the performance gap increases with fewer antennas at a fixed RF chain ratio, indicating that the DL-based approach is less dependent on channel sparsity. From the perspective of resource saving, attention-aided FNN is also superior to S-VBI. In particular, the MSE of attention-aided FNN with only 1/4 RF chains is comparable to that of S-VBI with 1/2 RF chains. As a result, the hardware and energy cost can be halved. Furthermore, given a strict target MSE performance and a limited number of RF chains, S-VBI may need to estimate multiple times while attention-aided FNN completes the estimation at once, saving more resources for data transmission. Such an advantage can be very appealing in scenarios like high-mobility communication, where the channel is fast time-varying with short channel coherence time.

Fig. 8. Impact of angular spread in the two considered scenarios.

Fig. 9. Impact of antenna number and RF chain ratio in the two considered scenarios.

C. Generalization Ability
The generalization ability to different parameters heavily influences the practicality of neural networks. In the considered problem, there are two categories of parameters, namely system parameters and channel parameters. System parameters include the number of antennas, RF chains, and users, which determine the input and output dimensions of the network. Channel parameters include SNR, number of paths, angular spread, and gain distribution of channel paths, which influence the input and output distributions of the network. For system parameters, the numbers of antennas and RF chains are usually fixed in practice, and different user numbers can also be handled by the same network since a multi-user channel estimation problem is decomposed into multiple single-user problems by exploiting the orthogonality of pilot sequences. Therefore, we focus on the generalization performance with respect to channel parameters.
The generalization to different SNRs is illustrated in Fig. 10. The legend “trained with accurate SNRs” denotes that for each SNR, a dedicated model trained with accurate SNR data is used for testing. In both scenarios, the proposed networks can only handle a tiny SNR mismatch between the training and testing phases when the model is trained with a single SNR point, and the performance degradation can be very severe when the SNR mismatch is large. To alleviate this issue, one common method is training with data under a variety of SNRs, so that the characteristics of different SNRs can be captured by a single network. In simulation, we select five SNR points, namely 0, 5, 10, 15, and 20 dB, for training. Besides, for fairness, the number of training samples from each SNR point is kept the same as when trained separately. Based on our simulation results, directly using MSE as the loss can lead to poor performance when different SNR points are trained together since the loss of high SNR data will be overwhelmed by the loss of low SNR data. To ensure that all SNR regimes get sufficient training, we use a heuristic loss computed as
$$\text{Weighted MSE Loss} = \frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{SNR}_i \cdot \|\hat{\boldsymbol{x}}_i - \boldsymbol{x}_i\|^2\right). \tag{9}$$
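The effect of the SNR weighting in (9) can be seen in a small numerical sketch. It assumes that the SNR enters in linear scale and that each sample's error is the squared Euclidean norm of the difference; neither convention is stated explicitly in the text, and the function name `weighted_mse_loss` is illustrative.

```python
import numpy as np

def weighted_mse_loss(x_hat, x, snr_linear):
    """(1/n) * sum_i SNR_i * ||x_hat_i - x_i||^2 over a mini-batch (rows are samples)."""
    per_sample = np.sum(np.abs(x_hat - x) ** 2, axis=1)
    return float(np.mean(snr_linear * per_sample))

# Two samples: a noisy one at 0 dB and a much cleaner one at 20 dB.
x = np.zeros((2, 4))
x_hat = np.array([[1.0, 1.0, 1.0, 1.0],
                  [0.1, 0.1, 0.1, 0.1]])
snr_db = np.array([0.0, 20.0])
snr_linear = 10.0 ** (snr_db / 10.0)   # assumed dB-to-linear conversion

plain = float(np.mean(np.sum(np.abs(x_hat - x) ** 2, axis=1)))
weighted = weighted_mse_loss(x_hat, x, snr_linear)
print(plain, weighted)
```

In the plain MSE the 0 dB sample contributes 4 against 0.04 for the 20 dB sample, so the high SNR data barely influences the gradient. After weighting, both samples contribute 4, which is the balancing effect the heuristic loss aims for.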


In (9), the MSE is weighted by the SNR of each data sample and $n$ is the number of samples in a mini-batch. As indicated by the two close curves marked with circle and cross, networks trained with mixed SNRs achieve performance similar to those trained with accurate SNRs and significantly outperform networks trained with a single SNR point.
As for the generalization to other parameters, detailed results are omitted here due to space limitation while the trends and patterns are similar. In conclusion, through mixed parameters training and proper design of the loss function, a single network with strong robustness can be obtained to handle all situations during testing, which is very appealing in practical applications.

Fig. 10. Generalization to SNRs with different training methods in the two considered scenarios.

D. The Role of Attention
Although it is hard to rigorously analyze the representations learned by DNNs, we still try to attain at least a primitive understanding of the role of attention. Intuitively, the performance gain of attention can be considered to come from the “divide-and-conquer” policy realized by the dynamic adjustment of attention maps. In this way, sample-specific processing can be performed on different data samples to improve the performance. Without attention, the processing performed by the network is fixed for all data samples, which is less flexible. Next, we will analyze the distributions of learned attention maps to roughly corroborate this.
Due to the narrow angular spread characteristic, the channel distribution is highly related to the average AoA parameter, or, more precisely, its sine value. So, we select three sine value ranges for comparison, where the first two ranges are close to each other and the third range is far away from the first two ranges. The average attention maps of validation data samples whose average AoAs are inside the three ranges are plotted in Fig. 11. The number of elements of each attention map equals the corresponding channel number of the feature matrix and the values of the elements represent the scale factors acting on the original features. Due to space limitation, only the 16-th to the 48-th channels are displayed here. A larger scale factor indicates a more important channel of features. From the figure, we have the following observations:
• Without HAD, the role of attention is different in different depths of the attention-aided CNN. Specifically, as shown in the first two subfigures, features are scaled in an angle-agnostic manner in shallower layers with small differences among average attention maps of different sine value ranges, while the distributions of average attention maps become increasingly angle-specific in deeper layers. Notice that the mean value of the 38-th scale factor of the third attention map varies significantly with sine value ranges. Reasonably, it can be inferred to be a key angle-related feature in the considered problem. Such a phenomenon is also consistent with a typical pattern in DNNs that earlier layer features are more general while later layer features exhibit greater specificity [47].
• The distributions of average attention maps of closer sine value ranges are more similar. From the second subfigure, the curves of the first two ranges are very close to each other, while the curve of the third range is clearly different from them. It can be regarded as the embodiment of “divide-and-conquer” since the channel estimation for data samples in the first two ranges and the third range can be regarded as two different subproblems, which are “divided” by different attention maps first and then “conquered” subsequently.
• As illustrated in the third subfigure, all scale factors in the fourth attention map are 0.5, which is due to the zero output of the former ReLU activation function and the Sigmoid activation function used to predict the attention map. Therefore, the last attention module is actually useless and can be removed during testing to further reduce the complexity [40].
• From the fourth subfigure, the differences of average attention maps between sine value ranges are bigger and the binarization level of scale factors is higher in the HAD scenario. Only one attention module is used in the attention-aided FNN, so the “divide” process has to be realized more intensely, which is different from the attention-aided CNN used in the scenario without HAD. Another reason might be that compared with the denoising process in the former scenario, reversing the effect of W is more angle-related, therefore the “divide-and-conquer” policy is reflected more fully. When dealing with a certain subproblem, only specific features are kept and others are totally abandoned.
Apart from the statistical characteristics, Fig. 12 also presents the attention maps of two exemplary data samples

Fig. 12. The attention maps of two exemplary data samples with very close
average AoAs.

TABLE II
COMPLEXITY COMPARISON WITHOUT HAD
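The closed-form counts given in the complexity analysis for the scenario without HAD can be evaluated directly. The sketch below plugs in the settings stated in the text (N = 128, L_p = 10, F = 96, N_B = 4, L_I = 7, L_H = 5, L_O = 1) and assumes N_B − 1 = 3 attention modules at test time, since the last module is removed. The function names are illustrative, and the numbers are a worked evaluation of those formulas, not a reproduction of the table entries.

```python
def cnn_flops(N, F, NB, LI, LH, LO):
    # (2 L_I F + 2 L_O F + L_H F^2 (N_B - 1)) N
    return (2 * LI * F + 2 * LO * F + LH * F**2 * (NB - 1)) * N

def attention_flops(N, F, num_modules):
    # num_modules * F * (N + F + 1)
    return num_modules * F * (N + F + 1)

def mmse_flops(N):
    # 4 (2 N^3 + N^2)
    return 4 * (2 * N**3 + N**2)

def ls_flops(N, Lp):
    # 4 N L_p^2
    return 4 * N * Lp**2

def cnn_params(F, NB, LI, LH, LO):
    # 2 (L_I + L_O) F + L_H (N_B - 1) F^2
    return 2 * (LI + LO) * F + LH * (NB - 1) * F**2

def attention_params(F, num_modules):
    # num_modules * F^2
    return num_modules * F**2

def ccm_params(N):
    # 2 N^2 (one complex parameter counted as two real ones)
    return 2 * N**2

N, Lp, F, NB, LI, LH, LO = 128, 10, 96, 4, 7, 5, 1
modules = NB - 1   # last attention module assumed removed at test time

print("CNN FLOPs:", cnn_flops(N, F, NB, LI, LH, LO))
print("attention FLOPs:", attention_flops(N, F, modules))
print("MMSE FLOPs:", mmse_flops(N))
print("LS FLOPs:", ls_flops(N, Lp))
print("CNN params:", cnn_params(F, NB, LI, LH, LO))
print("attention params:", attention_params(F, modules))
print("CCM params:", ccm_params(N))
```

With these inputs the CNN needs about 17.9 million real multiplications against about 16.8 million for MMSE, and the attention modules add 64,800 multiplications. Because the MMSE count grows with the cube of N while the CNN count grows linearly in N, the ordering reverses for larger arrays.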

with close average AoAs. Although the average AoAs are almost the same, the attention maps of these two data samples are still dramatically different, which reveals the sample-specific nature of attention. The reason is that although the average AoA can reflect most of the channel’s characteristics, there are still some features, such as the specific AoAs and gains of channel paths, which can also be exploited by attention for further performance improvement.

Fig. 11. Average attention maps of data samples in three ranges. The legend (a, b) denotes the range where the minimum and maximum sine values of average AoAs are a and b, respectively.

E. Complexity Comparison
Under typical system settings where $N = 128$, $M = 32$, $L_p = K = 10$, and $I_E = 50$, the specific complexity of different algorithms is compared in Table II and Table III. Notice that the last attention layer in attention-aided CNN is removed during testing as mentioned above. Besides, for MMSE 3°, CCMs computed from channel samples whose average AoAs have the same sine values can be shared to halve the number of parameters.
As we can see, without HAD, the number of parameters increases by only 19.86% with the use of attention, and the additional FLOPs overhead introduced by attention is almost negligible. Although the FLOPs of attention-aided CNN are currently slightly higher than those of MMSE, they will be much lower than those of MMSE if the antenna number keeps growing. Besides, the parameter number of MMSE 3° is also quite large since


tens of CCMs are required to exploit the narrow angular spread characteristic of channels.
In the scenario with HAD, we only compare three algorithms with practical performance. Attention-aided CNN and FNN have similar parameter numbers while the FLOPs of attention-aided FNN are much lower. Recall that its performance is also better than attention-aided CNN, which indicates its superiority. The FLOPs of S-VBI are significantly higher than those of the DL-based methods. In simulation, when both run on CPU, attention-aided FNN can be hundreds of times faster than S-VBI in terms of clock time and the advantage will be even larger if accelerated by GPU.

TABLE III
COMPLEXITY COMPARISON WITH HAD

VI. CONCLUSION
In this paper, we have proposed a novel attention-aided DL framework for massive MIMO channel estimation. Both the scenarios without and with HAD are considered and scenario-specific neural networks are customized correspondingly. By integrating the attention mechanism into CNN and FNN, the narrow angular spread characteristic of the channel can be effectively exploited, which is realized by the “divide-and-conquer” policy to dynamically adjust attention maps. The proposed approach can significantly improve the performance with relatively low complexity.

REFERENCES
[1] F. Rusek et al., “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, Jan. 2013.
[2] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
[3] Y. S. Cho, J. Kim, W. Y. Yang, and C.-G. Kang, MIMO-OFDM Wireless Communications With MATLAB. Singapore: Wiley, 2010.
[4] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for large-scale antenna arrays,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 501–513, Apr. 2016.
[5] A. F. Molisch et al., “Hybrid beamforming for massive MIMO: A survey,” IEEE Commun. Mag., vol. 55, no. 9, pp. 134–141, Sep. 2017.
[6] S. Guo, H. Zhang, P. Zhang, P. Zhao, L. Wang, and M.-S. Alouini, “Generalized beamspace modulation using multiplexing: A breakthrough in mmWave MIMO,” IEEE J. Sel. Areas Commun., vol. 37, no. 9, pp. 2014–2028, Sep. 2019.
[7] D. Fan et al., “Angle domain channel estimation in hybrid millimeter wave massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 17, no. 12, pp. 8165–8179, Dec. 2018.
[8] J. Lee, G.-T. Gil, and Y. H. Lee, “Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2370–2386, Jun. 2016.
[9] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[10] Y. Wang, A. Liu, X. Xia, and K. Xu, “Learning the structured sparsity: 3-D massive MIMO channel estimation and adaptive spatial interpolation,” IEEE Trans. Veh. Technol., vol. 68, no. 11, pp. 10663–10678, Nov. 2019.
[11] X. Xia, K. Xu, S. Zhao, and Y. Wang, “Learning the time-varying massive MIMO channels: Robust estimation and data-aided prediction,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8080–8096, Aug. 2020.
[12] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep learning in physical layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp. 93–99, Apr. 2019.
[13] H. Ye, L. Liang, G. Y. Li, and B.-H. F. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020.
[14] J. Gao, X. Yi, C. Zhong, X. Chen, and Z. Zhang, “Deep learning for spectrum sensing,” IEEE Wireless Commun. Lett., vol. 8, no. 6, pp. 1727–1730, Dec. 2019.
[15] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for wireless resource management,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438–5453, Oct. 2018.
[16] H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep reinforcement learning based resource allocation for V2V communications,” IEEE Trans. Veh. Technol., vol. 68, no. 4, pp. 3163–3173, Apr. 2019.
[17] L. Liang, H. Ye, and G. Y. Li, “Spectrum sharing in vehicular networks based on multi-agent reinforcement learning,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2282–2292, Oct. 2019.
[18] L. Liang, H. Ye, G. Yu, and G. Y. Li, “Deep-learning-based wireless resource allocation with application to vehicular networks,” Proc. IEEE, vol. 108, no. 2, pp. 341–356, Feb. 2020.
[19] J. Gao, C. Zhong, X. Chen, H. Lin, and Z. Zhang, “Unsupervised learning for passive beamforming,” IEEE Commun. Lett., vol. 24, no. 5, pp. 1052–1056, May 2020.
[20] H. Song, M. Zhang, J. Gao, and C. Zhong, “Unsupervised learning-based joint active and passive beamforming design for reconfigurable intelligent surfaces aided wireless networks,” IEEE Commun. Lett., vol. 25, no. 3, pp. 892–896, Mar. 2021, doi: 10.1109/LCOMM.2020.3041510.
[21] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 114–117, Feb. 2018.
[22] P. Jiang et al., “Artificial intelligence-aided OFDM receiver: Design and experimental results,” Dec. 2018, arXiv:1812.06638. [Online]. Available: http://arxiv.org/abs/1812.06638
[23] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Model-driven deep learning for MIMO detection,” IEEE Trans. Signal Process., vol. 68, pp. 1702–1715, Feb. 2020.
[24] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
[25] W. Chen, B. Zhang, S. Jin, B. Ai, and Z. Zhong, “Solving sparse linear inverse problems in communication systems: A deep learning approach with adaptive depth,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 4–17, Jan. 2021.
[26] P. Dong, H. Zhang, G. Y. Li, I. Gaspar, and N. NaderiAlizadeh, “Deep CNN-based channel estimation for mmWave massive MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 5, pp. 989–1000, Sep. 2019.
[27] P. Wu and J. Cheng, “Deep unfolding basis pursuit: Improving sparse channel reconstruction via data-driven measurement matrices,” Jul. 2020, arXiv:2007.05177. [Online]. Available: http://arxiv.org/abs/2007.05177
[28] X. Ma and Z. Gao, “Data-driven deep learning to design pilot and channel estimator for massive MIMO,” IEEE Trans. Veh. Technol., vol. 69, no. 5, pp. 5677–5682, May 2020.
[29] Y. Yang, S. Zhang, F. Gao, J. Ma, and O. A. Dobre, “Graph neural network-based channel tracking for massive MIMO networks,” IEEE Commun. Lett., vol. 24, no. 8, pp. 1747–1751, Aug. 2020.
[30] Y. Yang, F. Gao, C. Xing, J. An, and A. Alkhateeb, “Deep multimodal learning: Merging sensory data for massive MIMO channel prediction,” Jul. 2020, arXiv:2007.09366. [Online]. Available: http://arxiv.org/abs/2007.09366
[31] M. Wenyan, Q. Chenhao, Z. Zhang, and J. Cheng, “Sparse channel estimation and hybrid precoding using deep learning for millimeter wave massive MIMO,” IEEE Trans. Commun., vol. 68, no. 5, pp. 2838–2849, Feb. 2020.
[32] Y. Yang, F. Gao, G. Y. Li, and M. Jian, “Deep learning-based downlink channel prediction for FDD massive MIMO system,” IEEE Commun. Lett., vol. 23, no. 11, pp. 1994–1998, Nov. 2019.

[33] A. Al-Hourani, S. Kandeepan, and A. Jamalipour, “Modeling air-to-ground path loss for low altitude platforms in urban environments,” in Proc. IEEE Global Commun. Conf., Austin, TX, USA, Dec. 2014, pp. 2898–2904.
[34] Q. Cai, C. Dong, and K. Niu, “Attention model for massive MIMO CSI compression feedback and recovery,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Marrakesh, Morocco, Apr. 2019, pp. 1–5.
[35] D. J. Ji and D.-H. Cho, “ChannelAttention: Utilizing attention layers for accurate massive MIMO channel feedback,” IEEE Wireless Commun. Lett., vol. 10, no. 5, pp. 1079–1082, May 2021.
[36] J. Xu, B. Ai, W. Chen, A. Yang, P. Sun, and M. Rodrigues, “Wireless image transmission using deep source channel coding with attention modules,” Nov. 2020, arXiv:2012.00533. [Online]. Available: http://arxiv.org/abs/2012.00533
[37] H. Xie, F. Gao, S. Zhang, and S. Jin, “A unified transmission strategy for TDD/FDD massive MIMO systems with spatial basis expansion model,” IEEE Trans. Veh. Technol., vol. 66, no. 4, pp. 3170–3184, Apr. 2017.
[38] Y. Yang, F. Gao, X. Ma, and S. Zhang, “Deep learning-based channel estimation for doubly selective fading channels,” IEEE Access, vol. 7, pp. 36579–36589, Mar. 2019.
[39] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[40] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” Sep. 2017, arXiv:1709.01507. [Online]. Available: http://arxiv.org/abs/1709.01507
[41] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” Nov. 2017, arXiv:1711.07971. [Online]. Available: http://arxiv.org/abs/1711.07971
[42] J. Fu et al., “Dual attention network for scene segmentation,” Sep. 2018, arXiv:1809.02983. [Online]. Available: http://arxiv.org/abs/1809.02983
[43] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. ICML, Lille, France, Jul. 2015, pp. 448–456.
[44] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. AISTATS, vol. 9, May 2010, pp. 249–256.
[45] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Dec. 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[46] R. Caruana, S. Lawrence, and L. Giles, “Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping,” in Proc. NIPS, Denver, CO, USA, Dec. 2020, pp. 1–7.
[47] A. S. Morcos, D. G. Barrett, N. C. Rabinowitz, and M. Botvinick, “On the importance of single directions for generalization,” in Proc. ICLR, Vancouver, BC, Canada, Apr./May 2018, pp. 1–5.

Jiabao Gao (Student Member, IEEE) received the B.S. degree in information engineering from Zhejiang University, Hangzhou, China, in 2019, where he is currently pursuing the Ph.D. degree with Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking. In October 2021, he will become a Visiting Student with the Department of Electrical and Electronic Engineering, Imperial College London, England. His current research interests include massive MIMO, channel estimation, and machine learning for wireless communications.

Caijun Zhong (Senior Member, IEEE) received the B.S. degree in information engineering from Xi’an Jiaotong University, Xi’an, China, in 2004, and the M.S. degree in information security and the Ph.D. degree in telecommunications from University College London, London, U.K., in 2006 and 2010, respectively.
From September 2009 to September 2011, he was a Research Fellow at the Institute for Electronics, Communications and Information Technologies (ECIT), Queen’s University Belfast, Belfast, U.K. Since September 2011, he has been with Zhejiang University, Hangzhou, China, where he is currently a Professor. His current research interests include reconfigurable intelligent surfaces assisted communications and artificial intelligence-based wireless communications. He was a recipient of the 2013 IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award. He and his coauthors have been awarded the Best Paper Award at the IEEE GLOBECOM 2020 and IEEE ICC 2019. He was an Editor of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS and IEEE COMMUNICATIONS LETTERS. He is an Editor of Science China Information Sciences and China Communications.

Geoffrey Ye Li (Fellow, IEEE) has been a Chair Professor at Imperial College London, U.K., since 2020. Before moving to Imperial, he was a Professor with Georgia Institute of Technology, GA, USA, for 20 years, and a Principal Technical Staff Member with AT&T Labs Research, NJ, USA, for five years. His general research interests include statistical signal processing and machine learning for wireless communications. In the related areas, he has published over 600 journal and conference papers in addition to over 40 granted patents and several books. His publications have been cited over 48,000 times and he has been recognized as a Highly Cited Researcher, by Thomson Reuters, almost every year.
He was awarded an IEEE Fellow for his contributions to signal processing for wireless communications in 2005. He won several prestigious awards from IEEE Signal Processing Society (Donald G. Fink Overview Paper Award in 2017), IEEE Vehicular Technology Society (James Evans Avant Garde Award in 2013 and Jack Neubauer Memorial Award in 2014), and IEEE Communications Society (Stephen O. Rice Prize Paper Award in 2013, the Award for Advances in Communication in 2017, and Edwin Howard Armstrong Achievement Award in 2019). He received the 2015 Distinguished ECE Faculty Achievement Award from Georgia Tech. He has organized and chaired many international conferences, including the Technical Program Vice-Chair of the IEEE ICC’03 and the General Co-Chair of the IEEE GlobalSIP’14, the IEEE VTC’19 (Fall), and the IEEE SPAWC’20. He has been involved in editorial activities for over 20 technical journals, including the Founding Editor-in-Chief of IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS (JSAC) Special Series on ML in Communications and Networking.

Mu Hu (Graduate Student Member, IEEE) received the B.E. degree in information engineering from

Zhaoyang Zhang (Senior Member, IEEE) received the Ph.D. degree from Zhejiang University, Hangzhou, China, in 1998. He is currently a Qiushi Distinguished Professor with Zhejiang University. He has published more than 300 peer-reviewed international journal and conference papers, including six conference best papers. His current research interests are mainly focused on the fundamental aspects of wireless communications and networking, such as information theory and coding, network signal processing and distributed
Zhejiang University, Hangzhou, China, in 2019, learning, AI-empowered communications and networking, network
where he is currently pursuing the M.Sc. degree with intelligence with synergetic sensing, and computation and communication.
the Information Science and Electronic Engineering He was awarded the National Natural Science Fund for Outstanding
College. His current research interests are depth Young Scholars by NSFC in 2017. He is serving as an Editor for IEEE
completion in computer vision and model acceler- T RANSACTIONS ON W IRELESS C OMMUNICATIONS, IEEE T RANSACTIONS
ation in deep learning. ON C OMMUNICATIONS, and IET Communications. He has served as the
General Chair and the TPC Co-Chair or the Symposium Co-Chair for WCSP
2013/2018, GLOBECOM 2014 Wireless Communications Symposium, and
VTC-Spring 2017 Workshop HMWC.


