
Overview of Deep Learning-based CSI Feedback in Massive MIMO Systems

Jiajia Guo, Graduate Student Member, IEEE, Chao-Kai Wen, Senior Member, IEEE, Shi Jin, Senior Member, IEEE, and Geoffrey Ye Li, Fellow, IEEE

arXiv:2206.14383v1 [eess.SP] 29 Jun 2022

Jiajia Guo and Shi Jin are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing, 210096, P. R. China (email: [email protected], [email protected]). Chao-Kai Wen is with the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan (e-mail: [email protected]). Geoffrey Ye Li is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, U.K. (e-mail: [email protected]).

Abstract—Many performance gains achieved by massive multiple-input and multiple-output depend on the accuracy of the downlink channel state information (CSI) at the transmitter (base station), which is usually obtained by estimating at the receiver (user terminal) and feeding back to the transmitter. The overhead of CSI feedback occupies substantial uplink bandwidth resources, especially when the number of transmit antennas is large. Deep learning (DL)-based CSI feedback refers to CSI compression and reconstruction by a DL-based autoencoder and can greatly reduce feedback overhead. In this paper, a comprehensive overview of state-of-the-art research on this topic is provided, beginning with basic DL concepts widely used in CSI feedback and then categorizing and describing some existing DL-based feedback works. The focus is on novel neural network architectures and the utilization of communication expert knowledge to improve CSI feedback accuracy. Works on bit-level CSI feedback and joint design of CSI feedback with other communication modules are also introduced, and some practical issues, including training dataset collection, online training, complexity, generalization, and standardization effect, are discussed. At the end of the paper, some challenges and potential research directions associated with DL-based CSI feedback in future wireless communication systems are identified.

Index Terms—CSI feedback, massive MIMO, deep learning, overview.

Fig. 1. Illustration of autoencoder architectures: (a) autoencoder for image compression; (b) autoencoder for CSI feedback. In image compression, the NN-based encoder compresses the original image into a low-dimensional latent representation and then the NN-based decoder reconstructs the image from the latent representation. The encoder and decoder are jointly trained. In the right sub-figure, the downlink CSI is regarded as a special type of "image".

I. INTRODUCTION

THE 3rd Generation Partnership Project (3GPP) completed the first release of the fifth generation (5G) mobile communications, namely, Release 15, in 2018, laying the foundation for the global commercial deployment of 5G [1]. The three major usage scenarios of 5G networks include enhanced mobile broadband (eMBB), ultra-reliable low-latency communications (URLLC), and massive machine type communications. To support the use cases, some novel techniques, including millimeter-wave transmission, network densification, and massive multiple-input and multiple-output (MIMO), have been introduced [2]. 3GPP has been working on 5G evolution in Releases 16 and 17 to enhance existing features further and enable new use cases [3], [4]. With Release 17 specification work ongoing, 3GPP also started the plan for 5G-Advanced and recently approved the package including 27 work items in Release 18 [5]. In particular, the features of Release 18 work on embracing artificial intelligence (AI) and machine learning (ML) technologies. Release 18 is expected to pave the way toward integrating AI and communications. MIMO evolution is one of the key features in 3GPP Release 18.

In the past 10 years, deep learning (DL) has achieved great success in many areas. Inspired by its success, DL has been introduced in wireless communications [6]–[9]. The DL technology can be used to enhance the conventional communication blocks. In [10] and [11], deep neural networks (NNs) are used to design a downlink beamforming (BF) matrix. In [12] and [13], DL is introduced for channel estimation and symbol detection, which has been validated by an over-the-air test in [14]. Furthermore, DL enables end-to-end communication systems [15], [16], where the transmitter and the receiver are represented by NNs in the form of an autoencoder. The concept of end-to-end communication systems is validated in [17].

In massive MIMO, the base station (BS) is equipped with a large number (up to a few hundred) of active antennas and simultaneously serves multiple users. The knowledge of accurate channel state information (CSI) at the BS is essential to obtaining the performance gains of massive MIMO [18], [19].
Fig. 2. Outline of the article:

Section II: Conventional feedback schemes
  A. Massive MIMO system
  B. Codebook-based CSI feedback
  C. Compressive sensing-based CSI feedback

Section III: Common deep network architectures
  A. Fully connected layer
  B. Convolutional layer
  C. Long short-term memory
  D. Variational autoencoder
  E. Generative adversarial networks
  F. Attention mechanism

Section IV: DL-based CSI feedback
  A. Novel network architecture design
  B. Multiple domain correlations
  C. Bitstream generation
  D. Joint design with other modules
  E. Practical consideration
  F. Other related works

Section V: Future research directions
  A. CSI datasets from realistic systems
  B. Tradeoff between performance and complexity
  C. Generalization
  D. Effect on standardization
  E. High-speed scenario
  F. Other emerging techniques
  G. Open source dataset and code

(Sections III and IV together form the overview of DL-based CSI feedback.)

Downlink CSI acquisition contains two main steps. First, the user estimates the downlink CSI utilizing the received pilot signals transmitted by the BS. Then, the user feeds the estimated downlink CSI back to the BS through the uplink control channel. In massive MIMO systems, the large number of antennas at the BS results in a vast CSI dimension and requires a substantial feedback overhead. In addition, commercial deployments in 5G have observed that the user often experiences considerable performance loss due to the outdated CSI fed back by the user. Conventional CSI feedback methods are based on codebooks [20] and compressive sensing (CS) [21], which cannot meet the requirements of low complexity and high accuracy. Therefore, potential CSI feedback enhancements are explored to improve the performance of massive MIMO systems. 3GPP Release 18 will study AI/ML for this use case [22].

CSI compression and feedback in massive MIMO can also be based on DL [23]. The common architecture adopted in DL-based feedback borrows the idea of the autoencoder used in image compression in Fig. 1. Fig. 1(a) shows that the encoder compresses the original image by the NNs to generate a latent space representation. The dimension of this latent representation is much lower than that of the original image. Then, the NN-based decoder reconstructs the original image from the received latent representation. The NN-based encoder and decoder in data compression are trained in an end-to-end manner. The autoencoder-based image compression has substantially outperformed the conventional compression techniques. Fig. 1(b) shows that the CsiNet framework in [23] regards downlink CSI as a special type of "image." The encoder at the user compresses the downlink CSI. The compressed CSI is then fed back to the transmitter. Upon obtaining the feedback information, the decoder at the BS reconstructs the CSI by NNs.

This paper provides an overview of DL-based CSI feedback in massive MIMO systems. It focuses on feedback performance, NN complexity, and the effect on other communication modules. First, the conventional CSI feedback methods, including codebook-based and CS-based feedback algorithms, are briefly introduced, and their main limitations are discussed. Then, a brief introduction to the basic concepts of the NNs and some representative NN architectures, which are widely used in the existing CSI feedback works, is provided¹. Next, the existing DL-based feedback works are divided into six different categories, namely, the introduction of novel NN design, the utilization of multi-domain correlations, bitstream generation, joint design with other modules, practical considerations, and others. Finally, the main challenges of DL-based CSI feedback, especially in 3GPP standardization, are discussed, and the future research directions are identified.

The article is outlined in Fig. 2. Section II presents system models and some conventional feedback methods in massive MIMO systems. Section III describes basic NN concepts and representative architectures widely used in DL-based CSI feedback. Section IV overviews existing works including the motivations, key ideas, and weaknesses. Section V illustrates the main challenges and the corresponding future research directions. Section VI concludes this paper.

¹This part can be skipped if the reader has a good understanding of basic DL concepts.
Notations: In this paper, italic letters represent scalars. Bold-face lower-case and upper-case letters denote vectors and matrices, respectively. C^{m×n} (R^{m×n}) denotes the space of m × n complex-valued (real-valued) matrices. I represents an identity matrix. The transpose, conjugate, Hermitian transpose, and inverse operations are represented by (·)^T, (·)^*, (·)^H, and (·)^{−1}, respectively. Tr(·) and E(·) denote the trace and the expectation of a matrix, respectively. The Euclidean norm of a vector is written as ‖·‖. round(·) represents the rounding operation. [A]_{i,j} represents the (i, j)-th element of matrix A.

Fig. 3. Illustration of codebook-based CSI feedback. The codebook is known to the user and the BS. The user searches the codeword, which is the closest to the downlink CSI, and feeds back the corresponding index to the BS. Upon receiving the index, the BS can obtain the channel by looking up the shared codebook.

II. CONVENTIONAL FEEDBACK SCHEMES

In this section, some representative conventional CSI feedback methods are presented. After introducing the fundamental signal model of massive MIMO systems, the basic ideas and the pros and cons of different methods are discussed.

A. Massive MIMO System

For simplicity, a single-cell massive MIMO system operated in orthogonal frequency-division multiplexing mode with N_c subcarriers is considered. The BS is equipped with a uniform linear antenna array (ULA) with N_t (≫ 1) transmit antennas. K users each have a single receive antenna. If the BS adopts a linear precoding algorithm, such as zero-forcing (ZF), the transmit signal from the BS at the n-th subcarrier will be

\mathbf{x}_n = \sum_{k=1}^{K} \mathbf{v}_{n,k} s_{n,k} = \mathbf{V}_n \mathbf{s}_n, \quad (1)

where v_{n,k} ∈ C^{N_t×1} and s_{n,k} denote the linear precoding vector for the k-th user and the transmitted symbol of the k-th user, respectively, and V_n = [v_{n,1}, ..., v_{n,K}]. The whole precoding matrix and the transmitted symbol should satisfy the power constraints Tr(V_n V_n^H) ≤ P and E(s_n s_n^H) = I, respectively. The received signal at the k-th user over the n-th subcarrier can be expressed as

y_{n,k} = \mathbf{h}_{n,k}^T \mathbf{v}_{n,k} s_{n,k} + \sum_{i \neq k} \mathbf{h}_{n,k}^T \mathbf{v}_{n,i} s_{n,i} + z_{n,k}, \quad (2)

where h_{n,k} ∈ C^{N_t×1} is the frequency response vector at the n-th subcarrier, and z_{n,k} ∼ CN(0, σ²) denotes the complex additive white Gaussian noise with zero mean and variance σ². The BS designs the precoding matrix V_n for the n-th subcarrier using the entire downlink CSI matrix of all users, H_n = [h_{n,1}, ..., h_{n,K}]. For example, the ZF precoding matrix can be expressed as

\mathbf{V}_n = c \mathbf{H}_n^* (\mathbf{H}_n^T \mathbf{H}_n^*)^{-1}, \quad (3)

where c = \sqrt{P / \|(\mathbf{H}_n^T \mathbf{H}_n^*)^{-1}\|^2} is the power normalization factor.
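As a concrete illustration of (1)-(3), the following minimal NumPy sketch builds a per-subcarrier ZF precoder for a randomly drawn channel. The antenna count, user count, and power budget P are illustrative assumptions, and the normalization here simply enforces Tr(V_n V_n^H) = P.

```python
import numpy as np

# Minimal sketch of per-subcarrier ZF precoding, following (1)-(3).
# The channel realization and the power budget P are illustrative assumptions.
rng = np.random.default_rng(0)
N_t, K, P = 32, 4, 1.0

# H_n = [h_{n,1}, ..., h_{n,K}]: downlink CSI of all K users at subcarrier n.
H_n = (rng.standard_normal((N_t, K)) + 1j * rng.standard_normal((N_t, K))) / np.sqrt(2)

# Unnormalized ZF precoder conj(H_n) (H_n^T conj(H_n))^{-1}, as in (3).
G = H_n.conj() @ np.linalg.inv(H_n.T @ H_n.conj())
c = np.sqrt(P) / np.linalg.norm(G)   # power normalization factor
V_n = c * G

# The effective channel H_n^T V_n is diagonal: inter-user interference is nulled.
print(np.round(H_n.T @ V_n, 6))
```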
B. Codebook-based CSI Feedback

In Fig. 3, random vector quantization (RVQ) [24] is used as an example to briefly introduce the key idea of codebook-based feedback. Assuming that the feedback bit number is B, the CSI codebook, C, shared by the BS and the user/user equipment (UE) contains 2^B N_t-dimensional unit-norm isotropically distributed vectors. The codeword for the downlink CSI h_{n,k} can be obtained by

\hat{\mathbf{h}}_{n,k} = \arg\max_{\mathbf{u} \in \mathcal{C}} \|\mathbf{h}_{n,k}^H \mathbf{u}\|. \quad (4)

The user feeds back the index of the selected codeword u to the BS via the uplink control channel. The BS obtains the corresponding codeword based on the received index.

The codebook-based CSI feedback has faced some challenges. Feedback accuracy is improved with codebook size 2^B. For example, the TYPE II codebook in 5G new radio (NR) remarkably outperforms the TYPE I codebook at the expense of a substantial increase in feedback bit number. In addition, codeword search complexity increases with codebook size. Although many algorithms, such as an adaptive codebook, have been proposed [20], the performance of feedback accuracy, complexity, and overhead of channel codebooks needs to be improved further.
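The codebook search in (4) and the index lookup at the BS can be summarized by the short NumPy sketch below; the codebook size 2^B and the Gaussian channel draw are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of RVQ codebook search and index feedback, following (4).
rng = np.random.default_rng(1)
N_t, B = 32, 8

# Shared codebook C: 2^B unit-norm, isotropically distributed N_t-vectors.
C = rng.standard_normal((2**B, N_t)) + 1j * rng.standard_normal((2**B, N_t))
C /= np.linalg.norm(C, axis=1, keepdims=True)

h = rng.standard_normal(N_t) + 1j * rng.standard_normal(N_t)  # downlink CSI h_{n,k}

# User side: pick the codeword maximizing |h^H u|; only the B-bit index is fed back.
idx = int(np.argmax(np.abs(C @ h.conj())))

# BS side: recover the codeword by looking up the same shared codebook.
h_hat = C[idx]
print(f"feedback index: {idx} ({B} bits)")
```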
C. CS-based CSI Feedback

The CSI matrix is sparse in certain domains, such as the time, spatial, spatial-temporal, and spatial-frequency domains [21]. CS can be used to reduce the overhead of downlink CSI. Given that the number of scatter clusters is much smaller than that of the transmit antennas at the BS in massive MIMO systems, the CSI matrix can be represented by much fewer parameters, and the spatial domain turns into the sparse angular domain using the discrete Fourier transform (DFT) as

\tilde{\mathbf{h}}_{n,k} = \mathbf{F} \mathbf{h}_{n,k}, \quad (5)

where F ∈ C^{N_t×N_t} stands for a DFT matrix. CSI compression at the user is implemented via a sensing matrix as

\mathbf{m} = \mathbf{\Phi} \tilde{\mathbf{h}}_{n,k}, \quad (6)
where Φ ∈ C^{M×N_t} and m stand for the sensing matrix and the compressed measurement vector, respectively. To ensure a high-accuracy reconstruction at the BS, the sensing matrix Φ should satisfy the restricted isometry property.

Assuming that at most p elements in the vector h̃_{n,k} are nonzero, the original high-dimension CSI vector can be accurately recovered from the measurement m when N_t ≫ M and M > p. The reconstruction problem can be formulated as the following minimization task

\min_{\tilde{\mathbf{h}}_{n,k}} \|\tilde{\mathbf{h}}_{n,k}\|_0, \quad \text{s.t. } \mathbf{m} = \mathbf{\Phi} \tilde{\mathbf{h}}_{n,k}. \quad (7)

This optimization can be solved by iterative algorithms, such as the iterative shrinkage thresholding algorithm (ISTA) [25]. However, some challenges hinder the deployment of CS-based feedback: the CS-based CSI feedback is based on the sparsity assumption of CSI, which may not hold in practical systems, and the complexity of the iterative reconstruction algorithms is too high to meet the real-time requirements.
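For concreteness, the sketch below recovers a sparse vector (real-valued, for brevity) from compressed measurements with plain ISTA; the step size, threshold, and iteration count are illustrative assumptions rather than tuned values from [25].

```python
import numpy as np

# Minimal ISTA sketch for the recovery problem (7): m = Phi @ h, h sparse.
rng = np.random.default_rng(2)
N_t, M, p = 256, 64, 8

h_true = np.zeros(N_t)
h_true[rng.choice(N_t, p, replace=False)] = rng.standard_normal(p)  # p-sparse CSI
Phi = rng.standard_normal((M, N_t)) / np.sqrt(M)                    # sensing matrix
m = Phi @ h_true                                                    # compressed measurement

soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)     # soft threshold
mu = 1.0 / np.linalg.norm(Phi, 2) ** 2                              # step size (assumed)
h = np.zeros(N_t)
for _ in range(300):
    h = soft(h + mu * Phi.T @ (m - Phi @ h), mu * 0.01)             # gradient + shrinkage

print("NMSE:", np.sum((h - h_true) ** 2) / np.sum(h_true ** 2))
```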

III. COMMON DEEP NETWORK ARCHITECTURES

In this section, common deep network architectures, including fully connected (FC) layers, convolutional layers, long short-term memory networks (LSTMs), the variational autoencoder (VAE), generative adversarial networks (GAN), and the attention mechanism, which are widely adopted in DL-based CSI feedback works, are briefly introduced.

A. FC Layer

The FC layer is the vanilla NN layer, in which all input neurons are connected to all output neurons. This layer first multiplies the input vector by a weight matrix and then adds a bias vector, which can be formulated as

\mathbf{y}_{\mathrm{FC}}' = \mathbf{W}_{\mathrm{FC}} \mathbf{x}_{\mathrm{FC}} + \mathbf{b}_{\mathrm{FC}}, \quad (8)

where x_FC ∈ R^{N_in×1} and y'_FC ∈ R^{N_out×1} stand for the input and the output vectors, respectively, and W_FC and b_FC represent the weight matrix and the bias vector of the FC layer, respectively. This operation is linear and in real numbers. Following this operation, the activation function is implemented as

\mathbf{y}_{\mathrm{FC}} = \sigma(\mathbf{y}_{\mathrm{FC}}') = \sigma(\mathbf{W}_{\mathrm{FC}} \mathbf{x}_{\mathrm{FC}} + \mathbf{b}_{\mathrm{FC}}), \quad (9)

where σ(·) represents the non-linear activation function. In DL-based CSI feedback, the commonly used activation functions are the Tanh, Sigmoid, ReLU, and Leaky ReLU functions, as follows:

\mathrm{Tanh}(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}, \quad (10)

\mathrm{Sigmoid}(z) = \frac{1}{1 + e^{-z}}, \quad (11)

\mathrm{ReLU}(z) = \max(0, z), \quad (12)

\mathrm{LeakyReLU}(z) = \begin{cases} z & z \geq 0, \\ az & z < 0, \end{cases} \quad (13)

where a ∈ (0, 1) is a hyperparameter, namely, the negative slope [26].

The FC layer is widely used to extract features from the input vectors in computer vision. In addition to feature extraction, the FC layer can be used to change the dimension of the input vector. For example, in CsiNet [23], the last layer at the encoder and the first layer at the decoder are FC layers, which adjust the vector dimension by changing the neuron number of the FC output, that is, N_out.

NN complexity is evaluated by two metrics: the number of NN parameters and the number of floating point operations (FLOPs)². The parameter number of the FC layer can be calculated by

N_{\mathrm{FC}} = N_{\mathrm{in}} \times N_{\mathrm{out}} + N_{\mathrm{out}} \approx N_{\mathrm{in}} \times N_{\mathrm{out}}. \quad (14)

The FLOP number can be obtained by

F_{\mathrm{FC}} = 2 \times N_{\mathrm{in}} \times N_{\mathrm{out}}. \quad (15)

From (14) and (15), the complexity of FC layers increases with the dimensions of the input and the output vectors.

²In this paper, the FLOPs number of the activation function is neglected.
Fig. 4. Receptive field illustration of two stacked 3 × 3 convolutional layers. Stride s is set as 1. The upper "pixel" is determined by the 3 × 3 square area in the middle. Each intermediate "pixel" is determined by the input 3 × 3 square area, which is overlapped with one another. Therefore, the upper "pixel" is determined by the 5 × 5 square area of the input.

B. Convolutional Layer

The convolutional layer is composed of a linear convolution operation, which encompasses the multiplication of a set of weights with the input, similar to sliding a filter across an input vector. The convolutional layer can learn features with invariance to shifts in the input [15]. The NN parameter number is substantially reduced in comparison with that of the FC layers.

Assume that the convolutional layer is composed of C_out filter weights Q_c ∈ R^{a×b} for c = 1, ..., C_out. Q_c is multiplied by the input x ∈ R^{w×h} to generate a feature map Y_c, which can be achieved by the convolution operation as

[\mathbf{Y}_c]_{i,j} = \sum_{m=0}^{a-1} \sum_{n=0}^{b-1} [\mathbf{Q}_c]_{a-m,\, b-n} [\mathbf{x}]_{1+s(i-1)-m,\, 1+s(j-1)-n}, \quad (16)

where s ≥ 1 is an integer hyperparameter named the stride. If the input matrix x is padded with (w + a − 1)(h + b − 1) − wh
zeros, that is, [x]_{i,j} = 0 for all i ∉ [1, w] and j ∉ [1, h], the dimension of the feature map Y_c is (1 + \lfloor \frac{w+a-2}{s} \rfloor) \times (1 + \lfloor \frac{h+b-2}{s} \rfloor). Assuming that the depth of the layer's input is C_in, the parameter number of this layer is

N_{\mathrm{conv}} = (a \times b \times C_{\mathrm{in}} + 1) \times C_{\mathrm{out}}. \quad (17)

The number of FLOPs can be obtained by

F_{\mathrm{conv}} = 2 \times C_{\mathrm{in}} \times (a \times b) \times w \times h \times C_{\mathrm{out}}. \quad (18)

In convolutional NNs (CNNs), several and even hundreds of convolutional layers are stacked to extract the input features, such as in ResNet-152 [27], increasing the receptive field of the convolutional layers. The receptive field stands for the size of the region in the input which produces the features. Fig. 4 shows an example with two stacked convolutional layers, whose filter sizes are 3 × 3 and strides are 1. The upper "pixel" is determined by the 3 × 3 square area in the middle. Each "pixel" in the middle is determined by "pixels" in the lower 3 × 3 square area. The 3 × 3 filter slides across the whole matrix. Therefore, the upper "pixel" value is determined by the "pixels" in the 5 × 5 input area, and the receptive field of this NN block is 5 × 5.

The size of the receptive field is essential to the NN performance. As pointed out in [28], a logarithmic relationship exists between receptive field size and the accuracy of classification tasks. RepLKNet with 31 × 31 convolutional kernels [29] outperforms most existing NNs in computer vision tasks. In DL-based feedback, many existing works focus on reducing feedback errors by adjusting the NN receptive field.
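The bookkeeping of (17)-(18) and the receptive-field growth illustrated in Fig. 4 can be checked with a few lines of Python; the helper names below are ours, not taken from the references.

```python
# Minimal sketch of the convolutional-layer counts in (17)-(18) and of the
# receptive field of stacked convolutions, as illustrated in Fig. 4.
def conv_counts(a, b, c_in, c_out, w, h):
    n_params = (a * b * c_in + 1) * c_out          # (17): weights + one bias per filter
    n_flops = 2 * c_in * (a * b) * w * h * c_out   # (18)
    return n_params, n_flops

def receptive_field(kernels, strides):
    r, jump = 1, 1                                 # start from a single output "pixel"
    for k, s in zip(kernels, strides):
        r += (k - 1) * jump                        # each layer widens the field
        jump *= s
    return r

print(conv_counts(3, 3, 2, 16, 32, 32))            # e.g., a 3x3 layer on a 32x32 CSI "image"
print(receptive_field([3, 3], [1, 1]))             # two stacked 3x3 layers -> 5 (Fig. 4)
```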
Fig. 5. Illustration of an LSTM cell [30] that has feedback connections and allows information to persist.

C. LSTMs

The traditional NNs, such as FC and convolutional layers, make decisions only based on the current information without leveraging any previous information. However, in certain domains, such as consecutive frames in a video, the current information is highly correlated with the previous one. Recurrent neural networks (RNNs) allow previous outputs to be used as inputs and can address this issue.

LSTM [31], which allows information to persist, is a widely used RNN architecture, as shown in Fig. 5. The figure shows that an LSTM cell contains the input, forget, and output gates, namely, i_t, f_t, and o_t, respectively, where x_t, C_{t−1}, and h_{t−1} represent the input to the LSTM cell, the state of the past cell, and the output of the past LSTM cell, respectively. The goal of the input gate i_t is to decide whether the input (x_t and h_{t−1}) needs to be stored in the cell and to drop the unwanted information. The forget gate, f_t, determines whether to drop the previous state information C_{t−1} based on the input data. The state information of this cell, C_t, is updated based on the forget and the input gates. The output of this cell, h_t, is decided by the input data and the current state, C_t. More details about LSTMs can be found in [30]. LSTM has many different variants and topologies, such as the bidirectional LSTM (Bi-LSTM) and the gated recurrent unit (GRU); refer to [32] for a tutorial.
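As a minimal illustration of the gate logic described above, the following NumPy sketch runs one LSTM step with randomly initialized (untrained) weights; the dimensions are arbitrary.

```python
import numpy as np

# One LSTM step following the gate description above; weights are placeholders.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    zcat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ zcat + b["f"])     # forget gate: drop old state C_{t-1}?
    i_t = sigmoid(W["i"] @ zcat + b["i"])     # input gate: store the new input?
    o_t = sigmoid(W["o"] @ zcat + b["o"])     # output gate
    C_t = f_t * C_prev + i_t * np.tanh(W["c"] @ zcat + b["c"])  # state update
    h_t = o_t * np.tanh(C_t)                  # cell output
    return h_t, C_t

d_x, d_h = 4, 8
rng = np.random.default_rng(3)
W = {k: rng.standard_normal((d_h, d_h + d_x)) for k in "fioc"}
b = {k: np.zeros(d_h) for k in "fioc"}
h, C = lstm_step(rng.standard_normal(d_x), np.zeros(d_h), np.zeros(d_h), W, b)
```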
Fig. 6. Illustration of the VAE architecture that encodes the input as a distribution over the latent space instead of a single point like that in an autoencoder.

D. VAE

Fig. 1(a) shows that an autoencoder can realize a high dimension reduction with a high reconstruction accuracy. However, it cannot be used for generative tasks because of the lack of structure among the latent vectors. To solve this problem, a variant of the autoencoder, namely, the VAE, is introduced in [33]. The VAE encodes the input as a distribution over the latent space instead of a single point like that in an autoencoder. The output of the encoder is the mean vector μ and the standard deviation vector δ. A point from the latent space is sampled from a predefined distribution as

\mathbf{z} = \boldsymbol{\mu} + \boldsymbol{\delta} \odot \boldsymbol{\epsilon}, \quad (19)

where ε ∼ N(0, I) and ⊙ denotes the element-wise product. Then, the latent representation vector z is sent to the decoder to reconstruct the original data.

During the training of the autoencoder, the mean-squared error (MSE) between the input and the output of the autoencoder is widely used as the loss function. However, the training of the VAE has two main objectives: reconstructing the input and ensuring the latent vector z to be normally distributed. Therefore, its loss function is the sum of the reconstruction loss and the similarity loss. The reconstruction loss is the MSE loss used in the autoencoder, whereas the similarity loss is the Kullback-Leibler divergence between the standard Gaussian distribution and the latent space distribution.
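A minimal PyTorch sketch of this training recipe is given below: the encoder outputs a mean and a (log-)variance, a latent point is sampled as in (19), and the loss sums the MSE reconstruction term and the KL similarity term. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal VAE sketch: reparameterized sampling as in (19), loss = MSE + KL.
class VAE(nn.Module):
    def __init__(self, dim_in=2048, dim_z=32):
        super().__init__()
        self.enc = nn.Linear(dim_in, 256)
        self.mu, self.log_var = nn.Linear(256, dim_z), nn.Linear(256, dim_z)
        self.dec = nn.Sequential(nn.Linear(dim_z, 256), nn.ReLU(),
                                 nn.Linear(256, dim_in))

    def forward(self, x):
        e = torch.relu(self.enc(x))
        mu, log_var = self.mu(e), self.log_var(e)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # (19)
        return self.dec(z), mu, log_var

x = torch.randn(16, 2048)
x_hat, mu, log_var = VAE()(x)
rec = F.mse_loss(x_hat, x)                                     # reconstruction loss
kl = -0.5 * torch.mean(1 + log_var - mu**2 - log_var.exp())    # similarity loss
loss = rec + kl   # a weighted variant of this sum is used by PRVNet [47], see (20)
```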
Fig. 7. Illustration of the GAN architecture, including a generator, G, and a discriminator, D.

E. GAN

The GAN framework [34], as shown in Fig. 7, is a class of DL-based generative models. The GAN framework consists of two NN-based submodels, namely, a generator G and a discriminator D. The generator, G, generates new plausible examples in the problem domain, whereas the discriminator, D, classifies whether the input examples are real or generated by the generator. The two submodels compete with each other as in a game.

The training of GANs is based on a game-theoretic scenario, in which the generator, G, competes against an adversary, that is, the discriminator, D. Concretely, the two modules need to be trained jointly with two opposite goals at the same time:

• The training target of generator G is to fool discriminator D, that is, maximizing the final classification error.
• The training target of discriminator D is to detect the fake examples generated by the generator G, that is, minimizing the final classification error.

The opposite training goals force the two modules to try to beat each other, thereby simultaneously improving their performances. The final equilibrium state of the GAN training corresponds to the situation where the generator, G, can generate data from the targeted distribution and the discriminator, D, predicts "real" or "fake" with a probability of 50% for all received examples.
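The two opposite objectives above can be written as one alternating training step. In the sketch below, the toy MLP shapes are illustrative assumptions, and the binary cross-entropy losses implement the classification errors just described.

```python
import torch
import torch.nn as nn

# Minimal sketch of one alternating GAN training step (assumed toy sizes).
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

real = torch.randn(8, 32)            # stand-in for real examples
fake = G(torch.randn(8, 16))         # generated examples

# Discriminator D: minimize classification error (real -> 1, fake -> 0).
loss_d = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator G: fool D, i.e., push D's output on fake examples toward "real".
loss_g = bce(D(fake), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```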
F. Attention Mechanism

RNNs, LSTM, and GRU are typically established for processing sequential data, for example, language modeling and machine translation. Such networks consider computations along the symbolic positions of the input and output sequences. The attention mechanism allows dependencies to be modeled regardless of their distance in the input or output sequence, which has been shown to achieve remarkable performance improvements [35]. The attention mechanism was first applied to the natural language processing domain to address the long sequence problem in the machine translation task [36] and has been extended to other applications, such as computer vision [37]. From a cognitive science perspective, humans only notice a portion of all visible information due to bottlenecks in information processing [38]. Bottlenecks inevitably exist in NNs' processing. Therefore, researchers propose a visual selective attention model, that is, the attention mechanism, to simulate the visual perception of humans. The key problem of the attention mechanism is how to find the key features to capture long-range information interactions.

In computer vision, the key features are identified by a mask, which quantifies the importance of each pixel or channel. The mask is not handcrafted but learned by another NN layer with new parameters. Fig. 8 shows an NN framework (called CBAM) [39], which consists of channel and spatial attention modules that are widely adopted in computer vision. Channel attention focuses on determining which feature map is meaningful when the input feature has tens or hundreds of channels. The spatial dimension of the input feature is squeezed by maximum- or average-pooling to generate a 1D vector, which is then forwarded to an NN module to produce a channel attention map. Then, all input feature maps are multiplied by the corresponding weight in the generated attention map. Spatial attention focuses on determining where the features are informative.

Fig. 8. Illustration of an attention-based framework [39], including channel and spatial attention modules.

IV. DL-BASED CSI FEEDBACK

The existing research directions in DL-based CSI feedback can be divided into six categories. The first three categories, which include novel NN architecture design, multi-domain correlation utilization, and bitstream generation, focus on improving the performance of DL-based CSI feedback. The remaining three categories, which include joint design with other modules, practical consideration, and other related works, focus on promoting the practical deployment of DL-based CSI feedback.

A. Novel NN Architecture Design

Conventional ML requires careful domain expertise to design a feature extractor. The main advantage of DL is that features can be learned from substantial training samples using an end-to-end approach, that is, manually designing feature extractors is not needed. However, the NN architecture heavily affects the performance of DL-based algorithms and should be carefully designed. Table I shows the normalized MSE (NMSE) performance of the feedback NNs using the dataset published by [23]. The compression ratio (CR) represents the ratio of the codeword dimension to the original CSI dimension. If CR is 1/16 for outdoor channels, the performance gap between CsiNet and the state-of-the-art NN is over 10 dB, which shows the great effect of the NN architecture on feedback accuracy.

The existing NN design works of CSI feedback can be divided into seven categories, as shown in Table II. Their key ideas are introduced below, and a guideline for the future NN design of CSI feedback is provided.
TABLE I
NMSE (dB) PERFORMANCE OF THE NNS USING THE DATASET PUBLISHED BY [23].

CR:                                      1/4              1/8              1/16             1/32             1/64
Scenario:                          Indoor Outdoor  Indoor Outdoor  Indoor Outdoor  Indoor Outdoor  Indoor Outdoor
CsiNet (Dec. 2017) [23] −17.36 −8.75 \ \ −8.65 −4.51 −6.24 −2.81 −5.84 −1.93
ConvCsiNet (2018) [40] −17.37 −8.98 \ \ −13.79 −6.00 −10.10 −5.21 −7.72 −4.48
CsiNet+ (June 2019) [41] −27.37 −12.40 −18.29 −8.72 −14.14 −5.73 −10.43 −3.40 −6.72 −2.45
Attention-CsiNet (Oct. 2019) [42] −20.29 −10.43 \ \ −10.16 −6.11 −8.58 −4.57 −6.32 −3.27
CRNet (Oct. 2019) [43] −26.99 −12.71 −16.01 −8.04 −11.35 −5.44 −8.93 −3.51 −6.49 −2.22
LSTM-Attention CsiNet (Jan. 2020) [44] −22.00 −10.20 \ \ −11.00 −5.80 −8.80 −3.70 −7.20 −2.40
DS−NLCsiNet (Aug. 2020) [45] −24.99 −12.09 −17.00 −7.96 −12.93 −4.98 −8.64 −3.35 \ \
DCGAN (Aug. 2020) [46] −26.20 −15.88 \ \ −13.50 −8.07 −9.00 −5.83 −6.45 −4.01
PRVNet (Nov. 2020) [47] −27.70 −13.90 \ \ −13.00 −6.10 −9.52 −4.23 −6.90 −2.53
CF-FCFNN (Jan. 2021) [48] −20.07 −11.61 −15.14 −10.08 −12.35 −9.12 −8.86 −8.42 −6.60 −7.25
CLNet (Feb. 2021) [49] −29.16 −12.88 −15.60 −8.29 −11.15 −5.56 −8.95 −3.49 −6.34 −2.19
ENet (May 2021) [50] −26.00 \ \ \ −14.50 \ −11.20 \ −7.50 \
DCRNet (June 2021) [51] −30.61 −13.72 −19.92 −10.17 −14.02 −6.35 −9.88 −3.95 \ \
CsiNet+DNN (June 2021) [52] \ \ \ −17.88 \ −14.70 \ −14.42 \ −11.34
MRFNet (July 2021) [53] −25.76 −15.95 \ \ −14.72 −9.49 −10.63 −7.42 −6.90 −6.52
ACCsiNet (Sep. 2021) [54] \ \ \ \ −14.59 −11.76 −11.00 −9.14 −7.46 −7.11
DFECsiNet (Dec. 2021) [55] −27.50 −12.25 −16.80 −7.90 −12.70 −5.20 −8.85 −3.35 −5.95 −2.10
TransNet (Feb. 2022) [56] −32.38 −14.86 −22.91 −9.99 −15.00 −7.82 −10.49 −4.13 −6.08 −2.62
CsiFormer (Feb. 2022) [57] \ \ \ \ \ \ −9.32 −3.51 −6.85 −2.25
CVLNet (March 2022) [58] \ \ \ \ −13.97 −6.67 −9.72 −4.56 \ \
“\” means the performance is not reported. The methods are ordered by their publication time.

1) Increasing Receptive Field: The CsiNet architecture [23], shown in Fig. 9, is the first work that applies DL to CSI feedback. In this architecture, a convolutional layer is first employed at the encoder to extract the features of the downlink CSI "images," and the filter size is set as 3 × 3, which is the smallest among the commonly used ones. The receptive field size is 3 × 3, too. At the decoder, the RefineNet, which consists of three serial convolutional layers with 3 × 3 filters and adopts residual learning [71], is employed to refine the initially reconstructed CSI. The receptive field size of each RefineNet is 7 × 7, which is much smaller than the CSI "image" size, that is, 32 × 32, in [23].
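To make the encoder-decoder split concrete, the following PyTorch skeleton follows the spirit of CsiNet (3 × 3 convolutions, FC compression, and residual RefineNet-style blocks); it is a simplified sketch, not the exact configuration published in [23].

```python
import torch
import torch.nn as nn

# Skeleton in the spirit of CsiNet [23] (simplified; not the published config).
class RefineBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 8, 3, padding=1), nn.LeakyReLU(0.3),
            nn.Conv2d(8, 16, 3, padding=1), nn.LeakyReLU(0.3),
            nn.Conv2d(16, 2, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)            # residual learning [71]

class Autoencoder(nn.Module):
    def __init__(self, m=512):             # m = codeword length (CR = m / 2048)
        super().__init__()
        self.enc_conv = nn.Conv2d(2, 2, 3, padding=1)
        self.enc_fc = nn.Linear(2 * 32 * 32, m)
        self.dec_fc = nn.Linear(m, 2 * 32 * 32)
        self.refine = nn.Sequential(RefineBlock(), RefineBlock())

    def forward(self, h):                  # h: (batch, 2, 32, 32) real/imag "image"
        s = self.enc_fc(self.enc_conv(h).flatten(1))             # compress
        return self.refine(self.dec_fc(s).view(-1, 2, 32, 32))   # reconstruct

h_hat = Autoencoder()(torch.randn(4, 2, 32, 32))
```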
As mentioned in Section III-B, the performance of a CNN heavily depends on the size of the receptive field. Inspired by this observation, CsiNet+ [41] improves feedback performance by enlarging the receptive field size. From [41], the 3 × 3 receptive field, which is widely used to extract edge information, is not suitable for the CSI feedback task. A convolutional filter with a small receptive field cannot make full use of the CSI sparsity in the angular-delay domain. Therefore, a convolutional layer with a much larger receptive field is adopted in the CsiNet+ architecture. Two convolutional layers with 7 × 7 kernel sizes replace the original convolutional layer at the encoder, and the receptive field sizes of the first two convolutional layers in RefineNet are set as 7 × 7 and 5 × 5. This modification improves the performance of the original CsiNet. For example, NMSE is reduced from −17.36 dB to −20.80 dB when CR is 1/4. Based on [52], CsiNet+ does not work well when CR is low, such as 1/32 for outdoor channels. Therefore, two FC layers are embedded after the second convolutional layer in the RefineNet block, and more RefineNet blocks are employed. Moreover, the Swish function is used as the activation function. Inspired by [41], 5 × 5 and 8 × 8 convolution kernels are adopted at the encoder and decoder in [59].

In [60], seven convolutional layers with 3 × 3 filters are adopted at the decoder to expand the receptive field to 15, which is half of the CSI size. Compared with directly adopting a 15 × 15 convolutional operation, this can greatly reduce the complexity, including the numbers of NN parameters and FLOPs determined by (17) and (18). The receptive field is enhanced in [61] by stacking convolutional layers with 3 × 3 filters at the encoder to improve the quality of the features extracted from the input CSI.

2) Multiple Resolutions: The above works, such as CsiNet+ [41], focus on enlarging the NN receptive field. However, the CSI sparsity varies in different scenarios and even within different regions of a single CSI sample. As pointed out by [43], a larger receptive field (or convolutional kernel) is preferred for sparser regions, and the convolution operation with a small kernel can extract finer features much better. Therefore, the CSI should be processed by convolutions with different kernel sizes, namely, multiple resolutions.

The CRNet architecture proposed by [43] first introduces the multi-resolution architecture to CSI feedback. The encoder and decoder in CRNet adopt a multi-resolution architecture. Fig. 10 shows the encoder of CRNet and the main block at the decoder. The input CSI "image" passes through two parallel NN paths. The left one consists of three stacked convolutional layers, namely, with 3 × 3, 1 × 9, and 9 × 1 convolution kernels. The resolution (or receptive field size) of this path is 11 × 11. The right path only consists of a convolutional layer with a 3 × 3 convolution kernel. The resolution of this path is much smaller than that of the left one. Then, the outputs of the two paths with different resolutions are concatenated and merged by a 1 × 1 convolution operation. Finally, an FC layer is adopted to reduce the dimension of the CSI. The decoder of CRNet is similar to that of CsiNet and only replaces the RefineNet block with the CRBlock, as shown in Fig. 10. The CRBlock is based on the encoder's NN architecture but uses larger convolution kernels and residual learning. The multiple resolution strategy greatly improves feedback performance.
TABLE II
NOVEL NN ARCHITECTURE DESIGN

Increasing Receptive Field:
- CsiNet+ [41]: replacing the original convolutional layer (3 × 3) at the encoder by two convolutional layers (7 × 7); setting the first two convolutional layers in RefineNet as 7 × 7 and 5 × 5.
- CsiNet+DNN [52]: embedding two FC layers after the second convolutional layer in the RefineNet block; employing more RefineNet blocks.
- MRNet [59]: setting the convolutional kernel sizes of the encoder and the decoder as 5 × 5 and 8 × 8.
- CS-ReNet [60]: stacking seven convolutional layers with 3 × 3 filters at the decoder.
- BCsiNet [61]: stacking three 3 × 3 convolutional layers at the encoder to improve CSI feature quality.

Multiple Resolutions:
- CRNet [43]: the input CSI "image" passes through two parallel NN paths (11 × 11 and 3 × 3) at the encoder; RefineNet is replaced with the CRBlock that uses larger convolution kernels and residual learning.
- DFECsiNet [55]: two feature extraction paths (7 × 7 and 3 × 3) are employed in parallel.
- MSMCNet [62]: the MSMC block has three parallel convolution paths (5 × 5, 7 × 7, and 9 × 9).
- MRFNet [53]: the MRFBlock has three parallel paths, with 5 × 5, 7 × 7, and 9 × 9 convolution kernels; the reason for the success of the multiple resolution strategy is revealed via feature visualization.

Fully Convolutional Layer:
- ConvCsiNet [40]: the dimension reduction of CSI at the encoder is achieved by average pooling operations; the dimension increase is realized by bilinear interpolation at the decoder.
- DeepCMC [63]: the convolution kernels (9 × 9 and 5 × 5) adopted are much larger than those in ConvCsiNet.
- ACCsiNet [54]: the asymmetric convolution block is adopted.
- FullyConv [64]: the dimension reduction is realized by a convolution operation by changing the stride.

Attention Mechanism:
- Attention-CsiNet [42]: channel attention modules are introduced to the decoder of the CSI feedback.
- SALDR [65]: the encoder adopts a novel spatial attention mechanism, that is, patch-wise self-attention.
- CsiTransformer [66]: a single-layer transformer module replaces the traditional convolution operation.
- TransNet [56]: a more powerful two-layer transformer architecture is adopted.

GAN and VAE:
- DCGAN [46]: a GAN architecture is introduced to the training process.
- PRVNet [47]: the VAE is introduced to CSI feedback.

Well-designed Preprocessing:
- CsiNet [23]: the CSI matrix is first transformed to the angular-delay domain; some rows, whose values are all close to zero, are removed; the real and imaginary parts of the truncated CSI are concatenated and then scaled in [0, 1].
- [67]: the channel element magnitude is set as A if it is over a predefined threshold A.
- CLNet [49]: the real and imaginary parts are embedded in physical meaning by a 1 × 1 convolution.
- ENet [50]: the real and imaginary parts are fed back to the BS separately with the same autoencoder.
- P-SRNet [68]: the rows full of near-zero elements are omitted.

Others:
- TiLISTA-Joint [69]: the CSI matrix is compressed by an FC layer; the ISTA algorithm is unfolded, and hyperparameters are learned by NNs.
- FISTA-Net [70]: the fast ISTA algorithm is unfolded; the original CSI is represented by a basis and a residual part of the column space channel matrix.
- CF-FCFNN [48]: the feedback NN architecture only consists of FC layers.

Fig. 9. Illustration of the CsiNet architecture, in which the encoder compresses downlink CSI and the decoder reconstructs CSI from the feedback information. The encoder and decoder consist of convolutional and FC layers.

For example, when CR is 1/4 for the outdoor scenario, NMSE can be reduced from −8.75 dB to −12.71 dB with a minimal increase in NN complexity.
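A simplified sketch of such a two-resolution block is given below: a 3 × 3 + 1 × 9 + 9 × 1 path (receptive field 11 × 11) and a 3 × 3 path are concatenated and merged by a 1 × 1 convolution. Channel widths and other details of the real CRNet [43] are omitted.

```python
import torch
import torch.nn as nn

# Simplified two-resolution block in the spirit of CRNet [43].
class TwoResolutionBlock(nn.Module):
    def __init__(self, ch=2):
        super().__init__()
        self.path_wide = nn.Sequential(                # receptive field 11x11
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.Conv2d(ch, ch, (1, 9), padding=(0, 4)),
            nn.Conv2d(ch, ch, (9, 1), padding=(4, 0)))
        self.path_narrow = nn.Conv2d(ch, ch, 3, padding=1)  # receptive field 3x3
        self.merge = nn.Conv2d(2 * ch, ch, 1)               # 1x1 merge

    def forward(self, x):
        y = torch.cat([self.path_wide(x), self.path_narrow(x)], dim=1)
        return self.merge(y)

y = TwoResolutionBlock()(torch.randn(4, 2, 32, 32))    # spatial shape preserved
```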
An improved NN architecture, called DFECsiNet, is proposed in [55] for CSI feedback. The key component of DFECsiNet is the DFEBlock, in which two feature extraction paths are employed to extract the different resolution diversity of CSI in parallel. The two paths have different resolutions, 7 × 7 and 3 × 3. Similar to [43], residual learning is adopted in the DFEBlock at the decoder. The NN architecture in [62], called MSMCNet, adopts a novel multi-scale and multi-channel (MSMC) block based on the CRBlock [43]. The MSMC block has three parallel convolution paths with different receptive field sizes, 5 × 5, 7 × 7, and 9 × 9.

The NN architecture in [53], called MRFNet, reveals the reason for the success of the multiple resolution strategy via feature visualization. The encoder in MRFNet is the same as that in CsiNet, and the main modifications are employed at the decoder at the BS. The MRFBlock has three parallel paths, with 5 × 5, 7 × 7, and 9 × 9 convolution kernels. The feature number of each path is 64, which is much smaller than in other works. Other architectures of the MRFBlock are similar to those of the CRBlock. From feature visualization, different CSI features can be learned by the convolution operations with different kernel sizes. The convolution operation with a small kernel, such as 5 × 5, focuses on extracting the background or pattern information. The one with a large kernel, such as 9 × 9, is good at extracting values located in distinct regions.
Fig. 10. Encoder and CRBlock of CRNet proposed in [43], with two parallel paths, each with a different resolution.
3) Fully Convolutional Layer: The existing works extract CSI's features by convolutional layers and compress and reconstruct the CSI by FC layers. The dimension reduction and increase of CSI are achieved by adjusting the neuron numbers of the last FC layer of the encoder and the first FC layer of the decoder, respectively.

Different from the existing works, ConvCsiNet proposed in [40] is based on convolutional layers without FC layers. Fig. 11 shows the encoder architecture of ConvCsiNet. The key component of the encoder is the encoded convolution network (ECN) block, each of which consists of an average pooling layer and a convolutional layer. The dimension reduction of CSI is achieved by average pooling operations, each of which reduces the CSI size by half. Therefore, four serial ECN blocks reduce the CSI dimension from 32 × 32 to 2 × 2. The number of feature maps after each convolution operation is quite large, and the last one is M/4, where M is the codeword length. The decoder of ConvCsiNet is also based on convolutional layers, and the dimension increase is realized by bilinear interpolation. ConvCsiNet [40] can greatly improve CSI feedback accuracy when CR is low, such as 1/32 and 1/16. Moreover, the fully convolutional architecture is flexible to the dimension of the input CSI [63].

Fig. 11. Encoder architecture in ConvCsiNet [40], in which dimension reduction is achieved by average pooling rather than by reducing the neuron number of FC layers.

Similar to ConvCsiNet, DeepCMC [63] and ACCsiNet [54] compress and reconstruct the CSI by down-sampling and up-sampling layers, respectively. Moreover, the convolution kernels (9 × 9 and 5 × 5) adopted in [63] are much larger than those adopted in [40]. The asymmetric convolution block [72], as shown in Fig. 12, is introduced to enhance the CSI feature extraction of the convolution operation in [54]. The asymmetric convolution block consists of three parallel layers with 3 × 3, 1 × 3, and 3 × 1 convolution kernels, and the outputs are then summed up. This block enriches the feature space compared with the standard 3 × 3 convolution operation.

Fig. 12. Illustration of the asymmetric convolution block in [54].

FullyConv in [64] is also based on fully convolutional layers. Different from [40], [54], [63], the dimension reduction in [64] is realized by a convolution operation. In FullyConv, a convolution layer with the stride set as 4 × 4 × 4 can achieve a CR of 1/64. A transposed convolution (or deconvolution) layer with the same stride can restore the original size of the CSI at the decoder.
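The fully convolutional dimension reduction can be sketched as follows: each ECN-style stage halves the 32 × 32 CSI by average pooling, so four stages yield a 2 × 2 map with M/4 channels, that is, an M-dimensional codeword. The channel widths other than the last are illustrative assumptions, not those of [40].

```python
import torch
import torch.nn as nn

# ECN-style fully convolutional encoder sketch in the spirit of ConvCsiNet [40].
def ecn(c_in, c_out):
    return nn.Sequential(nn.AvgPool2d(2),          # halves the spatial size
                         nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

M = 512                                            # codeword length
encoder = nn.Sequential(ecn(2, 64), ecn(64, 128), ecn(128, 256), ecn(256, M // 4))

s = encoder(torch.randn(4, 2, 32, 32))             # -> (4, M/4, 2, 2)
codeword = s.flatten(1)                            # M-dimensional codeword
print(codeword.shape)                              # torch.Size([4, 512])
```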
4) Attention Mechanism: The attention mechanism is first introduced to DL-based CSI feedback by Attention-CsiNet proposed in [42]. As mentioned in Section III-F, the importance of different feature maps is different. Therefore, NN performance can be improved if more attention is paid to the feature maps with more information. Based on this observation, channel attention modules are introduced to the decoder of the CSI feedback in [42], [73], and [74]. Fig. 13 shows the channel attention module and the Attention RefineNet block proposed in [42]. The goal of the attention module is to generate a vector describing the importance of each feature map. First, a global average pooling is adopted to generate an L × 1 vector. Then, two FC layers are employed to reconstruct the importance vector. The activation function of the last layer is the Sigmoid to guarantee that all vector values are in the range (0, 1). Finally, the generated vector is multiplied by the input feature maps. The NN can work better because more useful information can be highlighted and extracted with an attention vector. The Attention RefineNet block is similar to the original RefineNet used in CsiNet [23] but introduces an extra attention module.
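The attention module just described (global average pooling, two FC layers, and a Sigmoid that reweights the feature maps) can be sketched compactly in PyTorch; the reduction to L/2 hidden units follows the L → L/2 → L shape in Fig. 13, while everything else is an assumption.

```python
import torch
import torch.nn as nn

# Channel attention sketch in the spirit of [42]: pool -> FC -> FC -> Sigmoid.
class ChannelAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // 2), nn.ReLU(),
                                nn.Linear(ch // 2, ch), nn.Sigmoid())

    def forward(self, x):                   # x: (batch, L, H, W)
        w = self.fc(x.mean(dim=(2, 3)))     # global average pool -> L importance weights
        return x * w[:, :, None, None]      # reweight each feature map

y = ChannelAttention(16)(torch.randn(4, 16, 32, 32))
```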
The encoder in [65] adopts another kind of attention mechanism, that is, spatial attention. The original spatial attention generates attention weights for each "pixel" in the feature map. However, the correlation among the adjacent "pixels" is ignored. To solve this problem, a novel spatial attention mechanism, that is, patch-wise self-attention [75], is adopted in [65]. The key idea of this attention mechanism is to limit the scope of the original spatial attention to a local patch rather
than the entire feature map, thereby not only further improving feedback performance but also reducing the complexity of the attention module. Moreover, the decoder in [65] uses a novel RefineNet, named the dense RefineNet, in which each NN layer passes its feature maps through all subsequent NN layers [76].

Fig. 13. Attention module and Attention RefineNet architecture proposed in [42].

The attention mechanism in [42], [65] is based on the CNN architecture. However, the transformer architecture [35] completely abandons the traditional CNN and RNN architectures and only relies on the attention module to eschew recurrence. The transformer architecture is applied to CSI feedback by [66], in which the transformer module replaces the traditional convolution operation at the encoder and the RefineNet at the decoder. However, the performance is only slightly improved. The authors of [56] stated that CsiTransformer in [66] cannot fully utilize the transformer's power because only a single-layer transformer architecture is used. Therefore, a more powerful two-layer transformer architecture, namely, TransNet, is proposed. By fully exploiting the power of the transformer, feedback performance is greatly improved. A convolutional transformer architecture is adopted by CsiFormer [57] to maintain the long-range dependency of CSI.

5) GAN and VAE: The training and inference of the above works are based on the autoencoder architecture proposed by CsiNet [23]. Unlike these works, novel NN frameworks, namely, GAN and VAE, are introduced in [46] and [47] to DL-based CSI feedback.

Fig. 14 shows the GAN architecture (called DCGAN) in [46]. An extra discriminator D is added after the autoencoder. The method can help generate more plausible CSI compared with other DL techniques. During inference, the discriminator is not needed any more, and only the encoder and decoder are deployed to practical systems. This method can improve NN performance without changes to the original autoencoder-based inference architecture. Therefore, it can be easily introduced to the above works.

Fig. 14. Illustration of the GAN architecture (called DCGAN) proposed in [46]. The autoencoder and discriminator D are jointly trained. However, the discriminator is not needed during inference.

The novel NN architecture based on the VAE, called PRVNet, is introduced in [47]. As in Section III-D, the loss function of the traditional VAE is defined as the sum of the reconstruction loss and the similarity loss. However, this loss function is not suitable for the DL-based CSI feedback problem, where the reconstruction loss should occupy a major position in the whole loss function. A parameter β ∈ (0, 1) is introduced in [47] to emphasize the importance of the reconstruction loss as

l = l_{\mathrm{rec}} + \beta \cdot l_{\mathrm{dis}}, \quad (20)

where l_rec and l_dis represent the reconstruction loss of the CSI and the similarity loss of the distribution, respectively. This method introduces an extra performance improvement compared with the traditional VAE-based CSI feedback.

6) Well-designed Preprocessing: Preprocessing is essential in data science, and the preprocessing of CSI samples affects the performance of DL-based CSI feedback. Fig. 15 shows the preprocessing in [23], which has been adopted by most existing works. The estimated downlink CSI matrix is first transformed to the angular-delay domain by a 2D-DFT operation. The sparsity characteristic of CSI holds in this domain. Given that the time delays between multi-path arrivals lie within a rather limited period, only the first N'_c rows contain values that are not close to zero. Therefore, only the first N'_c rows are retained, and the remaining ones are removed. DL libraries, such as TensorFlow [77] and PyTorch [78], can only build real-valued NNs. Thus, the real and imaginary parts of the complex truncated CSI are concatenated to formulate a real-valued 3D matrix. Last, each 3D matrix H''_i is scaled in [0, 1] by

\mathbf{H}_i^{\mathrm{norm}} = \frac{1}{2} \left( \frac{\mathbf{H}_i''}{\max \, \mathrm{abs}([\mathbf{H}_0'', \ldots, \mathbf{H}_K''])} + 1 \right). \quad (21)

Fig. 15. CSI preprocessing workflow in [23]. Preprocessing consists of three steps: 2D-DFT, truncation, and splitting.

The activation function of the last NN layer in CsiNet [23] is the Sigmoid defined in (11). Moreover, according to [23], whether to transform the CSI of the spatial domain into the angular domain has no effect on CsiNet's performance.
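The whole workflow of Fig. 15, together with the scaling in (21), fits in a few NumPy lines; the array sizes follow [23], while the random raw channels below stand in for estimated CSI.

```python
import numpy as np

# Sketch of the Fig. 15 preprocessing: 2D-DFT, delay truncation, real/imag
# splitting, and scaling to [0, 1] as in (21). Raw channels are stand-ins.
N_c, N_t, N_c_trunc, K = 256, 32, 32, 100

H = np.random.randn(K, N_c, N_t) + 1j * np.random.randn(K, N_c, N_t)

H_ad = np.fft.fft2(H, axes=(1, 2))           # spatial-frequency -> angular-delay
H_trunc = H_ad[:, :N_c_trunc, :]             # keep the first N'_c delay rows

# Real-valued samples of shape (K, N'_c, N_t, 2), then scale to [0, 1] per (21).
X = np.stack([H_trunc.real, H_trunc.imag], axis=-1)
X_norm = 0.5 * (X / np.max(np.abs(X)) + 1.0)
print(X_norm.shape, X_norm.min() >= 0, X_norm.max() <= 1)
```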
Fig. 16. Entire deep unfolding structure of TiLISTA-Joint in [69].
Data normalization heavily affects the performance of DL, including accuracy and training complexity [79]. Therefore, a simple yet efficient data normalization method with clipping is proposed in [67] for DL-based CSI feedback. Some channel elements may have very high power, which affects the statistical operation of DL and can be regarded as outliers [67]. Hence, if a channel element has a magnitude over a threshold A, its magnitude is set as A, and its phase remains the same. Then, the clipped CSI matrix is scaled to [0, 1]. This kind of preprocessing can greatly improve NN performance and accelerate the convergence of NN training.
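A sketch of this clipping-based normalization is given below; the threshold A is an assumed placeholder, and the final scaling maps the clipped real and imaginary parts to [0, 1].

```python
import numpy as np

# Sketch of the clipping-based normalization of [67]: clip magnitudes above A,
# keep phases, then scale the real/imag parts to [0, 1].
def clip_and_scale(H, A):
    mag, phase = np.abs(H), np.angle(H)
    H_clip = np.minimum(mag, A) * np.exp(1j * phase)   # clip outliers, keep phase
    X = np.stack([H_clip.real, H_clip.imag], axis=-1)
    return 0.5 * (X / A + 1.0)                         # real/imag now in [0, 1]

H = np.random.randn(32, 32) + 1j * np.random.randn(32, 32)
X = clip_and_scale(H, A=2.0)
```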
CLNet proposed in [49] considers that the CSI is in the form of complex values with physical meaning. The previous works overlook this problem and directly concatenate the real and imaginary parts of the CSI together, inevitably resulting in performance loss. The first layer in the previous works is a convolutional layer with 3 × 3 or larger kernels. Here, a 3 × 3 convolution operation is used as an example. X ∈ R^{m×m×2} is a 3D tensor that is extended from its 2D version by the concatenation operation. I = [i_1, ..., i_C] ∈ R^{m×m×C} represents the feature maps after the convolution operation, and C is the channel number of the feature maps. a_n + b_n i (n = 1, 2, ..., 9) represents a 3 × 3 patch in X, and w_n denotes the weight of the convolution operation. The 3 × 3 convolution operation on this patch can be essentially formulated as the sum of two multiplication processes as³

\mathbf{i}_1(1,1) = [a_1, \ldots, a_9] \cdot [w_1, \ldots, w_9]^T + [b_1, \ldots, b_9] \cdot [w_1, \ldots, w_9]^T \quad (22)
= [a_1 + b_1, \ldots, a_9 + b_9] \cdot [w_1, \ldots, w_9]^T. \quad (23)

³The bias terms of the convolution operation are omitted in this part for simplicity.

From (23), the real and imaginary parts of the neighboring CSI elements are entangled, and nine complex CSI values are interpolated as a synthesized one, resulting in the loss of the original physical meaning. CLNet overcomes this problem by replacing the original 3 × 3 convolution operation with a 1 × 1 convolution operation, in which the real and imaginary parts can be embedded in physical meaning as

\mathbf{i}_1(1,1) = [a_1] \cdot [w_1] + [b_1] \cdot [w_1], \quad (24)

where the ratio between the real and imaginary parts (a_1 and b_1) is preserved; thus, phase information is preserved. An ablation study shows that a 1 dB reconstruction gain can be achieved when CR is 1/16 for the indoor scenario. Inspired by CLNet, CVLNet proposed in [58] introduces a complex-valued NN to realize the CSI compression and reconstruction, and all operations in CVLNet follow the law of complex-number computation.

ENet proposed in [50] feeds back the real and imaginary parts separately with the same autoencoder. From [50], the real and imaginary parts of the complex-valued CSI share the same distribution. Based on this observation, the autoencoder trained with the real parts of the CSI samples can be used to compress and reconstruct the imaginary parts of the CSI samples, which can greatly reduce the NN complexity.

P-SRNet [68] introduces a principal component mark (PCM) module before the encoder. Only partial rows in the truncated CSI, such as the CSI "images" in Figs. 14 and 15, have non-zero elements. Therefore, the rows full of near-zero elements are omitted. The square of the Euclidean norm of each CSI row is first calculated. The rows with Euclidean norms below a threshold are selected and do not need to be fed back. As shown in (14)-(18), NN complexity, including the numbers of weights and FLOPs, increases with the input dimension.
TABLE III
MULTI-DOMAIN CORRELATION UTILIZATION

Time Correlation:
- CsiNet-LSTM [85]: the first CSI "frame" is compressed by a high-CR encoder with the highest quality; the other T − 1 CSI "frames" are compressed by low-CR encoders; LSTM refines the reconstructed CSI with the information extracted from the former CSI.
- RecCsiNet [86]: the LSTM at the encoder compresses the CSI based on the current and previous CSI matrices.
- [87]: feedback overhead is reduced by dynamically adjusting the feedback interval of the time-varying channel; feedback is not needed if prediction errors are tolerable.

Partial Bidirectional Channel Correlation:
- DualNet-MAG [88]: the CSI magnitude and phase are fed back separately; the uplink CSI magnitude is introduced into the downlink CSI magnitude reconstruction at the decoder.
- UA-CsiConvLSTM [89]: the initially recovered downlink CSI is concatenated with the entire uplink CSI for further reconstruction.
- HyperRNN [90]: the partial bidirectional correlation is utilized by adopting hypernetworks.

Frequency Correlation:
- Attention-CsiNet [42]: a Bi-LSTM module is adopted to extract the subcarrier correlation to compress CSI.
- SampleDL [91]: the original channel is uniformly sampled in the frequency domain before feedback.

Correlation Among Nearby Users' CSI:
- CoCsiNet [92]: two nearby users cooperatively feed their CSI magnitudes back to the BS; the information contained in the CSI magnitude is divided into individual and shared information; the final CSI is reconstructed from the recovered individual and shared information.
- Distributed DeepCMC [93]: the CSI magnitude and phase are fed back together; a joint feature decoder reconstructs the CSI of the two users.

dimension. The drop of partial rows reduces the dimension gradient descent algorithm. Sparse transformation operations,
of the input to NNs. Hence, this strategy can greatly reduce 𝑓𝑠 (·) and 𝑓 𝑑( ·) , are realized by two FC layers without bias
NN complexity. Additionally, a binary indicating vector needs terms. The TiLISTA-Joint architecture outperforms the tradi-
to be sent to the BS to guarantee the decoder to reconstruct a tional iterative algorithms and CsiNet by a large margin. The
CSI with the same dimension as the original CSI, that is, to fast ISTA algorithm is unfolded in [70] with a two-stage low-
indicate which rows have been omitted. rank feedback scheme, in which the original CSI is represented
7) Others: by a basis and a residual part of the column space channel
a) Deep unfolding architecture: Most of the existing matrix.
works are based on an autoencoder architecture, which is b) Fully FC architecture: As mentioned in Section
data-driven and lacks enough theory explanations [80], [81]. IV-A3, some works try to realize CSI feedback fully by
However, the data-driven method performs better than the convolutional layers. By contrast, a feedback NN architecture
model-driven method, which introduces expert/domain knowl- (called CF-FCFNN) in [48] only consists of FC layers. The
edge into the model constraints. Deep unfolding [82] unfolds CF-FCFNN architecture based on FC layers can extract spatial
the inference iterations as NN layers and unties the model features more sufficiently compared with CsiNet based on con-
parameters via end-to-end learning. It is a combination of data- volutional layers and greatly outperforms CsiNet, especially
driven and model-driven methods, and has been regarded as a when the feedback difficulty is high. For example, when CR
potential direction in wireless communications. is 1/64 for the outdoor scenario, the NMSEs of CsiNet and
Deep unfolding for CSI feedback is first introduced in [69]. CF-FCFNN are −2.02 and −7.25 dB, respectively. However,
Fig. 16 shows the TiLISTA-Joint architecture proposed in [69]. the NN parameter number of CF-FCFNN is rather large.
In the tied ISTA algorithm, the reconstruction problem can be
solved by the following iterative formulas [25], [83]
r (𝑡) = Ĥ (𝑡−1) + 𝜌 (𝑡) W s − 𝑔( Ĥ (𝑡−1) ) ,

(25) B. Multi-domain Correlation Utilization
 
(𝑡) (𝑡) (𝑡) 
Ĥ = 𝑓 𝑑 𝑠𝑜 𝑓 𝑡 𝑓𝑠 (r ); 𝜃 , (26) In Section IV-A, some methods to improve CSI feedback
by introducing novel NN architectures are discussed. Different
where s ∈ R𝑚×1 and Ĥ (𝑡) ∈ R2𝑁t 𝑁c ×1 represent the code- from the images in computer vision, the CSI “images” con-
word and the output of the 𝑡-th iteration4 , respectively; 𝑓𝑠 (·) tain rich information about geometrical wireless propagation,
and 𝑓 𝑑 (·) denote the sparse transformation and the inverse which can be exploited to further improve CSI feedback.
transformation, respectively; 𝑠𝑜 𝑓 𝑡 (·; ·) 5 represents the soft- Multi-domain correlations have been adopted by the con-
thresholding function; 𝑔(·) stands for the NN-based compres- ventional CSI feedback methods. For example, a distributed
sion operation that reduces CSI dimension from R2𝑁t 𝑁c ×1 to CSI acquisition framework is developed in [19] to utilize the
𝑀 × 1; 𝜌 (𝑡) and W are the step size of the 𝑡-th iteration and a joint sparsity structure in downlink CSI matrices owing to the
linear operator, respectively. The original signal in a sparse shared local scatters of the physical propagation environment.
domain is reconstructed by the soft-thresholding function The partial channel reciprocity-based CSI feedback codebook
[84]. In traditional iterative algorithms, the hyperparameters in [94] exploits that bidirectional channels have a similar
of tied ISTA, namely, 𝜌, W, 𝑔(·), 𝜃, are selected by a time- angular-delay distribution. Inspired by these works, multi-
consuming grid search. In TiLISTA-Joint architecture, these domain correlations, including time correlation, partial bidi-
parameters are learned by an end-to-end approach, that is, rectional channel correlation, and correlation among nearby
users’ CSI, have been also introduced for DL-based feedback,
4 The initial value Ĥ (0) is set as zero because of the CSI sparsity.
5 The soft-thresholding function can be written as 𝑠𝑜 𝑓 𝑡 ( 𝑥; 𝜃) = as shown in Table III. In this part, how to embed these
sign( 𝑥)max(0, |𝑥 | − 𝜃), where 𝜃 denotes the shrinkage threshold. correlations into DL-based CSI feedback is briefly introduced.
13

Encoder (user)

delay
……
angle

CsiNet Shared CsiNet CsiNet CsiNet


encoder parameters encoder encoder encoder

codeword M1 ×1 M1 ×1 M2 ×1 M1 ×1 M2 ×1 …… M1 ×1 M2 ×1

CsiNet Shared CsiNet CsiNet CsiNet


deocder parameters deocder deocder …… deocder

LSTMs LSTMs LSTMs LSTMs

……

Decoder (BS)

Fig. 17. Overall architecture of CsiNet-LSTM [85]. The first channel H1 and other 𝑇 −1 channels are compressed by high- and low-CR encoders, respectively.
The output of the encoder, that is, codeword, is concatenated with the first one before being sent to the decoder. The initially reconstructed CSI is then refined
by the LSTM modules.

of a video can be compressed together to save storage space.


However, the user feeds back the estimated downlink CSI
2
Reshape 32×32×2

successively.
F
FC Rx
Conv 3×3

The novel NN architecture in [85], called CsiNet-LSTM,


delay

N
H M utilizes the time correlation to help CSI reconstruction at the
angle decoder by LSTMs, as shown in Fig. 17. The basic encoders
Input LSSTM
and decoders in CsiNet-LSTM are the same as those in CsiNet.
Instead of feeding back a single CSI “image,” the CsiNet-
Feature extraction Feature compression
c
LSTM is designed for a sequence of CSI matrices. For a CSI
sequence with length 𝑇 6 , the first CSI “frame” is compressed
by a high-CR encoder with the highest quality, and other
Fig. 18. Overall architecture of RecCsiNet encoder [86], which consists of
feature extraction and compression modules 𝑇 − 1 CSI “frames” are compressed by low-CR encoders. The
low-CR encoders share the same NN parameters. The first
CSI, H1 , is used as a reference to help the reconstruction
1) Time Correlation: In a time-varying scenario, the user of the remaining CSI. Therefore, the last 𝑇 − 1 codewords
location is not fixed. However, the user’s moving distance are concatenated with the first one. The CsiNet-based de-
within a short time (for example, feedback interval) is small. coders recover the CSI from the codewords initially. Then,
For a user with a moving speed of 360 km/h, the moving the initially reconstructed CSI is fed into a three-layer LSTM
distance within 1 ms is only 0.1 m. Therefore, the environ- module. The LSTM module refines the reconstructed CSI with
ment around the user does not fully change. The channel is the information extracted from the former CSI. The simulation
determined by the propagation environment; thus, the CSI at results show that CsiNet-LSTM has the least performance loss
adjacent slots exhibits a high correlation. To model a CSI time with the decrease of CRs compared with CsiNet.
evolution, the first-order Markov process can be adopted as CsiNet-LSTM only embeds LSTM into the decoder to
[95], [96]. exploit time correlation. However, no changes are introduced
to the encoder of CsiNet-LSTM, that is, time correlation is
√︁
H𝑡 = 𝛼H𝑡−1 + 1 − 𝛼2 G𝑡 , (27)
ignored during compression. The RecCsiNet in [86] exploits
where 𝛼 ∈ [0, 1) represents the temporal correlation coefficient LSTM to enhance the compression and reconstruction of a
between the adjacent channels, and G𝑡 denotes a zero-mean time-varying channel. Fig. 18 shows the encoder architecture
and unit-variance complex Gaussian matrix. 𝛼 → 1 generates of RecCsiNet. The encoder consists of two modules: feature
a time-invariant CSI matrix, and 𝛼 = 0 represents that the CSI extraction and compression. Similar to the CsiNet encoder, a
has no time correlation. Considering the time correlation, the convolutional layer with 3 × 3 kernels is first used to extract
CSI of the time-varying scenario can be regarded as sequence the feature of downlink CSI. The feature compression contains
data, such as video. However, time-varying CSI cannot be
compressed fully the same as the video. The adjacent frames 6 The sequence length 𝑇 is set as ten in [85].
14
DualNet-MAG

Encoder Decoder
Magnitude
32×32 32×32 1024×1 M×1 M×1 1024×1 2048×1 32×32×2 32×32
feedback Recovered
Residual g
magnitude
Network
32×32×2 32×32×2
32×32

Encoder (UE)

delay
Downlink Recovered
CSI Uplink
concatenation
downlink CSI
H1 H 1024×1
magnitude
2 H3 …… HT
angle feedback
Magnitude‐based Quantized
phase
phase quantization
32×32 32 32
32×32
Phase CsiNet-LSTM Encoder
: 3×3 Conv : Reshape : FC

Fig. 19. Overall architecture of DualNet-MAG [88], which feeds back the CSI magnitude and phase. Uplink CSI magnitude is introduced into the reconstruction
of the downlink CSI magnitude at the decoder.

M1 ×1 M1 ×1 M2 ×1 M1 ×1 M2 ×1 …… M1 ×1 M2 ×1
codeword

CsiNet Decoder

……

ConvLSTM ConvLSTM ConvLSTM ConvLSTM


ConvLSTM ConvLSTM ConvLSTM …… ConvLSTM
ConvLSTM ConvLSTM ConvLSTM ConvLSTM

……

Fig. 20. Decoder architecture of UA-CsiConvLSTM [89], which introduces the time and partial bidirectional correlations to the CSI feedback

two parallel paths: a linear FC layer and an LSTM module. each CSI feedback. A prediction NN in [87], which produces
As described in (27), the current CSI contains the information the current CSI based on the knowledge of the past CSI
of the previous one. The transmission bandwidth is wasted if sequence, is shared by the user and the BS. If prediction
the shared information is fed back repeatedly. Therefore, the errors are tolerable, the user does not need to feed back the
LSTM module compresses the CSI based on the current and current CSI, and the BS directly uses the CSI produced by the
previous CSI matrices. The FC layer, which can be regarded shared prediction NN. However, feedback is needed when the
as a jump connection, is used to accelerate the convergence difference is over a predefined threshold. Numerical simulation
of NN training. The decoder of RecCsiNet consists of two shows that the MSE is reduced by 19.9% compared with the
modules, feature uncompression and channel recovery. The regular feedback strategy.
feature uncompression module performs the inverse of the 2) Partial Bidirectional Channel Correlation: Downlink
feature compression module. Then, the RefineNet proposed CSI cannot be inferred from uplink CSI in frequency-division
by [23] is used to improve reconstruction accuracy. Based on duplex (FDD) systems because the operating frequencies of
RecCsiNet architecture [86], a novel NN architecture, called the downlink and uplink are different. However, the signal
ConvlstmCsiNet, in [97] replaces the feature extraction at the propagation environment is the same for the downlink and
encoder and the RefineNet module at the decoder with more uplink. Therefore, the bidirectional channels hold a partial
powerful NNs, thereby improving CSI feedback performance. correlation [94]. The accuracy of the reconstructed downlink
An attention module is added after the LSTM of the feature CSI becomes better if uplink CSI is exploited.
compression/decompression block in RecCsiNet in [98]. The high correlation between the magnitudes of the bidi-
rectional channels is exploited in [88]. Fig. 19 depicts the
Unlike [85], [86], [97], the feedback overhead is reduced DualNet-MAG framework proposed by [88]. The quantized
in [87] by dynamically adjusting the feedback interval of CSI phase is directly fed back via the uplink control channel.
the time-varying channel instead of reducing the overhead of By contrast, the CSI magnitude is compressed by an NN-
15

Codeword

CSI magnitude
Encoder Decoder Combination
avg

LSTM LSTM LSTM Individual


module module module User 1
LSTM LSTM LSTM
module module module Decoder

Shared

CSI magnitude
h1 h2 h Nc
Encoder
Decoder Combination
Fig. 21. Encoder architecture of Attention-CsiNet [42], which utilizes the
correlation among adjacent subcarriers by Bi-LSTM modules
Individual
User 2
BS
based encoder. Once receiving the feedback codeword, the BS Fig. 22. Overall architecture of CoCsiNet [92], which consists of individual
concatenates the codeword with the corresponding uplink CSI and shared decoders to recover the individual and shared information of the
magnitude and sends the concatenated vector to the decoder, users’ CSI magnitude
which reconstructs the downlink CSI with the information not
only from the feedback codeword but also from the uplink CSI
magnitude. The simulation results show that the introduction CsiNet in [23] ignores the correlation between subcarriers.
of uplink CSI magnitude can greatly improve feedback ac- Therefore, the Bi-LSTM module is adopted to extract the
curacy. However, this feedback strategy [88] results in a bit- subcarrier correlation to compress CSI as shown in Fig. 21.
allocation between the magnitude and phase feedback. The The final codeword is the average of the output of two
original loss function in [88] is modified in [99] and [100] LSTM modules. In CsiNet-LSTM [85], only a unidirectional
by directly introducing the phase and magnitude to the MSE LSTM is adopted because the current CSI is reconstructed
function of CSI reconstruction to ensure an end-to-end training with the help of the previous CSI but without the help of
of CSI phase and magnitude feedback. the next moment CSI. However, the channel feedback over
A novel feedback framework in [89], called UA- a certain subcarrier can be enhanced by channels over all
CsiConvLSTM, exploits time and partial bidirectional correla- other subcarriers. Moreover, two LSTM modules in Bi-LSTM
tions, as shown in Fig. 20. The encoder part is the same as that share the same NN weights, thereby dramatically reducing NN
of the CsiNet-LSTM [85]. Upon receiving the codeword, the weight numbers.
BS initially reconstructs the downlink CSI with the decoder For the novel compressive samples CSI feedback framework
of the CsiNet [23]. Then, the recovered downlink CSI is in [91], called SampleDL, the original channel is uniformly
concatenated with the uplink CSI. The concatenated vectors sampled in the frequency domain before feedback. Only
are sent to a three-layer ConvLSTM block, which utilizes channels over selected subcarriers are fed back to the BS. The
time and partial bidirectional correlations to refine the initial autoencoder compresses and reconstructs the sampled CSI.
downlink CSI. This strategy forces the NNs to learn and Then, the reconstructed CSI is interpolated with 0 to recover
exploit the correlation automatically, thereby preventing the its original dimension, and an extra NN is adopted to refine
bit allocation between the magnitude and phase feedback in the interpolated CSI. This method can increase the feedback
[88]. accuracy and reduce the NN complexity due to the reduction
The HyperRNN in [90] utilizes the partial bidirectional cor- of NN input.
relation by adopting hypernetworks [101] instead of directly 4) Correlation Among Nearby Users’ CSI: The user num-
sending the uplink CSI to the decoder as [88], [89]. The key ber increases substantially in 6G. As pointed out in [102], user
idea of hypernetworks is to use a single network (called as density may grow to hundreds per cubic meter, which poses
hypernetwork) to generate the weights for another NN. The a high requirement of spatial spectral efficiency. Based on
estimated uplink CSI is sent to an FC layer to generate the practical measurements in [103], channel correlation is higher
weights of the NNs used to reconstruct the downlink CSI. than 0.48 for all close-by users. The far-away users have an
The hypernetwork introduces uplink channel information into inter-user CSI correlation that is more than twice higher than
downlink channel reconstruction through these generated NN that of the i.i.d. CSI, even when the distance of the users
weights. is over tens of wavelengths 7 . Based on this observation, the
3) Frequency Correlation: The channels over adjacent sub- CSI correlation among nearby users can be utilized to improve
carriers are highly correlated. Therefore, some works extract feedback accuracy and reduce feedback overhead. Inspired by
and utilize this correlation to further reduce feedback over- this, CoCsiNet [92] and distributed DeepCMC [93] introduce
head. this correlation to DL-based CSI feedback.
The Attention-CsiNet proposed in [42] adds the LSTM
modules to the encoder and decoder. As pointed out in [42], 7 The frequency in [103] is 2.4 GHz, and the wavelength is 12.5 cm.
东南大学硕士学位论文
16

CR=4
Fig. 22 shows the framework of the CoCsiNet proposed by 15
[92]. Inspired by [88], CoCsiNet feeds back the CSI phase and
magnitude, respectively. Two nearby users cooperatively feed
10 10

Number
their CSI magnitudes back to the BS due to the observation
of correlation in the CSI magnitude domain. The information
5 5
contained in CSI magnitude is divided into two kinds in [92]:
0 0
individual and shared information. The encoders of the nearby -0.6 -0.4 -0.2 0 0.2 0.4 0.6
users compress and quantize the CSI to generate a bitstream. Codeword value -0.6 -0.4
Then, two different decoders are adopted to reconstruct the
Fig. 23. Distribution map of the compressed CSI values when CR is set as
CSI from the feedback bitstream. The first decoder focuses 1/4 for the indoor scenario [106].
on recovering the individual information in the user CSI.
The shared decoder can recover the information shared by
CR=8
two users. The final CSI is reconstructed from the recovered 10 10

Decoder
Encoder
delay
 -law
individual and shared information. The information shared by quantizer

nearby users does not need to be fed back any more, thereby angle
reducing feedback overhead. Two magnitude-dependent meth- 5 5
Offset NN
ods introduce instant and statistical information of the CSI
magnitude into the feedback of the CSI phase to feedback the 0
Fig. 24. Bit-level autoencoder-based CSI feedback framework [41], [106], 0
CSI phase efficiently. Visualization of the encoder parameters which adopts a 𝜇-law non-uniform quantizer. Upon receiving the quantized
-0.6 -0.4 -0.2 0 0.2 0.4 0.6
codeword, an offset NN at the BS refines the quantized codeword and then -0.6 -0.4
in [92] shows that the nearby users can cooperatively feedback sends the refined codeword to the decoder.
the shared path information after an end-to-end NN training.
Similar to [89], the distributed DeepCMC in [93] does
not separately feedback the CSI magnitude and phase when is replaced by a differentiable CR=16
Sigmoid-based approximate
exploiting the correlation among the nearby users. The encoder rounding function in [100].
part of the distributed DeepCMC is the same as that of
10
Based on [41], [106], the values of most codeword elements
10
CoCsiNet. However, the input of the encoder is the complex are almost near-zero, as shown in Fig. 23. The uniform
CSI instead of the CSI magnitude. The BS concatenates the 5 provides unnecessary quantization performance 5
quantization
feedback codewords of two users and sends them into a joint for the high values that seldom appear in practical signals;
feature decoder to reconstruct their CSI. The summation- thus, it is unsuitable for the quantization of CSI codeword.
based fusion branches in the distributed DeepCMC exploit the
0
A quantizer, with smaller step sizes at lower amplitudes and
0
property of channel gains, which consist of the summation of
-0.6 -0.4 -0.2 0 0.2 0.4 0.6
larger step sizes at high amplitudes, is needed. A non-uniform
-0.6 -0.4
multipath signal components. quantization, that is, 𝜇-law quantizer [107], is introduced in
[41], [106] to the CSI codeword quantization to meet the
above requirement. Fig. 24 shows the bit-level CSI feedback
framework proposed by [41], [106].图The 压缩 CSI CSI 值的分布图
C. Bitstream Generation 3.9downlink is
In practical systems, the CSI is fed back in the form first compressed by an encoder and then quantized by a
of bitstreams. If a 32-bit floating point codeword, that is, 𝜇-law non-uniform quantizer to generate a bitstream. Upon
the encoder’s output, is directly fed back, the overhead is receiving the quantized codeword, the BS first refines the
very large. Therefore, the codeword needs to be discretized 来评价,其表达式为 codeword by an offset NN with residual learning to reduce the
before feedback. Quantization error is considered a part of effect of the quantization errors. Then, the refined codeword
the errors introduced in the feedback process in [104]. The is sent to the decoder for CSI reconstruction.
∥s − sq ∥22
SQNR = E{
A two-stage },
simulation results show that the feedback errors heavily affect training stage is used because the gradient cannot be passed ∥s∥22
the feedback accuracy. during backpropagation. In the first stage, quantization is not
Reference [105] introduces that a uniform module is added considered, and the encoder and decoder are jointly trained
after the encoder in [105] and the number of quantization 其中 bits,s 为码字,s 为量化码字。
with collected qCSI samples. In the second stage, the codeword
𝐵, is set as 4. Quantization can be written as 8
is discretized by the 𝜇-law quantizer. The offset NN is trained
to minimize the errors introduced by quantization. Then,
(28) 表 3.1 不同场景和不同量化比特下均匀量化和非均匀量
round(2 𝐵−1 × s)
s𝑞 = . the decoder is finetuned with the refined codeword and the
2 𝐵−1
original high-quality CSI matrices. Simulation shows that the
This quantization method is easy to implement. However, the
non-uniform quantization with an offset NN outperforms the
rounding operation is non-differentiable, making the quan-场景 室内
uniform quantization by a large margin.
tization operation unable to be directly embedded into the
Entropy coding is introduced in [63] to CSI feedback to
end-to-end training based on gradient descent. Therefore, the
reduce feedback overhead further. In [63], the codeword is
gradient of the rounding operation is set as 1 during training CR
in [105]; thus, the autoencoder can be end-to-end trained 量化方法 first quantized by a uniform quantizer. Then, 4 context-based8 16
adaptive binary B arithmetic coding [108] converts the quantized
with the uniform quantization module. The rounding operation
values into a bitstream. This entropy coding is dependent
8 We assume that the activation function of the last layer at the encoder is on the input probability3model learned from 5.28the codeword.
5.67 5.9
Tanh, i.e., s ∈ (−1, 1) As mentioned before, the quantization is non-differentiable
4 11.37 11.75 11.
均匀量化
5 17.39 17.77 17.
17

Training Inference The effect of binarization (1-bit quantization) on feed-


back performance is also considered in [111]. Knowledge
distillation enables a teacher NN to distill and transfer dark
Quantization knowledge to a simple student NN [112] for bit-level CSI
feedback. In Fig. 26, an extra highly complex NN is introduced
during the NN training. The highly complex NN is the same
as the original NN except that it considers no binarization.
Entropy
Coding
Therefore, the gradient can be passed during the training of the
highly complex NN. To utilize the highly complex NN during
Uniform
Noise
the training, the joint training method in [111] alternatively
trains the original and the highly complex NNs. The highly
Entropy
Decoding
complex NN can guide the original NN training and prevent
the original NN to fall into a poor local minimum because the
gradient of the highly complex NN is lossless. This process
can be regarded as that the highly complex NN transfers its
De-quantization knowledge to the original NN.

D. Joint Design with Other Modules


Fig. 25. Illustration of entropy bottlenecks during NN training and inference The above works assume that the user directly feeds back
[109]. The uniform noise is added to the codeword during the NN training. perfect downlink CSI to the BS. However, the CSI is estimated
During NN inference, the codeword is uniformly quantized and then entropy
coded by the stored distribution P(𝑌˜ ). Finally, the generated bitstream is fed
from the downlink pilot signals, and the channel estimation
back. The codeword is reconstructed by entropy decoding and dequantization inevitably introduces errors to the downlink CSI. The quality
operations. of the recovered CSI at the BS becomes poor if the esti-
mated CSI is fed back by the NNs trained with perfect CSI
Original NN samples [121]. Therefore, the entire CSI acquisition needs
to be considered during the design of CSI feedback. Fig.
Decoder

27 shows the workflow of the downlink CSI acquisition and


utilization in FDD massive MIMO systems. First, the BS
designs the downlink pilots and transmits pilot signals to the
Encoder

users. Second, the user estimates the CSI from the received
Binarization pilot signals. Third, the user compresses and quantizes the
estimated CSI and transmits the bitstream to the BS. The BS
Decoder

reconstructs the downlink CSI from the feedback information.


Finally, the BS designs the BF/precoding matrix based on the
downlink CSI. In this part, some existing works on jointly
designing the CSI feedback with other modules to maximize
Auxiliary NN the performance gains, as shown in Table IV, are introduced.
1) Joint Channel Acquisition:
Fig. 26. Illustration of knowledge distillation-aided bit-level CSI feedback
training framework, which consists of the original and highly complex NNs a) Joint channel estimation and feedback: Two joint
[111] channel estimation and feedback frameworks are introduced in
[113]. The first framework, namely, PFnet, regards the channel
estimation and feedback as one module and directly com-
and cannot be directly embedded into end-to-end learning. presses the received pilot signals with an NN-based encoder.
Therefore, the quantization in this work is approximated as The decoder reconstructs the downlink CSI from the feed-
a random noise during the training instead of setting the back bitstream. This framework regards the CSI acquisition
gradient as one. The loss function in [63] consists of two parts: problem as an end-to-end black box. The second framework,
reconstruction error and entropy of feedback codeword. namely CEFnet, combines the communication knowledge with
Quantization and entropy coding are also used in the DL- NNs. The coarse downlink CSI is estimated by some simple
based CSI feedback in [109] by an entropy bottleneck layer
[110], which is composed of the quantizer, entropy model and
coder, and dequantizer. The encoder and decoder in [109] are  BF/Precoding Design
Pilot Channel Channel (BS)
based on the CRNet [43]. During the NN training, a random Transmission Estimation Feedback  Others ...
(BS) (UE) (UE&BS)
uniform noise is added to the output of the encoder, which is
similar to the operation in [63]. Fig. 25 depicts the entropy CSI Acquisition CSI Utilization
bottlenecks during NN training and inference. The numerical
results show that the method in [109] performs better than Fig. 27. Workflow of CSI acquisition and utilization in FDD massive MIMO
those in [41], [106] for a wide range of bit rates. systems
18

TABLE IV
J OINT D ESIGN WITH OTHER M ODULES

Joint design types Functions Main contributions in joint design


PFnet [113]: directly compressing and feedbacking the received pilot signals with an encoder;
Joint channel
CEFnet [113]: refining the estimated coarse CSI and then feedbacking it via autoencoder;
estimation and feedback
AnciNet [114]: introducing an two-stage training strategy when estimation and feedback are considered;
Joint Channel Acquisition CAnet-J [115]: the BS first designs pilots based on uplink CSI magnitude. Upon receiving pilot signals,
Joint pilot design, the user compresses the received signals using an encoder. Then, the decoder reconstructs downlink
channel estimation, CSI by utilizing the information from feedback bitstreams and the uplink CSI magnitude;
and feedback HyperRNN [90]: different from CAnet-J, the pilot is denoted by the weights of an FC layer;
CsiFBnet, [116]: the decoder NNs directly produce a BF vector that maximizes the BF gain;
[117]:jointly designing the feedback and hybrid precoding and pointing out that the gain achieved
Joint Channel Feedback Joint channel feedback
by joint design is large, especially when the feedback is extremely limited;
and Utilization and BF design
CsiCPreNet [118]: the BSs exchange the feedback codewords with one another when multiple-cell
is considered, and the precoding matrix is generated by a coordinated precoder design NN;
[119]: proposing an end-to-end limited feedback framework, including channel estimation,
Joint pilot design,
Joint Channel Acquisition feedback codebook design, and BF vector selection, for a single-user scenario;
channel estimation, feedback,
and Utilization [111]: extending the work in [119] to a multiuser scenario;
and precoding design
[120]: also including the pilot design module compared with [119];

algorithms, such as least-square estimation. Then, an extra


estimation subnet is employed to refine the coarse CSI. Finally,
the refined CSI is fed back to the BS by an autoencoder. A two- Pilot
stage training strategy is used to train the CEFnet. During the signal
first training stage, the estimation subnet is trained with coarse y
CSI as input data and perfect CS as target output. During the ul
|h | 1a Pilot design
second stage, the feedback subnet is trained with the output
of the estimation subnet as input data and the ideal CSI as Encoder
ground truth. The CEFnet remarkably outperforms the black- 2
box PFnet. The CEFnet framework can also be trained in an Compression
end-to-end manner. Based on the comparison of the one-stage s
end-to-end training with the two-stage one in [114], the two-
stage training can bring considerable performance gains. ĥ dla Decoder Bitstream
Quantization
b) Joint pilot design, channel estimation, and feedback: Feedback
Pilot length is limited due to the limited downlink training 3
resource. A joint pilot design and channel estimation strategy User
Reconstruction
is developed in [115] to reduce pilot overhead and channel BS
estimation errors. Unlike the methods in [122], [123] that
denote the downlink pilot by the weights of an FC layer, Fig. 28. Flowchart of DL-based uplink-aided joint CSI acquisition [115]
the pilot matrix in [115] is produced based on the uplink
CSI magnitude because of the correlation between bidirec-
tional channel magnitudes. Then, an uplink-aided entire CSI knowledge to the decoder at the BS. In HyperRNN, the pilot is
acquisition framework, CAnet-J, is shown in Fig. 28. The denoted by the weights of an FC layer, which is similar to that
BS first designs pilots based on the uplink CSI magnitude in [122], [123]. As mentioned in Section IV-B2, HyperRNN
in the angular domain. Upon receiving pilot signals, the introduces the uplink CSI to the downlink CSI reconstruction
user compresses and quantizes the received signals using by a hypernetwork. The hypernetwork generates the weights of
an NN-based encoder without channel estimation. Then, the the NN used in downlink CSI reconstruction based on the input
decoder reconstructs downlink CSI by utilizing the information uplink CSI. Moreover, reference [90] also considers the effect
from feedback bitstreams and the uplink CSI magnitude. of the imperfect uplink CSI on the downlink CSI acquisition
The numerical results show that the joint design outperforms and jointly designs the acquisition of uplink CSI.
the method that separately estimates and feeds back CSI 2) Joint Channel Feedback and Utilization: As indicated
[113], [114]. If channel estimation and feedback are separately in [116], the existing feedback methods only focus on ob-
implemented, the pilot signals received by the user need to taining as accurate downlink CSI as possible and ignore the
contain all information of the downlink CSI. By contrast, the physical meaning of downlink CSI. Therefore, the existing
signals do not need to contain the information shared with works reduce the CSI dimension by dropping the redundant
the uplink CSI if pilot signals are directly fed back and CSI information with a small effect on the reconstruction accu-
is reconstructed with the uplink CSI at the BS. Therefore, racy, that is, MSE/NMSE. However, signal fidelity cannot be
CAnet-J can perform better specifically when the pilot length exactly measured by MSE/NMSE [124]. Sometimes, the per-
is limited. formance of communication systems may be poor with a good
HyperRNN in [90] also considers the entire CSI acquisition. MSE/NMSE. Therefore, the effect of the CSI feedback on
The main difference between HyperRNN and CAnet-J is the next module, that is, the BF design, should be considered
the method of producing pilots and introducing uplink CSI jointly. The feedback framework for BF design in [116], called
19

CsiFBnet, maximizes the BF gain instead of the feedback Bit error

Dequantization
Quantization
performance. In single-cell systems, the user compresses the Compressed
Decoder
Reconstructed
CSI ECBlock CSI
CSI with an encoder and quantizes the compressed codeword.
Upon receiving the codeword, the decoder NNs produce a BF
vector. Considering the constant modulus constraint on the
analog BF vector, the output of the NNs at the BS is the Fig. 29. Illustration of ECBlock-aided digital CSI feedback framework [131],
in which quantization errors and feedback bit errors are considered. At the
phase of the BF vector. The NNs are end-to-end trained by BS, the codeword is refined by the ECBlock before being sent to the decoder.
an unsupervised approach. In multicell systems, the soft hand-
off model [125] is considered, and the user needs to feedback
the desired and the interfering CSI matrices to maximize the TYPE I feedback, with very low feedback overhead, is adopted
sum rate, which is a complicated joint optimization problem in 5G NR systems. For a more complicated multiuser case,
and turns into a local one by the approximation proposed in TYPE II feedback, with a much larger overhead, is preferred.
[126]. Simulation shows that joint design can greatly increase Therefore, DL-based CSI feedback needs to generate code-
the mean rate of massive MIMO systems, and the NNs show words under different lengths/CRs and accuracy. A straight-
high generalization to signal-to-noise ratio (SNR) and path forward way is to train several an NN model for each CR,
number of CSI. As mentioned in [117], the gain achieved by which occupies much space to store the NN parameters.
joint design is large, especially when the feedback is extremely The authors of [41] focus on reducing the NN parameter
limited. A more complicated scenario is considered in [118], number of the encoder and neglect that at the BS because
where the user receives the signals from all cells instead of of enough storage space at the BS. The FC layer in CsiNet+
only the nearby cells in [116]. The framework in [118] consists contains nearly all NN parameters. For example, when CR is
of two modules: a CSI compression NN and a coordinated 1/4, the NN parameters of the FC layer occupy 99.91% of
precoder NN. Upon receiving the feedback bitstreams, the BSs the entire encoder. Therefore, the FC layers are reused by all
exchange the feedback codewords with one another. Then, CRs. Two multirate frameworks, namely, serial multirate (SM-
the codewords are concatenated and sent to the coordinated CsiNet+) and parallel multirate (PM-CsiNet+) frameworks, are
precoder design NN to generate the precoding matrix. proposed. The codeword under a low CR (such as 1/64) in
3) Joint Channel Acquisition and Utilization: Reference SM-CsiNet+ is generated from that under a high CR (such
[119] proposes an end-to-end limited feedback framework, as 1/32). By contrast, PM-CsiNet+ generates the codeword
including channel estimation, feedback codebook design, and under a high CR from that under a low CR. The modular
BF vector selection. Specifically, the NNs at the user compress adaptive multirate (MAMR) framework in [127] also considers
the pilot signals without channel estimation and discretize the the complexity of the decoder at the BS. The input size of
compressed vector by a binarization operation. Upon yielding the decoder is fixed once trained. However, the codewords
the binary feedback information, the NNs at the BS generate under different CRs have different sizes, which is solved in
a BF vector, which can maximize the channel gain. The NNs [127] by zero-padding. The NN parameter at the decoder
at the user and the BS are jointly trained in an unsupervised can be reduced by approximately 42.5%. The framework in
manner. The work in [119] is extended to a multiuser scenario [128], called FOCU, combines the PM-CsiNet+ architecture
in [111]. The encoders at different users are the same. The in [41] with the padding operation in [127] to realize multirate
feedback bitstreams of multiple users are concatenated at the reconstruction at the BS.
BS and sent to the NNs to generate the precoding matrix. In practical systems, CR needs to be determined automat-
The optimization goal is to maximize the sum-rate instead of ically. In [129], a CNN-based classification module is added
channel gain [119]. before feedback. The classification model selects the suitable
Parallel to [111], a joint CSI acquisition and precoding de- CR in accordance with the CSI. The key problem is how to
sign framework is also proposed in [120]. This work includes generate the labels for NN training. IPredefining an accuracy
the pilot design module. For the data-driven pilot design, the threshold, such as −10 dB in [144], is suggested, and the
pilot is denoted by the weights of an FC layer as [90], [122], lowest CR that meets the accuracy requirement is marked as
[123]. Moreover, a two-step training strategy is proposed to the desired CR, that is, the label of the corresponding CSI.
make the trained NNs generalizable to different feedback Then, a supervised end-to-end learning is employed for the
overheads. In the first step, the NNs at the UE and the BS classification NNs, in which the CSI is the input and the
are jointly trained by an end-to-end approach. Quantization desired CR is the output.
is neglected in this step. In the second step, the weights of 2) Imperfect Feedback Link: In some practical environ-
the user-side NNs are fixed. Different quantization steps are ments, the feedback link suffers from various interference
applied to the NN output at the user, that is, feedback vector. and non-linear effect, which disturb the feedback codeword.
For each quantization step, a specific NN is trained at the BS. A plug-and-play denoise NN is added in [130] before the
decoder, which is based on residual learning and similar to the
offset network in [106]. The denoise NN consists of several FC
E. Practical Consideration layers and is trained to reduce the codeword noises introduced
1) Multirate Feedback: Some practical communication sys- by imperfect uplink transmission. Compared with the NNs
tems need to adjust the number of feedback bits in accordance without consideration of imperfect feedback, the NNs with
with the scenarios. For example, for the single-user scenario, an extra denoise network show high robustness to the uplink
20

TABLE V
P RACTICAL CONSIDERATION

Practical Consideration NN Name or Method Main contributions in practical consideration


SM-CsiNet+ [41] The codeword under a low CR (such as 1/64) is generated from that under a high CR (such as 1/32);
PM-CsiNet+ [41] The codeword under a high CR is generated from that under a low CR;
Multirate Feedback MAMR [127] The codewords under different CRs have different sizes and are padded with zeros at the BS;
FOCU [128] The PM-CsiNet+ architecture is combained with the padding operation;
[129] A classification module, which selects the suitable CR, is added before feedback;
DNNet [130] A plug-and-play denoise NN is added before the decoder to reduce transmission errors;
Imperfect Feedback Link ECBlock [131] The error correcting NN embedded before the decoder is trained with the autoencoder;
AnalogDeepCMC [132] The downlink CSI is directly mapped to the input of the uplink channel;
NN weight pruning [133]: the FC layer of the CsiNet+ encoder is pruned;
NN weight [133]: the NN weights are quantized with 3-7 bits after pre-training;
NN Complexity
quantization/binarization [61], [134]: the NN weights are quantized with 1 bit;
Knowledge distillation [135]: the knowledge of the complex CsiNet+ is transferred to the simple CsiNet;
[136]: the NMSE gap between the NNs trained with 3,200 and 800 CSI samples is 3.1 dB;
Data collection
[137]: the feedback NNs are trained using the uplink CSI samples due to the same characteristics;
Data Collection [59], [138]: transfer learning and meta learning is introduced to accelerate online training at the BS;
and Online Training [139]: the feedback NN is trained at the user side, and FL is adopted;
Online training
[140]: a new encoder is trained at the user side for a specific area without changing the decoder,
and gossip learning applied to multiuser scenario;
ImCsiNet [141] and EVCsiNet [142] The precoding matrix is fed back by an autoencoder instead of the whole CSI;
Standardization
AI4C2 F [143] An NN module is added at the BS to refine the channel codeword obtained by codebook-based feedback;

SNR. For example, the NMSE gap is up to 10 dB when uplink


SNR is 5 dB. -8

Unlike [130], for the digital CSI feedback in [131], the CsiNet

codeword is first quantized by the method proposed by [145] -10


NMSE (dB)
4M 9M 16M 25M
and fed back in the form of the bitstream. Bit errors are CLNet
CRNet Attention-CsiNet
inevitable due to imperfect transmission. Inspired by [130], -12
DS NLCsiNet
an error correction block (ECBlock) is deployed before the
DFECsiNet CsiNet+
decoder at the BS, as shown in Fig. 29. ECBlock consists -14 TransNet ACRNet 20x
of several FC layers, where residual learning is adopted. The
DCRNet
NN training is divided into two stages, including pretraining ACRNet 10x
-16
and alternate training. In the first training stage, the encoder 0 5 10 15 20 25 30 35 40 45 50
and decoder are first trained without feedback errors. Then, FLOPs (M)
ECBlock is trained using the codewords generated by the Fig. 30. NMSE (dB) versus FLOP number of entire NNs when CR is 1/16
-8
encoder, which has been well trained. In the second stage, the forCsiNetthe indoor scenarios.
-10 4M 9M 16M 25M
NMSE (dB)

entire model, including the encoder, quantization, adding bit CLNet


CRNet Attention-CsiNet
-12
DS NLCsiNet
errors, dequantization, ECBlock, and decoder, are connected DFECsiNet
FC layers before pruning
CsiNet+ FC layers after pruning
-14 TransNet ACRNet 20x
together and trained by an end-to-end approach. Considering DCRNet ACRNet 10x
-16
Pruning
the operations of quantization and adding bit errors are non- 0 5 10 15 20 25
FLOPs (M)
30 35 40 45 50
synapses
differentiable, their gradient is set as 1 [145].
A CNN-based analog CSI feedback is adopted in [132]. It Pruning
neurons
directly maps the downlink CSI to the input of the uplink
channel and can be regarded as a joint source channel coding
framework. This framework improves the robustness to the
imperfect uplink transmission and simplifies the feedback Fig. 31. Illustration of NN weight pruning in FC layers. NN complexity
process because of the joint source channel coding. Moreover, can be greatly reduced if redundant connections (synapses) and neurons are
the end-to-end joint source channel coding framework for CSI dropped.
feedback in [146] enhances NN robustness to the imperfect
uplink transmission. Concretely, the uplink transmission SNR
is input to the NNs, thereby making the trained NNs adaptive to practical systems. Therefore, the NN weight pruning and
to the uplink channel condition. quantization/binarization have been introduced to reduce the
3) NN Complexity: DL-based CSI feedback can improve complexity of DL-based feedback.
feedback accuracy and reduce feedback overheads. Fig. 30 a) NN weight pruning: As mentioned in Section IV-E1,
shows NMSE performance versus FLOP number of the entire the FC layer at the encoder occupies almost the entire weights
NNs when CR is 1/16 for the indoor scenario. The FLOP of the encoder. In FC layers, most connections (synapses) and
number of TransNet [56] is approximately nine times that neurons are redundant. The weight number can be greatly
of CsiNet, and NMSE is reduced by 6.35 dB. Therefore, decreased if redundant connections and neurons are dropped,
performance improvement is at the expense of NN com- as shown in Fig. 31. The basic idea of NN pruning is to remove
plexity. However, the high requirement of DL-based algo- the NN weights with small absolute values.
rithms in memory and computational complexity poses a In [133], CsiNet+ in [41] is used as an example to prune the
major challenge to the deployment of DL-based feedback FC layer at the encoder. NN weight pruning has two kinds:
21

pruning during training and pruning after pre-training. The


second pruning method is adopted by [133]. A binary mask,
with the same shape as the FC weights, is added to the FC
layer. The mask elements, with corresponding absolute values
below a predefined threshold, are set as zero. Then, the entire
NNs with the mask are finetuned with a small learning rate.
The gradient flows through the fixed mask and the NN weights,
with zero mask values, are not updated during the finetuning.
Simulation shows that NN pruning can greatly reduce the
weight number of FC layers with small effect on CSI feedback
accuracy. When CR is 1/16 for the indoor scenario, accuracy
drop is only 0.24 dB if 97.21% NN weights are pruned. This
NN compression can be easily extended to other works. For
example, the NNs for uplink-aided joint CSI acquisition in Fig. 32. Demonstration of dilated convolution in [51] when dilated rate is
[115] are pruned similarly. set as two. The receptive field is 5 × 5. However, the NN complexity of this
operation is the same as that of a standard convolution operation with 3 × 3
b) NN weight quantization/binarization: In most DL filters.
libraries, such as TensorFlow and PyTorch, the NN weights
are set as 32-bit floating point, resulting in a waste in memory Scenario1-CSI Decoder1
Feedback
space and an increase in computational complexity. The com- Scenario2-CSI Decoder2

Encoder H
putational power at the user is limited and cannot support high- Scenario3-CSI Decoder3 Recovered
CSI
precision computation. NN weight quantization is introduced Scenario4-CSI Index
Decoder4
in [133] to DL-based CSI feedback, where high-precision NN Softmax

GateNet
weights (such as 32-bit float point) are replaced with low-
precision ones (such as 1-bit float point). In [133], the NN
Fig. 33. Multitask learning-based CSI feedback framework for multiple
weights are quantized with 3-7 bits after pre-training. When scenarios in [160]
the quantization bit of NN weights is set as 6 or 7 for the
outdoor scenario, the performance gap between the original
and the quantized NNs is small. Furthermore, BCsiNet in d) Knowledge distillation: In knowledge distillation, the
[61] and ACRNet in [134] quantize the NN weights with knowledge learned by the complex teacher NN can be trans-
1 bit, thereby offering over 30 times memory saving and ferred to the simple student NN to improve the performance
approximate two times acceleration in inference speed with of the student NN. In [135], the knowledge of the complex
small effect on feedback performance. CsiNet+ is transferred to the simple CsiNet. The performance
of the CsiNet trained with knowledge distillation is greatly
c) Efficient NN architecture design: Early works, such
improved. For example, performance improvement is up to
as ConvCsiNet [40], improve NN performance by stacking
7.5 dB when CR is 1/4 for the indoor scenario. Knowledge
the vanilla convolutional layers. A minute improvement some-
distillation can be regarded as an NN training trick, which
times is at the expense of a substantial increase in NN
is plug-and-play and can be easily extended to the existing
complexity. Therefore, the NN architecture should be carefully
works.
designed and the efficient NN architecture should be adopted
instead of the redundant one. In [133] and [147], the vanilla 4) Data Collection and Online Training: Most existing
convolutional layers in ConvCsiNet are replaced with more works are conducted on simulation, in which the CSI samples
efficient convolution blocks, namely, the squeeze layer [148] can be easily obtained by channel generation software, such
and the shuffle layer [149], which can achieve a comparable as COST 2100 9 and QuaDRiGa10 . When deploying DL-
feedback performance when FLOP numbers are 1/3 and 1/4 based feedback to practical systems, data collection and online
of ConvCsiNet. training should be considered.
a) Data collection: NN performance depends on the
As indicated in [51], the 3 × 3 convolution operation cannot number of CSI samples used during NN training. For example,
offer enough receptive field, making the neurons in the deep the NMSE gap between the NNs trained with 3,200 and 800
layer unable to represent enough regions of the input CSI CSI samples is 3.1 dB in [136]. A straightforward way for
“images.” Thus, dilated convolutions [150] are introduced in data collection is that the user sends the stored high-quality
[51] to enhance the receptive field without a large increase in CSI samples to the BS as many as possible during idle time.
NN complexity. In Fig. 32, dilated convolutions inject holes However, this method contains two major problems. First,
into the vanilla convolution kernel. The dilated rate represents the user needs to store many CSI samples, occupying the
the interval number in the convolution kernel. If the dilated storage space at the user. Second, the uplink transmission of
rate is set as 1, the dilated convolution is the same as the CSI samples occupies precious uplink transmission resources.
standard convolution. The dilated rate in Fig. 32 is set as 2, Therefore, it is difficult to deploy in practical systems.
and the receptive field is 5 × 5. However, the NN complexity
of this operation is the same as that of a standard convolution 9 https://github.com/cost2100/cost2100

operation with 3 × 3 filters. 10 https://quadriga-channel-model.de/


22

The NNs of downlink CSI feedback in [137] are trained The said autoencoder-based feedback framework needs to
using the uplink CSI samples with the same characteris- change the existing CSI feedback schemes completely, thereby
tics/statistics. Although bidirectional transmissions are oper- making it difficult to be deployed in the next few years. Hence,
ated over different frequency bands, they share the same prop- developing a DL-based feedback framework, which does not
agation environment, which determines the CSI distributions. change the existing codebook-based feedback strategy, is
The numerical results show that the feedback accuracy of the essential. The DL-aided codebook enhancement strategy in
NNs trained by uplink CSI samples is close to that trained by [143] meets the above requirement. An NN module is added
downlink CSI when the duplex distance is 200 MHz. at the BS to refine the channel codeword obtained by the
b) Online training: The propagation environment is usu- codebook-based feedback. The performance of the original
ally stable for a long time. However, once the environment RVQ codebook is greatly improved with the aid of the NN-
greatly changes, the CSI distribution also changes. NNs trained based enhancement module.
with the previous distribution cannot work well in a new
environment. Therefore, online training is essential. Inspired F. Other Related Works
by [151] that applies transfer learning to DL-based CSI pre- In [154], DL is introduced to superimposed coding (SC)-
diction, several novel online training strategies are introduced based CSI feedback [155], where the user spreads and su-
to DL-based CSI feedback in [59] and [138] to accelerate perimposes downlink CSI on the uplink user data sequence.
the training convergence. Once the environment changes, Then, the BS recovers the downlink CSI and user data from the
pretrained NNs are fine-tuned using new CSI samples, which received signal with DL-based algorithms. Moreover, 1-bit CS
is regarded as transfer learning. Then, the model-agnostic meta and the partial bidirectional channel reciprocity are introduced
learning algorithm, which trains NNs by alternating inner-task by [156].
and across task updates and then adjusts the original NNs Feedback safety is considered in [157]. A bias layer is
for a new environment with few CSI samples, is adopted to added after the encoder to simulate the attack noise on the air
accelerate the NN training further. The training in [59], [138] is interface. The bias layer is trained in an end-to-end manner to
implemented at the BS. In [139], the feedback NN is trained at maximize feedback errors and minimize attack noise power
the user side. However, each user only stores some local CSI jointly. Simulation shows that the destructive effect of the
samples, which are not enough to train an NN for a whole adversarial attack is much higher than that of a jamming attack,
cell. Thus, a distributed learning framework, that is, federated which highlights the necessity to design an anti-attack method
learning (FL) [152], is introduced to the NN training. In FL- for DL-based algorithms.
aided online training, the user sends the NN gradient to the BS LB-SciFi proposed by [158] is a novel feedback framework
instead of CSI samples, thereby reducing the communication for multiuser MIMO in wireless local area networks. An
overhead. The BS, which can be regarded as an aggregation autoencoder is used to compress the CSI in 802.11 protocols to
server, aggregates the received NN and then transmits a global lower airtime overhead and improve system spectral efficiency.
NN model to each user. Experiments in a wireless testbed show that the DL-based
As mentioned before, the user occasionally stays in an method can offer a 73% reduction in airtime overhead and
area (e.g., an office) for a long time. In this scenario, the a 69% increase in system throughput compared with the
propagation environment is relatively stable. An online train- feedback protocol adopted by 802.11.
ing framework is proposed in [140] to utilize the above DL-based feedback is applied to RIS-assisted wireless sys-
observation, where a new encoder is trained at the user side tems by [159], where the user estimates the downlink CSI,
for a specific area without changing the decoder at the BS including the channels of the BS-user, BS-RIS, and RIS-user.
side. The NN training is employed at the user, and the CSI Given the substantial RIS element number, the dimension of
datasets do not need to be sent to the BS, thereby preventing the phase shift matrix is very high, making the overhead of
the occupation of uplink transmission resources. Moreover, feeding back the phase shift matrix unaffordable. Therefore,
the training framework is further extended to the multiuser phase shift is fed back by an autoencoder. The main difference
case. To utilize crowd intelligence, gossip learning [153] is between reference [159] and other works is that the matrix fed
applied to online learning, where the user exchanges the back to the BS is the RIS phase shift instead of the downlink
encoder weights with nearby users and then aggregates the CSI.
local encoder with the received one. Multitask learning is applied to the CSI feedback of multiple
5) Standardization: The current cellular systems, including 4G and 5G, are designed based on implicit feedback mechanisms. However, all of the aforementioned works focus on explicit feedback, that is, full channel information feedback. The autoencoder architecture is introduced in [141], [142] to feed back CSI implicitly, in which the precoding matrix is fed back instead of the whole CSI as in CsiNet-like works [23]. The simulation results in [141] show that DL-based implicit CSI feedback can reduce feedback overhead by at least 25% and 30% compared with the TYPE I and TYPE II feedback codebooks adopted by practical systems, respectively.

jointly. Simulation shows that the destructive effect of the adversarial attack is much higher than that of a jamming attack, which highlights the necessity of designing anti-attack methods for DL-based algorithms.

LB-SciFi, proposed in [158], is a novel feedback framework for multiuser MIMO in wireless local area networks. An autoencoder is used to compress the CSI in 802.11 protocols to lower the airtime overhead and improve the system spectral efficiency. Experiments on a wireless testbed show that the DL-based method can offer a 73% reduction in airtime overhead and a 69% increase in system throughput compared with the feedback protocol adopted by 802.11.

DL-based feedback is applied to RIS-assisted wireless systems in [159], where the user estimates the downlink CSI, including the channels of the BS-user, BS-RIS, and RIS-user links. Given the substantial number of RIS elements, the dimension of the phase shift matrix is very high, making the overhead of feeding back the phase shift matrix unaffordable. Therefore, the phase shift is fed back by an autoencoder. The main difference between [159] and other works is that the matrix fed back to the BS is the RIS phase shift instead of the downlink CSI.

Multitask learning is applied to the CSI feedback of multiple scenarios in [160]. The CSI feedback in different scenarios is regarded as different tasks. As shown in Fig. 33, the user compresses the CSI using a fixed encoder in all scenarios. The GateNet at the BS is an NN-based classifier that determines which scenario the codeword comes from based on the distribution of the received codeword; then, the corresponding decoder reconstructs the downlink CSI. The advantage of this method is the low complexity at the user side because only one encoder is trained and stored for all scenarios.
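As an illustration, the following PyTorch-style sketch routes each received codeword through a classifier and then to a scenario-specific decoder. The layer sizes, the two-layer gate, and the hard argmax decision are illustrative assumptions rather than the exact GateNet of [160].

    import torch
    import torch.nn as nn

    class GatedDecoder(nn.Module):
        # Classify which scenario a codeword comes from, then decode it
        # with the decoder trained for that scenario.
        def __init__(self, codeword_dim, num_scenarios, decoders):
            super().__init__()
            self.gate = nn.Sequential(           # GateNet-style classifier
                nn.Linear(codeword_dim, 64), nn.ReLU(),
                nn.Linear(64, num_scenarios),
            )
            self.decoders = nn.ModuleList(decoders)  # one per scenario

        def forward(self, codewords):
            scenarios = self.gate(codewords).argmax(dim=-1)
            outputs = [self.decoders[s](c.unsqueeze(0))  # route each sample
                       for s, c in zip(scenarios.tolist(), codewords)]
            return torch.cat(outputs, dim=0)

Only the gate and the decoder bank live at the BS; the user keeps a single encoder, which is what makes the scheme attractive for low-complexity terminals.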
A joint CSI compression and sensing framework is proposed in [161], in which the CSI amplitude is compressed and reconstructed by an autoencoder, and the sensing results are determined from the received codeword instead of the reconstructed CSI. The experiment results show that the classification accuracy of the sensing tasks is comparable with that of a method that sends back the CSI amplitude without compression.

V. FUTURE RESEARCH DIRECTIONS

To accelerate the deployment of DL-based CSI feedback in future communication systems, many challenges must be tackled. Some of them are listed here.
A. CSI Datasets from Realistic Systems

DL relies heavily on datasets. However, only simulated datasets are currently available. Most existing works use the datasets generated in [23], which adopt the COST 2100 channel model [162], to evaluate the performance of the proposed NNs. The remaining works, such as [74], [92], [143], [163], generate the CSI samples with the QuaDRiGa software. The NNs proposed for CSI feedback can achieve excellent performance on some datasets. However, whether these NNs perform well on other datasets is not clear. Therefore, how to check the robustness of the developed CSI feedback methods becomes critical.

Except for [145], no measured CSI samples have been used to train and evaluate DL-based CSI feedback. The simulated CSI samples are generated by software in which a certain channel distribution is adopted. The predefined channel distribution cannot exactly describe the characteristics of realistic systems. Although realistic datasets are introduced to DL-based CSI feedback in [145], [158], the channel environments there are very simple, and the BS is equipped with only a few transmit antennas, which is far from practical systems. Therefore, DL-based CSI feedback needs to be tested on realistic and complicated channel datasets.

Moreover, collecting CSI datasets from practical systems is difficult. The feedback NNs in [137], [163] are trained using uplink CSI samples based on distribution reciprocity, which may not hold in all systems. A dataset collection protocol should define how to select appropriate users to transmit CSI samples, when to send back the CSI samples, and how to reduce the transmission overhead.
B. Tradeoff between Performance and Complexity

Fig. 30 shows that accuracy is usually improved at the expense of NN complexity. For example, the numbers of encoder FLOPs of CsiNet [23] and ConvCsiNet [40] are 0.56 M and 58.52 M, respectively, when the CR is 1/16. In this case, the NMSE improvement is approximately 5 dB for the indoor scenario.
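For reference, the NMSE quoted here (and in Fig. 30) is assumed to follow the standard definition used in CsiNet-like works [23], with H the downlink CSI matrix and H-hat its reconstruction:

    \mathrm{NMSE} = \mathbb{E}\left\{ \frac{\lVert \mathbf{H} - \hat{\mathbf{H}} \rVert_2^2}{\lVert \mathbf{H} \rVert_2^2} \right\}, \qquad \mathrm{NMSE}_{\mathrm{dB}} = 10 \log_{10}(\mathrm{NMSE}).

Under this definition, a 5 dB gain corresponds to reducing the normalized reconstruction error power by a factor of 10^{0.5}, which is approximately 3.16.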
This complexity increase is not affordable for a user with limited computational power. Although NN compression techniques can greatly reduce NN complexity [133], the complexity remains too high for users with extremely limited computational power, such as Internet of Things sensors. Therefore, NN complexity should be further reduced at the expense of performance, leading to a tradeoff between performance and complexity. A user with enough computational power can be equipped with a powerful NN, and a user with limited computational power needs a lightweight NN. For a given user, the available computational power also varies dynamically. Therefore, the feedback NNs need to be executable at different widths (that is, the neuron/channel number in an FC/convolutional layer) to permit a performance-complexity tradeoff during inference [164]. The NNs for the user with limited computational power are part of the entire NNs, and the BS transmits the partial NNs according to the user's computational power.
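A minimal sketch of such a width-switchable ("slimmable" [164]) FC layer is shown below; slicing a shared weight matrix by a width multiplier captures the core idea, while the exact interface is an illustrative assumption.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SlimmableLinear(nn.Module):
        # An FC layer whose active output width can be switched at
        # inference time, so a weak user runs only a slice of the
        # full weight matrix transmitted by the BS.
        def __init__(self, in_features, max_out_features):
            super().__init__()
            self.weight = nn.Parameter(
                0.01 * torch.randn(max_out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(max_out_features))
            self.width_mult = 1.0  # fraction of active output neurons

        def forward(self, x):
            out = max(1, int(self.weight.shape[0] * self.width_mult))
            # The partial NN is literally a sub-block of the full NN
            return F.linear(x, self.weight[:out], self.bias[:out])

Setting, e.g., layer.width_mult = 0.25 before inference trades accuracy for a roughly fourfold reduction in this layer's computation.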
C. Generalization

NNs are trained with CSI samples following a certain distribution, which is determined by the propagation environment. However, the environment cannot always be stable [165]. The users do not always stay in a fixed cell and may move to different cells, and the environment of a cell inevitably changes over time. Therefore, how to obtain an NN with high generalization is one of the major challenges in DL-based CSI feedback. Two potential methods can be used to tackle this challenge. The first is to build an NN with high generalization by carefully designing the training datasets to cover most channel distributions; a deep generative model can be used to generate CSI samples following a certain distribution, as in [166]. The second potential solution is online training, but it needs to collect plenty of CSI samples, leading to extra transmission overhead. Therefore, the CSI samples need to be sent to the BS selectively, using methods such as the coreset selection algorithm in [167]. Moreover, domain adaptation techniques can be applied to further reduce the dataset requirement and accelerate training.
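As a minimal sketch of the first method, the snippet below enriches a training set with synthetic CSI drawn from a pretrained generative model (e.g., a ChannelGAN-style generator [166]); the generator interface and latent dimension are illustrative assumptions.

    import torch

    def augment_with_generator(real_csi, generator, num_synthetic,
                               latent_dim=128):
        # Mix measured/simulated CSI with synthetic samples to widen
        # the training distribution seen by the feedback autoencoder.
        z = torch.randn(num_synthetic, latent_dim)  # latent noise
        with torch.no_grad():
            synthetic_csi = generator(z)            # assumed generator API
        return torch.cat([real_csi, synthetic_csi], dim=0)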
D. Effect on Standardization

DL-based CSI feedback has been incorporated into the 3GPP R18 study item [22]. The effect of DL-based CSI feedback on the existing standard needs to be evaluated. First, the achievable system gains (rather than feedback accuracy metrics, such as NMSE) should be quantified through link- and system-level simulations against the existing TYPE I and TYPE II codebook-based CSI feedback. Second, DL-based algorithms are different from conventional algorithms and pose new requirements for the systems. Third, the evolution of the DL-based feedback framework needs to be discussed further. The existing standard cannot be totally changed and can only be revised. For example, explicit feedback is fully different from the existing feedback framework and is difficult to deploy in 5G-Advanced.

E. High-speed Scenario

User mobility will become higher in the future, so channel aging is unavoidable and leads to a large drop in system performance. However, few DL-based feedback works consider the high-speed scenario. In this scenario, the decoder at the BS must not only reconstruct the CSI accurately but also predict the future CSI to reduce the influence of channel aging [168]. The DL-based feedback method should be designed by considering the characteristics of the high-speed scenario. For example, the user in this scenario usually moves on a fixed path, such as on rails, because the environment around the fixed path is usually long-term stable.
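As a minimal sketch of such a design, the decoder below reconstructs the current CSI and additionally predicts the next-step CSI from a short history of codewords; the LSTM-based prediction head is an illustrative assumption inspired by channel prediction work such as [168], not a method taken from the surveyed feedback papers.

    import torch
    import torch.nn as nn

    class PredictiveDecoder(nn.Module):
        # Reconstruct the current CSI and predict the future CSI from
        # a sequence of received codewords, to counteract channel aging.
        def __init__(self, codeword_dim, csi_dim, hidden=256):
            super().__init__()
            self.recon_head = nn.Linear(codeword_dim, csi_dim)
            self.rnn = nn.LSTM(codeword_dim, hidden, batch_first=True)
            self.pred_head = nn.Linear(hidden, csi_dim)

        def forward(self, codeword_seq):
            # codeword_seq: (batch, time, codeword_dim)
            current = self.recon_head(codeword_seq[:, -1])  # current CSI
            h, _ = self.rnn(codeword_seq)
            future = self.pred_head(h[:, -1])  # one-step-ahead CSI
            return current, future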
TABLE VI
THE WORKS WITH OPEN SOURCE CODE IN DL-BASED CSI FEEDBACK.

Methods Links
CsiNet [23] https://github.com/sydney222/Python_CsiNet
CRNet [43] https://github.com/Kylin9511/CRNet
DS-NLCsiNet [45] https://github.com/yuxt1999/DS-NLCsiNet
CLNet [49] https://github.com/SIJIEJI/CLNet
DCRNet [51] https://github.com/recusant7/DCRNet
TransNet [56] https://github.com/Treedy2020/TransNet
SALDR [65] https://github.com/XS96/SALDR
P-SRNet [68] https://github.com/MoliaChen/SRNet
CsiNet-LSTM [85] https://www.ecsponline.com/goods.php?id=205629
ConvlstmCsiNet [97] https://github.com/Aries-LXY/ConvlstmCsiNet
DualNet [88] https://github.com/DLinWL/Bi-Directional-Channel-Reciprocity
[120] https://github.com/foadsohrabi/DL-DSC-FDD-Massive-MIMO
CHNet [128] https://github.com/ch28/CHNet
ACRNet [134] https://github.com/Kylin9511/ACRNet
PSCDN [159] https://github.com/xian-hua/PSCDN/

F. Other Emerging Techniques

Many new techniques, such as RIS [169] and extra-large scale massive MIMO [170], are being introduced to communications and are regarded as potential key techniques in 6G. CSI feedback combined with these new techniques needs to be explored. For example, CSI acquisition (including feedback) is a major challenge in RIS-assisted communication systems, in which channel reciprocity may not hold even in time-division duplexing systems [171]. The CSI dimension greatly increases because of the introduction of an RIS with a large number of elements. If the RIS has 100 × 100 elements and the user is equipped with a single antenna, the channel between the RIS and the user is of dimension 10,000 × 1, which is much larger than that in current massive MIMO systems. Therefore, a more efficient DL framework needs to be explored to tackle the challenges introduced by these new techniques.
G. Open Source Dataset and Code

Table VI lists the DL-based CSI feedback works with open source code. Reproducible research is essential for DL-based algorithms: open source makes the works more convincing and helps accelerate research. Therefore, more open source works are welcome. Wireless-Intelligence is a public channel dataset library that has been built for DL-based wireless communications [172]. This library contains many channel datasets that satisfy the 3GPP standard. However, channel datasets measured from practical massive MIMO systems are not publicly available. An open practical channel dataset is essential and urgent to accelerate the study of DL-based CSI feedback.
based CSI feedback. [13] X. Gao, S. Jin, C.-K. Wen, and G. Y. Li, “ComNet: Combination
of deep learning and expert knowledge in OFDM receivers,” IEEE
VI. C ONCLUSION Commun. Lett., vol. 22, no. 12, pp. 2627–2630, Dec. 2018.
[14] P. Jiang, T. Wang, B. Han et al., “AI-aided online adaptive OFDM
In this paper, an overview of DL-based CSI feedback has receiver: Design and experimental results,” IEEE Trans. Wireless Com-
been provided. First, the basic DL concepts and representative mun., vol. 20, no. 11, pp. 7655–7668, Nov. 2021.
[15] T. O’shea and J. Hoydis, “An introduction to deep learning for the
NN architectures widely used in DL-based feedback have been physical layer,” IEEE Trans. Cogn. Commun. Netw, vol. 3, no. 4, pp.
briefly introduced to guide beginners. Then, the existing works 563–575, Dec. 2017.
[16] H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, "Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels," IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020.
[17] S. Dörner, S. Cammerer, J. Hoydis, and S. Ten Brink, "Deep learning based communication over the air," IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018.
[18] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, "An overview of massive MIMO: Benefits and challenges," IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 742–758, Oct. 2014.
[19] X. Rao and V. K. N. Lau, "Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems," IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3261–3271, June 2014.
[20] D. J. Love, R. W. Heath, V. K. N. Lau, D. Gesbert, B. D. Rao, and M. Andrews, "An overview of limited feedback in wireless communication systems," IEEE J. Sel. Areas Commun., vol. 26, no. 8, pp. 1341–1365, Oct. 2008.
[21] Z. Qin, J. Fan, Y. Liu, Y. Gao, and G. Y. Li, "Sparse representation for wireless communications: A compressive sensing approach," IEEE Signal Process. Mag., vol. 35, no. 3, pp. 40–58, May 2018.
[22] 3GPP RP-213599, "New SI: Study on artificial intelligence (AI)/Machine Learning (ML) for NR air interface," Moderator (Qualcomm), Tech. Rep., Dec. 2021, accessed on May 1, 2022. [Online]. Available: https://www.3gpp.org/ftp/tsg_ran/TSG_RAN/TSGR_94e/Docs/RP-213599.zip
[23] C.-K. Wen, W.-T. Shih, and S. Jin, "Deep learning for massive MIMO CSI feedback," IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018.
[24] N. Jindal, "MIMO broadcast channels with finite-rate feedback," IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5045–5060, Nov. 2006.
[25] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imaging Science, vol. 2, no. 1, pp. 183–202, 2009.
[26] A. L. Maas, A. Y. Hannun, A. Y. Ng et al., "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013, pp. 1–6.
[27] Z. Wu, C. Shen, and A. Van Den Hengel, "Wider or deeper: Revisiting the resnet model for visual recognition," Pattern Recognit., vol. 90, pp. 119–133, June 2019.
[28] A. Araujo, W. Norris, and J. Sim, "Computing receptive fields of convolutional neural networks," Distill, vol. 4, no. 11, p. e21, 2019.
[29] X. Ding, X. Zhang, Y. Zhou, J. Han, G. Ding, and J. Sun, "Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2022.
[30] C. Olah, "Understanding LSTM networks," http://colah.github.io/posts/2015-08-Understanding-LSTMs/, 2015, accessed on May 1, 2022.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[32] R. C. Staudemeyer and E. R. Morris, "Understanding LSTM - a tutorial into long short-term memory recurrent neural networks," arXiv preprint arXiv:1909.09586, 2019. [Online]. Available: https://arxiv.org/abs/1909.09586
[33] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," arXiv preprint arXiv:1312.6114, 2013. [Online]. Available: https://arxiv.org/abs/1312.6114
[34] I. Goodfellow, J. Pouget-Abadie, M. Mirza et al., "Generative adversarial nets," in Proc. 28th Adv. Neural Inf. Process. Syst. (NIPS), vol. 27, 2014.
[35] A. Vaswani, N. Shazeer, N. Parmar et al., "Attention is all you need," in Proc. 31st Adv. Neural Inf. Process. Syst. (NIPS), vol. 30, 2017.
[36] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. ICLR, 2015.
[37] K. Xu, J. Ba, R. Kiros et al., "Show, attend and tell: Neural image caption generation with visual attention," in Proc. ICML, 2015, pp. 2048–2057.
[38] X. Yang, "An overview of the attention mechanisms in computer vision," J. Phys. Conf. Ser., vol. 1693, no. 1, p. 012173, Dec. 2020.
[39] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. ECCV, Sept. 2018, pp. 3–19.
[40] W.-T. Shih, "Study on massive MIMO CSI feedback based on deep learning (in Traditional Chinese)," Master's thesis, National Sun Yat-sen University, 2018, accessed on May 1, 2022. [Online]. Available: https://hdl.handle.net/11296/pvuea3
[41] J. Guo, C.-K. Wen, S. Jin, and G. Y. Li, "Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis," IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2827–2840, Apr. 2020.
[42] Q. Cai, C. Dong, and K. Niu, "Attention model for massive MIMO CSI compression feedback and recovery," in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2019, pp. 1–5.
[43] Z. Lu, J. Wang, and J. Song, "Multi-resolution CSI feedback with deep learning in massive MIMO system," in Proc. IEEE Int. Conf. Commun. (ICC), 2020, pp. 1–6.
[44] Q. Li, A. Zhang, P. Liu, J. Li, and C. Li, "A novel CSI feedback approach for massive MIMO using LSTM-Attention CNN," IEEE Access, vol. 8, pp. 7295–7302, 2020.
[45] X. Yu, X. Li, H. Wu, and Y. Bai, "DS-NLCsiNet: Exploiting non-local neural networks for massive MIMO CSI feedback," IEEE Commun. Lett., vol. 24, no. 12, pp. 2790–2794, Dec. 2020.
[46] B. Tolba, M. Elsabrouty, M. G. Abdu-Aguye, H. Gacanin, and H. M. Kasem, "Massive MIMO CSI feedback based on generative adversarial network," IEEE Commun. Lett., vol. 24, no. 12, pp. 2805–2808, Dec. 2020.
[47] M. Hussien, K. K. Nguyen, and M. Cheriet, "PRVNet: Variational autoencoders for massive MIMO CSI feedback," arXiv preprint arXiv:2011.04178, 2020. [Online]. Available: https://arxiv.org/abs/2011.04178
[48] M. Gao, T. Liao, and Y. Lu, "Fully connected feedforward neural networks based CSI feedback algorithm," China Commun., vol. 18, no. 1, pp. 43–48, Jan. 2021.
[49] S. Ji and M. Li, "CLNet: Complex input lightweight neural network designed for massive MIMO CSI feedback," IEEE Wireless Commun. Lett., vol. 10, no. 10, pp. 2318–2322, Oct. 2021.
[50] Y. Sun, W. Xu, L. Liang, N. Wang, G. Y. Li, and X. You, "A lightweight deep network for efficient CSI feedback in massive MIMO systems," IEEE Wireless Commun. Lett., vol. 10, no. 8, pp. 1840–1844, Aug. 2021.
[51] S. Tang, J. Xia, L. Fan, X. Lei, W. Xu, and A. Nallanathan, "Dilated convolution based CSI feedback compression for massive MIMO systems," arXiv preprint arXiv:2106.04043, 2021. [Online]. Available: https://arxiv.org/abs/2106.04043
[52] Y. Zhang, X. Zhang, and Y. Liu, "Deep learning based CSI compression and quantization with high compression ratios in FDD massive MIMO systems," IEEE Wireless Commun. Lett., vol. 10, no. 10, pp. 2101–2105, Oct. 2021.
[53] Z. Hu, J. Guo, G. Liu, H. Zheng, and J. Xue, "MRFNet: A deep learning-based CSI feedback approach of massive MIMO systems," IEEE Commun. Lett., vol. 25, no. 10, pp. 3310–3314, Oct. 2021.
[54] B. Cao, Y. Yang, P. Ran, D. He, and G. He, "ACCsiNet: Asymmetric convolution-based autoencoder framework for massive MIMO CSI feedback," IEEE Commun. Lett., vol. 25, no. 12, pp. 3873–3877, Dec. 2021.
[55] Y. Xu, M. Zhao, S. Zhang, and H. Jin, "DFECsiNet: Exploiting diverse channel features for massive MIMO CSI feedback," in Proc. 13th WCSP, 2021, pp. 1–5.
[56] Y. Cui, A. Guo, and C. Song, "TransNet: Full attention network for CSI feedback in FDD massive MIMO system," IEEE Wireless Commun. Lett., pp. 1–1, 2022.
[57] X. Bi, S. Li, C. Yu, and Y. Zhang, "A novel approach using convolutional transformer for massive MIMO CSI feedback," IEEE Wireless Commun. Lett., 2022, Early access.
[58] H. Li, B. Zhang, H. Chang, X. Liang, and X. Gu, "CVLNet: A complex-valued lightweight network for CSI feedback," IEEE Wireless Commun. Lett., 2022, Early access.
[59] Y. Wang, J. Sun, J. Wang et al., "Multi-rate compression for downlink CSI based on transfer learning in FDD massive MIMO systems," in Proc. IEEE 94th VTC-Fall, 2021, pp. 1–5.
[60] P. Liang, J. Fan, W. Shen, Z. Qin, and G. Y. Li, "Deep learning and compressive sensing-based CSI feedback in FDD massive MIMO systems," IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 9217–9222, Aug. 2020.
[61] Z. Lu, J. Wang, and J. Song, "Binary neural network aided CSI feedback in massive MIMO system," IEEE Wireless Commun. Lett., vol. 10, no. 6, pp. 1305–1308, June 2021.
[62] B. Cheng, J. Zhao, and Y. Hu, "Multi-scale and multi-channel networks for CSI feedback in massive MIMO system," J. Comput. Commun., vol. 9, no. 10, pp. 132–141, Oct. 2021.
[63] Q. Yang, M. B. Mashhadi, and D. Gündüz, "Deep convolutional compression for massive MIMO CSI feedback," in Proc. IEEE 29th Int. Workshop Mach. Learn. Signal Process. (MLSP), 2019, pp. 1–6.
[64] G. Fan, J. Sun, G. Gui, H. Gacanin, B. Adebisi, and T. Ohtsuki, "Fully convolutional neural network based CSI limited feedback for FDD massive MIMO systems," IEEE Trans. Cogn. Commun. Netw., 2021, Early access.
[65] X. Song, J. Wang, J. Wang et al., "SALDR: Joint self-attention learning and dense refine for massive MIMO CSI feedback with multiple compression ratio," IEEE Wireless Commun. Lett., vol. 10, no. 9, pp. 1899–1903, Sept. 2021.
[66] Y. Xu, M. Yuan, and M.-O. Pun, "Transformer empowered CSI feedback for massive MIMO systems," in Proc. 26th Wireless Opt. Commun. Conf. (WOCC), 2021, pp. 157–161.
[67] S. Jo, J. Lee, and J. So, "Deep learning-based massive multiple-input multiple-output channel state information feedback with data normalisation using clipping," Electron. Lett., vol. 57, no. 3, pp. 151–154, Feb. 2021.
[68] X. Chen, C. Deng, B. Zhou, H. Zhang, G. Yang, and S. Ma, "High-accuracy CSI feedback with super-resolution network for massive MIMO systems," IEEE Wireless Commun. Lett., vol. 11, no. 1, pp. 141–145, Jan. 2022.
[69] Y. Wang, X. Chen, H. Yin, and W. Wang, "Learnable sparse transformation-based massive MIMO CSI recovery network," IEEE Commun. Lett., vol. 24, no. 7, pp. 1468–1471, July 2020.
[70] J. Guo, L. Wang, F. Li, and J. Xue, "CSI feedback with model-driven deep learning of massive MIMO systems," IEEE Commun. Lett., pp. 1–1, 2021, Early access.
[71] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2016, pp. 770–778.
[72] X. Ding, Y. Guo, G. Ding, and J. Han, "ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 1911–1920.
[73] Z. Zhang, Y. Zheng, C. Gan, and Q. Zhu, "Massive MIMO CSI reconstruction using CNN-LSTM and attention mechanism," IET Commun., vol. 14, no. 18, pp. 3089–3094, 2020.
[74] D. J. Ji and D.-H. Cho, "ChannelAttention: Utilizing attention layers for accurate massive MIMO channel feedback," IEEE Wireless Commun. Lett., vol. 10, no. 5, pp. 1079–1082, May 2021.
[75] H. Zhao, J. Jia, and V. Koltun, "Exploring self-attention for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2020, pp. 10 073–10 082.
[76] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2017, pp. 2261–2269.
[77] M. Abadi, P. Barham, J. Chen et al., "TensorFlow: A system for large-scale machine learning," in Proc. 12th OSDI, 2016, pp. 265–283.
[78] A. Paszke, S. Gross, F. Massa et al., "PyTorch: An imperative style, high-performance deep learning library," NeurIPS, vol. 32, 2019.
[79] J. Sola and J. Sevilla, "Importance of input data normalization for the application of neural networks to complex industrial problems," IEEE Trans. Nucl. Sci., vol. 44, no. 3, pp. 1464–1468, June 1997.
[80] H. He, S. Jin, C.-K. Wen, F. Gao, G. Y. Li, and Z. Xu, "Model-driven deep learning for physical layer communications," IEEE Wireless Commun., vol. 26, no. 5, pp. 77–83, Oct. 2019.
[81] N. Shlezinger, J. Whang, Y. C. Eldar, and A. G. Dimakis, "Model-based deep learning," arXiv preprint arXiv:2012.08405, 2020. [Online]. Available: https://arxiv.org/abs/2012.08405
[82] K. Gregor and Y. LeCun, "Learning fast approximations of sparse coding," in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, pp. 399–406.
[83] J. Liu, X. Chen, Z. Wang, and W. Yin, "ALISTA: Analytic weights are as good as learned weights in LISTA," in Proc. ICLR, 2019. [Online]. Available: https://openreview.net/forum?id=B1lnzn0ctQ
[84] D. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inf. Theory, vol. 41, no. 3, pp. 613–627, May 1995.
[85] T. Wang, C.-K. Wen, S. Jin, and G. Y. Li, "Deep learning-based CSI feedback approach for time-varying massive MIMO channels," IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 416–419, Apr. 2019.
[86] C. Lu, W. Xu, H. Shen, J. Zhu, and K. Wang, "MIMO channel information feedback using deep recurrent network," IEEE Commun. Lett., vol. 23, no. 1, pp. 188–191, Jan. 2019.
[87] S. Hong, S. Jo, and J. So, "Machine learning-based adaptive CSI feedback interval," ICT Express, 2021, Early access.
[88] Z. Liu, L. Zhang, and Z. Ding, "Exploiting bi-directional channel reciprocity in deep learning for low rate massive MIMO CSI feedback," IEEE Wireless Commun. Lett., vol. 8, no. 3, pp. 889–892, June 2019.
[89] T. Wang, "Research on key technology of massive MIMO channel feedback for intelligent communications (in Chinese)," Master's thesis, Southeast University, 2019, accessed on May 1, 2022. [Online]. Available: https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CMFD&dbname=CMFD202001&filename=1020612377.nh&uniplatform=NZKPT&v=bxKIu4EzKygyTCR6jG9Js0YheR7qc2TAD5tx_MBkZkNnopR6AiW0Fe4CRDB-NTth
[90] Y. Liu and O. Simeone, "HyperRNN: Deep learning-aided downlink CSI acquisition via partial channel reciprocity for FDD massive MIMO," in Proc. IEEE 22nd Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), 2021, pp. 31–35.
[91] J. Wang, G. Gui, T. Ohtsuki, B. Adebisi, H. Gacanin, and H. Sari, "Compressive sampled CSI feedback method based on deep learning for FDD massive MIMO systems," IEEE Trans. Commun., vol. 69, no. 9, pp. 5873–5885, Sept. 2021.
[92] J. Guo, X. Yang, C.-K. Wen, S. Jin, and G. Y. Li, "DL-based CSI feedback and cooperative recovery in massive MIMO," arXiv preprint arXiv:2003.03303, 2020. [Online]. Available: https://arxiv.org/abs/2003.03303
[93] M. B. Mashhadi, Q. Yang, and D. Gündüz, "Distributed deep convolutional compression for massive MIMO CSI feedback," IEEE Trans. Wireless Commun., vol. 20, no. 4, pp. 2621–2633, Apr. 2021.
[94] H. Yin and D. Gesbert, "A partial channel reciprocity-based codebook for wideband FDD massive MIMO," IEEE Trans. Wireless Commun., 2022, Early access.
[95] M. Stojanovic, J. Proakis, and J. Catipovic, "Analysis of the impact of channel estimation errors on the performance of a decision-feedback equalizer in fading multipath channels," IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 877–886, Feb./March/April 1995.
[96] Z. Liu, M. del Rosario, and Z. Ding, "A Markovian model-driven deep learning framework for massive MIMO CSI feedback," IEEE Trans. Wireless Commun., vol. 21, no. 2, pp. 1214–1228, Feb. 2022.
[97] X. Li and H. Wu, "Spatio-temporal representation with deep neural recurrent network in MIMO CSI feedback," IEEE Wireless Commun. Lett., vol. 9, no. 5, pp. 653–657, May 2020.
[98] Z. Huang and M. Li, "Research on channel feedback algorithm in UAV inspection communication subsystem of smart grid," in Proc. 2nd IEEE ISCEIC, 2021, pp. 236–240.
[99] Y.-C. Lin, Z. Liu, T.-S. Lee, and Z. Ding, "Deep learning phase compression for MIMO CSI feedback by exploiting FDD channel reciprocity," IEEE Wireless Commun. Lett., vol. 10, no. 10, pp. 2200–2204, Oct. 2021.
[100] Z. Liu, L. Zhang, and Z. Ding, "An efficient deep learning framework for low rate massive MIMO CSI reporting," IEEE Trans. Commun., vol. 68, no. 8, pp. 4761–4772, Aug. 2020.
[101] D. Ha, A. Dai, and Q. V. Le, "Hypernetworks," arXiv preprint arXiv:1609.09106, 2016. [Online]. Available: https://arxiv.org/abs/1609.09106
[102] M. Latva-aho, K. Leppänen, F. Clazzer, and A. Munari, "Key drivers and research challenges for 6G ubiquitous wireless intelligence," White Paper, 2020. [Online]. Available: http://jultika.oulu.fi/Record/isbn978-952-62-2354-4
[103] X. Du and A. Sabharwal, "Massive MIMO channels with inter-user angle correlation: Open-access dataset, analysis and measurement-based validation," IEEE Trans. Veh. Technol., vol. 71, no. 2, pp. 1602–1616, Feb. 2022.
[104] Y. Jang, G. Kong, M. Jung, S. Choi, and I.-M. Kim, "Deep autoencoder based CSI feedback with feedback errors and feedback delay in FDD massive MIMO systems," IEEE Wireless Commun. Lett., vol. 8, no. 3, pp. 833–836, June 2019.
[105] C. Lu, W. Xu, S. Jin, and K. Wang, "Bit-level optimized neural network for multi-antenna channel quantization," IEEE Wireless Commun. Lett., vol. 9, no. 1, pp. 87–90, Jan. 2020.
[106] T. Chen, J. Guo, S. Jin, C.-K. Wen, and G. Y. Li, "A novel quantization method for deep learning-based massive MIMO CSI feedback," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), 2019, pp. 1–5.
[107] CCITT Recommendation, "Pulse code modulation (PCM) of voice frequencies," ITU, 1988.
[108] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 620–636, July 2003.
[109] S. Ravula and S. Jain, "Deep autoencoder-based massive MIMO CSI feedback with quantization and entropy coding," in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2021, pp. 1–6.
[110] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, "Variational image compression with a scale hyperprior," in Proc. ICLR, 2018.
[111] K. Kong, W.-J. Song, and M. Min, "Knowledge distillation-aided end-to-end learning for linear precoding in multiuser MIMO downlink systems with finite-rate feedback," IEEE Trans. Veh. Technol., vol. 70, no. 10, pp. 11 095–11 100, Oct. 2021.
[112] G. Hinton, O. Vinyals, J. Dean et al., "Distilling the knowledge in a neural network," in Proc. Adv. Neural Inf. Process. Syst. Workshops (NIPSW), 2015.
[113] J. Guo, T. Chen, C.-K. Wen, S. Jin, G. Y. Li, X. Wang, and X. Hou, "Deep learning for joint channel estimation and feedback in massive MIMO systems," Digit. Commun. Netw., 2022, Language/Minor revision.
[114] Y. Sun, W. Xu, L. Fan, G. Y. Li, and G. K. Karagiannidis, "AnciNet: An efficient deep learning approach for feedback compression of estimated CSI in massive MIMO systems," IEEE Wireless Commun. Lett., vol. 9, no. 12, pp. 2192–2196, Dec. 2020.
[115] J. Guo, C.-K. Wen, and S. Jin, "CAnet: Uplink-aided downlink channel acquisition in FDD massive MIMO using deep learning," IEEE Trans. Commun., vol. 70, no. 1, pp. 199–214, Jan. 2022.
[116] ——, "Deep learning-based CSI feedback for beamforming in single- and multi-cell massive MIMO systems," IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 1872–1884, July 2021.
[117] Q. Sun, H. Zhao, J. Wang, and W. Chen, "Deep learning-based joint CSI feedback and hybrid precoding in FDD mmWave massive MIMO systems," Entropy, vol. 24, no. 4, p. 441, 2022.
[118] A.-A. Lee, Y.-S. Wang, and Y.-W. P. Hong, "Deep CSI compression and coordinated precoding for multicell downlink systems," in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2020, pp. 1–6.
[119] J. Jang, H. Lee, S. Hwang, H. Ren, and I. Lee, "Deep learning-based limited feedback designs for MIMO systems," IEEE Wireless Commun. Lett., vol. 9, no. 4, pp. 558–561, Apr. 2020.
[120] F. Sohrabi, K. M. Attiah, and W. Yu, "Deep learning for distributed channel feedback and multiuser precoding in FDD massive MIMO," IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4044–4057, July 2021.
[121] M. Boloursaz Mashhadi and D. Gündüz, "Deep learning for massive MIMO channel state acquisition and feedback," J. Indian Inst. Sci., vol. 100, no. 2, pp. 369–382, 2020.
[122] X. Ma and Z. Gao, "Data-driven deep learning to design pilot and channel estimator for massive MIMO," IEEE Trans. Veh. Technol., vol. 69, no. 5, pp. 5677–5682, March 2020.
[123] M. B. Mashhadi and D. Gündüz, "Pruning the pilots: Deep learning-based pilot design and channel estimation for MIMO-OFDM systems," IEEE Trans. Wireless Commun., vol. 20, no. 10, pp. 6315–6328, Oct. 2021.
[124] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, Jan. 2009.
[125] O. Somekh, B. M. Zaidel, and S. Shamai, "Sum rate characterization of joint multiple cell-site processing," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4473–4497, Dec. 2007.
[126] R. Bhagavatula and R. W. Heath, "Adaptive limited feedback for sum-rate maximizing beamforming in cooperative multicell systems," IEEE Trans. Signal Process., vol. 59, no. 2, pp. 800–811, Feb. 2011.
[127] Y. Wang, Y. Zhang, J. Sun, G. Gui, T. Ohtsuki, and F. Adachi, "A novel compression CSI feedback based on deep learning for FDD massive MIMO systems," in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2021, pp. 1–5.
[128] X. Liang, H. Chang, H. Li, X. Gu, and L. Zhang, "Changeable rate and novel quantization for CSI feedback based on deep learning," arXiv preprint arXiv:2202.13627, 2022. [Online]. Available: https://arxiv.org/abs/2202.13627
[129] S. Jo and J. So, "Adaptive lightweight CNN-based CSI feedback for massive MIMO systems," IEEE Wireless Commun. Lett., vol. 10, no. 12, pp. 2776–2780, Dec. 2021.
[130] H. Ye, F. Gao, J. Qian, H. Wang, and G. Y. Li, "Deep learning-based denoise network for CSI feedback in FDD massive MIMO systems," IEEE Commun. Lett., vol. 24, no. 8, pp. 1742–1746, Aug. 2020.
[131] H. Chang, X. Liang, H. Li, J. Shen, X. Gu, and L. Zhang, "Deep learning-based bitstream error correction for CSI feedback," IEEE Wireless Commun. Lett., vol. 10, no. 12, pp. 2828–2832, Dec. 2021.
[132] M. B. Mashhadi, Q. Yang, and D. Gündüz, "CNN-based analog CSI feedback in FDD MIMO-OFDM systems," in Proc. IEEE 45th Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2020, pp. 8579–8583.
[133] J. Guo, J. Wang, C.-K. Wen, S. Jin, and G. Y. Li, "Compression and acceleration of neural networks for communications," IEEE Wireless Commun., vol. 27, no. 4, pp. 110–117, Aug. 2020.
[134] Z. Lu, X. Zhang, H. He, J. Wang, and J. Song, "Binarized aggregated network with quantization: Flexible deep learning deployment for CSI feedback in massive MIMO system," IEEE Trans. Wireless Commun., 2022, Early access.
[135] H. Tang, J. Guo, M. Matthaiou, C.-K. Wen, and S. Jin, "Knowledge-distillation-aided lightweight neural network for massive MIMO CSI feedback," in Proc. IEEE 94th VTC-Fall, 2021, pp. 1–5.
[136] H. Sun, Z. Zhao, X. Fu, and M. Hong, "Limited feedback double directional massive MIMO channel estimation: From low-rank modeling to deep learning," in Proc. IEEE 19th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), 2018, pp. 1–5.
[137] N. Song and T. Yang, "Machine learning enhanced CSI acquisition and training strategy for FDD massive MIMO," in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2021, pp. 1–6.
[138] J. Zeng, J. Sun, G. Gui et al., "Downlink CSI feedback algorithm with deep transfer learning for FDD massive MIMO systems," IEEE Trans. Cogn. Commun. Netw., vol. 7, no. 4, pp. 1253–1265, Dec. 2021.
[139] J. Jiang, R. Han, B. Liu, and D. Feng, "Federated learning-based codebook design for massive MIMO communication system," in Proc. International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery. Springer, 2021, pp. 1198–1205.
[140] J. Guo, Y. Zuo, C.-K. Wen, and S. Jin, "User-centric online gossip training for autoencoder-based CSI feedback," IEEE J. Sel. Topics Signal Process., 2022, Early access.
[141] M. Chen, J. Guo, C.-K. Wen, S. Jin, G. Y. Li, and A. Yang, "Deep learning-based implicit CSI feedback in massive MIMO," IEEE Trans. Commun., vol. 70, no. 2, pp. 935–950, Feb. 2022.
[142] W. Liu, W. Tian, H. Xiao, S. Jin, X. Liu, and J. Shen, "EVCsiNet: Eigenvector-based CSI feedback under 3GPP link-level channels," IEEE Wireless Commun. Lett., vol. 10, no. 12, pp. 2688–2692, Dec. 2021.
[143] J. Guo, C.-K. Wen, M. Chen, and S. Jin, "Environment knowledge-aided massive MIMO feedback codebook enhancement using artificial intelligence," 2022, Early access.
[144] H. Xiao, Z. Wang, W. Tian et al., "AI enlightens wireless communication: Analyses, solutions and opportunities on CSI feedback," China Commun., vol. 18, no. 11, pp. 104–116, Nov. 2021.
[145] J. Guo, X. Li, M. Chen et al., "AI enabled wireless communications with real channel measurements: Channel feedback," J. Commun. Inf. Netw., vol. 5, no. 3, pp. 310–317, Sept. 2020.
[146] J. Xu, B. Ai, N. Wang, and W. Chen, "Deep joint source-channel coding for CSI feedback: An end-to-end approach," arXiv preprint arXiv:2203.16005, 2022. [Online]. Available: https://arxiv.org/abs/2203.16005
[147] Z. Cao, W.-T. Shih, J. Guo, C.-K. Wen, and S. Jin, "Lightweight convolutional neural networks for CSI feedback in massive MIMO," IEEE Commun. Lett., vol. 25, no. 8, pp. 2624–2628, Aug. 2021.
[148] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016. [Online]. Available: https://arxiv.org/abs/1602.07360
[149] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. (CVPR), 2018, pp. 6848–6856.
[150] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Trans. Pattern Anal. Machine Intell., vol. 40, no. 4, pp. 834–848, Apr. 2018.
[151] Y. Yang, F. Gao, Z. Zhong, B. Ai, and A. Alkhateeb, "Deep transfer learning-based downlink channel prediction for FDD massive MIMO systems," IEEE Trans. Commun., vol. 68, no. 12, pp. 7485–7497, Dec. 2020.
[152] M. B. Mashhadi, M. Jankowski, T.-Y. Tung, S. Kobus, and D. Gündüz, "Federated mmWave beam selection utilizing LIDAR data," IEEE Wireless Commun. Lett., vol. 10, no. 10, pp. 2269–2273, Oct. 2021.
[153] L. Giaretta and Š. Girdzijauskas, "Gossip learning: Off the beaten path," in Proc. IEEE Int. Conf. Big Data, 2019, pp. 1117–1124.
[154] C. Qing, B. Cai, Q. Yang, J. Wang, and C. Huang, "Deep learning for CSI feedback based on superimposed coding," IEEE Access, vol. 7, pp. 93 723–93 733, 2019.
[155] D. Xu, Y. Huang, and L. Yang, "Feedback of downlink channel state information based on superimposed coding," IEEE Commun. Lett., vol. 11, no. 3, pp. 240–242, March 2007.
[156] C. Qing, Q. Ye, W. Liu, and J. Wang, "Fusion learning for 1-bit CS-based superimposed CSI feedback with bi-directional channel reciprocity," IEEE Commun. Lett., vol. 26, no. 4, pp. 813–817, Apr. 2022.
[157] Q. Liu, J. Guo, C.-K. Wen, and S. Jin, "Adversarial attack on DL-based massive MIMO CSI feedback," J. Commun. Netw., vol. 22, no. 3, pp. 230–235, June 2020.
[158] P. K. Sangdeh, H. Pirayesh, A. Mobiny, and H. Zeng, "LB-SciFi: Online learning-based channel feedback for MU-MIMO in wireless LANs," in Proc. IEEE 28th Int. Conf. Netw. Protocols (ICNP), 2020, pp. 1–11.
[159] X. Yu, D. Li, Y. Xu, and Y.-C. Liang, "Convolutional autoencoder-based phase shift feedback compression for intelligent reflecting surface-assisted wireless systems," IEEE Commun. Lett., vol. 26, no. 1, pp. 89–93, Jan. 2022.
[160] X. Li, J. Guo, C.-K. Wen, S. Jin, and S. Han, "Multi-task learning-based CSI feedback design in multiple scenarios," arXiv preprint arXiv:2204.12698, 2022. [Online]. Available: https://arxiv.org/abs/2204.12698
[161] J. Yang, X. Chen, H. Zou, D. Wang, Q. Xu, and L. Xie, "EfficientFi: Towards large-scale lightweight WiFi sensing via CSI compression," IEEE Internet Things J., 2021, Early access.
[162] L. Liu, C. Oestges, J. Poutanen et al., "The COST 2100 MIMO channel model," IEEE Wireless Commun., vol. 19, no. 6, pp. 92–99, Dec. 2012.
[163] W. Utschick, V. Rizzello, M. Joham, Z. Ma, and L. Piazzi, "Learning the CSI recovery in FDD systems," IEEE Trans. Wireless Commun., 2022, Early access.
[164] J. Yu, L. Yang, N. Xu, J. Yang, and T. Huang, "Slimmable neural networks," in Proc. ICLR, 2019. [Online]. Available: https://openreview.net/forum?id=H1gMCsAqY7
[165] W. Tong and G. Y. Li, "Nine challenges in artificial intelligence and wireless communications for 6G," IEEE Wireless Commun., pp. 1–10, 2022, Early access.
[166] H. Xiao, W. Tian, W. Liu, and J. Shen, "ChannelGAN: Deep learning-based channel modeling and generating," IEEE Wireless Commun. Lett., vol. 11, no. 3, pp. 650–654, March 2022.
[167] T. Campbell and T. Broderick, "Bayesian coreset construction via greedy iterative geodesic ascent," in Proc. ICML, 2018, pp. 698–706.
[168] Y. Yang, F. Gao, X. Ma, and S. Zhang, "Deep learning-based channel estimation for doubly selective fading channels," IEEE Access, vol. 7, pp. 36 579–36 589, 2019.
[169] S. Basharat, S. A. Hassan, H. Pervaiz, A. Mahmood, Z. Ding, and M. Gidlund, "Reconfigurable intelligent surfaces: Potentials, applications, and challenges for 6G wireless networks," IEEE Wireless Commun., vol. 28, no. 6, pp. 184–191, Dec. 2021.
[170] E. D. Carvalho, A. Ali, A. Amiri, M. Angjelichinoski, and R. W. Heath, "Non-stationarities in extra-large-scale massive MIMO," IEEE Wireless Commun., vol. 27, no. 4, pp. 74–80, Aug. 2020.
[171] W. Tang, X. Chen, M. Z. Chen et al., "On channel reciprocity in reconfigurable intelligent surface assisted wireless networks," IEEE Wireless Commun., vol. 28, no. 6, pp. 94–101, Dec. 2021.
[172] OPPO, "Wireless-Intelligence," https://wireless-intelligence.com/, accessed on May 1, 2022.
