Deep Learning in Physical Layer Communications
Zhijin Qin, Hao Ye, Geoffrey Ye Li, and Biing-Hwang Fred Juang
Zhijin Qin is with Queen Mary University of London. Hao Ye, Geoffrey Ye Li, and Biing-Hwang Fred Juang are with Georgia Institute of Technology. Digital Object Identifier: 10.1109/MWC.2019.1800601
FIGURE 1. Development of neural networks: a) single neuron element; b) feedforward neural network (FNN); c) recurrent neural network (RNN).
Deep Neural Networks and Deep Learning Based Communications

In this section, we will first introduce the basics of DNNs, the generative adversarial network (GAN), the conditional GAN, and the Bayesian optimal estimator, which are widely used in DL-based communication systems. Then we will discuss intelligent communication systems with DL.

Deep Neural Networks

Deep Neural Networks Basis: As aforementioned, research on NNs started from the single neuron. As shown in Fig. 1a, the inputs of the NN are {x1, x2, …, xn} with the corresponding weights {w1, w2, …, wn}. The neuron is represented by a non-linear activation function, σ(·), applied to the sum of the weighted inputs. The output of the neuron can be expressed as y = σ(w1x1 + w2x2 + … + wnxn + b), where b is the shift of the neuron. An NN can be established by connecting multiple neuron elements to generate multiple outputs and construct a layered architecture. In the training process, the labelled data, that is, a set of input and output vector pairs, is used to adjust the weight set, W, by minimizing a loss function. In the NN with a single neuron element, W = {b, w1, w2, …, wn}. Commonly used loss functions include the mean-squared error (MSE) and the categorical cross-entropy. To adapt the model to a specific scenario, the loss function can be revised, for example, by introducing the l1-norm or l2-norm of W or of the activations as a regularizer to improve the generalization capability. Stochastic gradient descent (SGD) is one of the most popular algorithms for optimizing W.
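As a concrete illustration of the above notation, the following minimal Python/NumPy sketch trains a single neuron y = σ(w1x1 + … + wnxn + b) with SGD on an MSE loss. The toy data, the logistic-sigmoid choice for σ(·), and the learning rate are illustrative assumptions rather than anything prescribed in the article.

```python
import numpy as np

def sigma(z):
    # non-linear activation function sigma(.), here a logistic sigmoid
    return 1.0 / (1.0 + np.exp(-z))

# toy labelled data: input vectors X and target outputs y (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                            # 200 samples, n = 4 inputs
y = sigma(X @ np.array([0.5, -1.0, 2.0, 0.3]) + 0.1)     # targets from a "true" neuron

# weight set W = {b, w1, ..., wn}, initialized randomly
w = rng.normal(scale=0.1, size=4)
b = 0.0
lr = 0.5                                                 # SGD step size

for epoch in range(100):
    for i in rng.permutation(len(X)):                    # stochastic: one sample at a time
        pred = sigma(X[i] @ w + b)
        err = pred - y[i]                                 # gradient of the MSE loss 0.5*(pred - y)^2
        grad_z = err * pred * (1 - pred)                  # chain rule through sigma
        w -= lr * grad_z * X[i]                           # SGD update of the weights
        b -= lr * grad_z                                  # SGD update of the shift b

print("trained weights:", w, "shift:", b)
```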
With the layered architecture, a DNN includes multiple fully connected hidden layers, each of which represents a different feature of the input data. Figures 1b and 1c show two typical DNN models: the feedforward neural network (FNN) and the recurrent neural network (RNN). In an FNN, each neuron is connected to the neurons in the adjacent layers, while neurons in the same layer are not connected to each other. The deep convolutional network (DCN) is developed from the fully connected FNN by keeping only some of the connections between neurons in adjacent layers. As a result, the DCN can significantly reduce the number of parameters to be trained [3]. Recently, DL has boosted many applications due to the powerful algorithms and tools. The DCN has shown great potential for signal compression and recovery problems, which will be demonstrated below.

For the RNN in Fig. 1c, the outputs of each layer are determined by both the current inputs and the hidden states from the previous time step. The critical difference between the FNN and the RNN is that the latter has memory and can capture the hidden-layer outputs of the previous step. However, since an RNN depends on inputs over a long time span, non-stationary errors may show up during training. A special type of RNN, named long short-term memory (LSTM), has been proposed to eliminate unnecessary information in the network. LSTM has been widely applied in various cases, such as the joint design of source-channel coding, which will be briefly discussed later.

Generative Adversarial Net (GAN) and Conditional GAN: Training a typical DNN depends heavily on a large amount of labelled data, which may be difficult to obtain or even unavailable in certain circumstances. As shown in Fig. 2, the GAN is a generative method that can produce data following a certain target distribution, which lowers the demand for labelled data. In Fig. 2, a GAN consists of a generator, G, and a discriminator, D. D attempts to differentiate between the real data and the fake data generated by G, while G tries to generate plausible data that fools D into making mistakes, which sets up a min-max two-player game between G and D. As a result of this game, the generator, G, learns to generate data with the same distribution as the real data, so that the discriminator, D, cannot identify the difference between the real and fake data. The conditional GAN is an extension of the GAN in which extra conditioning information, m, is fed to both G and D as an additional input.

In communication systems, a GAN and a conditional GAN can be applied to model the distribution of the channel output. Moreover, the learned model can be used as a surrogate of the real channel when training the transmitter, so that the gradients can pass through the channel model to the transmitter. An application example of the conditional GAN will be introduced later.
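To illustrate how the conditioning information m enters both G and D, the following PyTorch sketch defines a generator that maps noise plus m to a fake channel output, a discriminator that judges a sample together with the same m, and one step of the min-max game. All layer sizes and the simple fully connected structure are assumptions made for illustration; this is not the model of [4].

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """G: noise z + conditioning information m -> fake channel output."""
    def __init__(self, noise_dim=16, cond_dim=8, out_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim),
        )

    def forward(self, z, m):
        return self.net(torch.cat([z, m], dim=-1))

class Discriminator(nn.Module):
    """D: sample + the same conditioning information m -> probability of being real."""
    def __init__(self, in_dim=8, cond_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim + cond_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x, m):
        return self.net(torch.cat([x, m], dim=-1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def gan_step(real_y, m):
    """One min-max update; real_y: observed channel outputs, m: conditioning info."""
    z = torch.randn(real_y.size(0), 16)
    fake_y = G(z, m)
    # D tries to label real data as 1 and generated data as 0
    d_loss = bce(D(real_y, m), torch.ones(real_y.size(0), 1)) + \
             bce(D(fake_y.detach(), m), torch.zeros(real_y.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # G tries to fool D into labelling generated data as real
    g_loss = bce(D(fake_y, m), torch.ones(real_y.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

gan_step(torch.randn(32, 8), torch.randn(32, 8))   # toy batch
```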
"Structured data can be modelled by different approaches. Sparse representation is a commonly-used one. It is worth noting that the most important property of DL is that it can automatically find compact low-dimensional representations/features of high-dimensional data."

FIGURE 4. DL-based channel compression, feedback, and recovery by CsiNet [10].

In frequency division duplex networks, massive MIMO relies on channel state information (CSI) feedback to achieve performance gains from the multiple antennas at the base station. However, the large number of antennas results in excessive feedback overhead. Extensive work has been carried out to reduce the feedback overhead by utilizing the spatial and temporal correlations of CSI. By exploiting the sparse property of CSI, compressive sensing (CS) has been applied to compress the CSI at the user side, and the compressed CSI is then recovered at the base station. However, traditional CS algorithms face challenges as real-world data is not exactly sparse and the convergence speed of the existing signal recovery algorithms is relatively slow, which has limited the practical application of CS [7].

A DCN has been applied to learn the inverse transformation from measurement vectors to signals to improve the recovery speed in CS [7]. In particular, the DCN has two distinctive features that make it uniquely suitable for sparse recovery problems. One is that its neurons are sparsely connected. The other is that, with weights shared across the entire receptive field of one layer, the DCN can increase the learning speed compared to a fully connected network [8]. Learned denoising-based AMP (LDAMP) [9] is among the best signal recovery algorithms in terms of both accuracy and speed, and it has been applied to channel estimation in millimeter-wave (mmWave) communications [2]. However, the achieved improvement still cannot fully overcome the limitations of CS-based CSI estimation.

CsiNet [10] has been proposed to mimic the CS process for channel compression, feedback, and reconstruction. In particular, an encoder is used to collect compressed measurements by directly learning the channel structure from the training data. As shown in Fig. 4, taking the channel matrix in the angular-delay domain as input, the first layer of the encoder is a convolutional layer that generates two feature maps. The feature maps are then vectorized, and a fully connected layer generates the real-valued compressed CSI. Only this compressed CSI is fed back to the base station, so the feedback overhead is significantly reduced. At the base station, the decoder reconstructs the CSI by learning the inverse transformation of the compressed feedback. It has been shown that CsiNet remarkably outperforms traditional CS-based methods in terms of both compression ratio and recovery speed.
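The exact CsiNet architecture is specified in [10]; the sketch below is only a simplified PyTorch rendering of the encoder/decoder idea described above: a convolutional layer producing two feature maps, a fully connected layer producing the real-valued compressed CSI, and a mirrored decoder with a crude residual stand-in for the RefineNet blocks. The channel matrix size H x W, the codeword length M, and all layer widths are hypothetical.

```python
import torch
import torch.nn as nn

H, W, M = 32, 32, 64          # angular-delay channel matrix size and codeword length (hypothetical)

class Encoder(nn.Module):
    """User side: channel matrix (real/imag planes) -> compressed CSI codeword."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 2, kernel_size=3, padding=1)   # generates two feature maps
        self.fc = nn.Linear(2 * H * W, M)                       # real-valued compressed CSI

    def forward(self, h):
        x = torch.relu(self.conv(h))
        return self.fc(x.flatten(1))

class Decoder(nn.Module):
    """Base station: compressed CSI codeword -> reconstructed channel matrix."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(M, 2 * H * W)                       # learned inverse transformation
        self.refine = nn.Sequential(                            # crude stand-in for RefineNet blocks
            nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 2, 3, padding=1),
        )

    def forward(self, s):
        x = self.fc(s).view(-1, 2, H, W)
        return x + self.refine(x)                               # residual refinement

# end-to-end: only the M-dimensional codeword needs to be fed back
enc, dec = Encoder(), Decoder()
h = torch.randn(16, 2, H, W)          # batch of channel matrices (real/imag planes)
h_hat = dec(enc(h))
loss = nn.functional.mse_loss(h_hat, h)
```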
Data-Driven Joint Source-Channel Coding: Typical source coding mainly utilizes the sparse property to remove the redundancy in the source data, while channel coding improves the robustness to noise by adding redundancy to the coded information when it is transmitted over channels. The Shannon separation theorem guarantees that source coding and channel coding can be designed separately without loss of optimality. However, in many communication systems, source coding and channel coding are designed jointly as it is not practical to have very large blocks.

A joint source-channel coding scheme based on DL has been proposed in [11]. With text as the source data, the DL-based source-channel encoder and decoder may output different sentences but preserve their semantic information content. Specifically, the proposed model adopts an RNN encoder, a binarization layer, a channel layer, and an RNN decoder. The text is structured before it is processed by the stacked bidirectional LSTM networks. A binarizer is then adopted to output binary values, which are taken as the inputs of the channel. At the receiver, a stack of LSTMs is used for decoding. By doing so, the word-error rate is lowered compared with various traditional separate source-channel coding baselines, such as using Huffman and Reed-Solomon codes for source and channel coding, respectively. Even though this design is tailored to text processing, it inspires us to apply DL where recovery of the exact transmitted data is not compulsory as long as the main information is conveyed. For example, in sparse support detection, we need to determine whether there is a sparse support at each location, while the exact amplitude at each location is not of interest.
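The model of [11] is described in that reference; the following PyTorch sketch only mirrors the pipeline summarized above: an embedding, a stacked bidirectional LSTM encoder, a binarization layer, a toy channel layer, and an LSTM decoder over the vocabulary. The vocabulary size, bit budget, straight-through binarizer, and binary symmetric channel are all illustrative assumptions.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, BITS = 1000, 64, 128, 32   # hypothetical sizes

class JointCoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, num_layers=2,
                               bidirectional=True, batch_first=True)
        self.to_bits = nn.Linear(2 * HID, BITS)       # binarization layer (pre-activation)
        self.decoder = nn.LSTM(BITS, HID, num_layers=2, batch_first=True)
        self.to_vocab = nn.Linear(HID, VOCAB)

    def binarize(self, x):
        # tanh + sign with a straight-through gradient (illustrative choice)
        soft = torch.tanh(x)
        hard = torch.sign(soft)
        return soft + (hard - soft).detach()

    def channel(self, bits, flip_prob=0.05):
        # toy binary symmetric channel: randomly flip transmitted bits
        flips = (torch.rand_like(bits) < flip_prob).float()
        return bits * (1 - 2 * flips)

    def forward(self, words):                          # words: (batch, seq) token ids
        enc_out, _ = self.encoder(self.embed(words))
        tx = self.binarize(self.to_bits(enc_out))      # bits sent over the channel
        rx = self.channel(tx)
        dec_out, _ = self.decoder(rx)
        return self.to_vocab(dec_out)                  # logits for the recovered words

model = JointCoder()
words = torch.randint(0, VOCAB, (8, 20))
logits = model(words)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), words.reshape(-1))
```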
In addition to the aforementioned two examples, DL has also been widely applied to other signal compression applications. For example, instead of performing joint source-channel coding, DL can be applied to source coding and channel coding separately to achieve better performance than typical coding techniques. Moreover, DNNs have also been widely applied to facilitate the design of the measurement matrix and the signal recovery algorithm in CS [6], which can be used in various wireless scenarios such as channel estimation and wideband spectrum sensing.

Intelligent Signal Detection

DL-based detection algorithms can significantly improve the performance of communication systems, especially when joint optimization of the traditional processing blocks is required and when the channels are hard to characterize with analytical models. Here, we provide two examples of DL-based detection.
FIGURE 6. End-to-end communication system models: a) reinforcement learning based end-to-end communication systems; b) conditional GAN based end-to-end communication systems.
channel often embraces different types of random effects, such as channel noise and time variation, which may be unknown or cannot be expressed analytically. As shown in Fig. 6, we will introduce two methods to address this issue in this section.

Reinforcement Learning-based End-to-End Systems

In [15], a reinforcement learning based approach has been proposed to circumvent the problem of missing gradients from the channel when optimizing the transmitter. As shown in Fig. 6a, the transmitter, which converts the source data into transmit symbols, is considered an agent, while both the channel and the receiver are regarded as the environment. The agent learns to take actions that maximize the cumulative reward emitted from the environment. At each time, the transmit data is regarded as the state observed by the transmitter and the transmit signals are regarded as the action taken by the transmitter. The end-to-end loss on each sample is calculated at the receiver and fed back to the transmitter as the reward from the environment, which guides the training of the transmitter. By using the policy gradient algorithm, a standard reinforcement learning approach, the transmitter can learn to maximize the reward, that is, to optimize the end-to-end loss, without requiring gradients from the channel.
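The algorithm of [15] is given in that reference; the sketch below only illustrates the idea in PyTorch under simplifying assumptions: the transmitter outputs the mean of a Gaussian policy, exploration is added as Gaussian noise, and the per-sample end-to-end loss fed back from the receiver acts as a negative reward in a REINFORCE-style update, so no gradient is required from the channel.

```python
import torch
import torch.nn as nn

tx = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))  # transmitter DNN (hypothetical)
opt = torch.optim.Adam(tx.parameters(), lr=1e-3)
sigma = 0.1                        # exploration noise standard deviation

def policy_gradient_step(state, end_to_end_loss_fn):
    """state: batch of source data; end_to_end_loss_fn: black box returning the
    per-sample loss measured at the receiver (the negative reward)."""
    mean = tx(state)                                   # mean of the Gaussian policy
    action = mean + sigma * torch.randn_like(mean)     # sampled transmit signals
    with torch.no_grad():
        loss_per_sample = end_to_end_loss_fn(action)   # fed back from the receiver
    # log-probability of the sampled action under the Gaussian policy (constants dropped)
    log_prob = -((action.detach() - mean) ** 2).sum(dim=-1) / (2 * sigma ** 2)
    surrogate = (loss_per_sample * log_prob).mean()    # minimizing loss = maximizing reward
    opt.zero_grad()
    surrogate.backward()                               # gradients flow only through the transmitter
    opt.step()

# usage with a toy "channel + receiver" black box (hypothetical)
def toy_loss(a):
    return ((a + 0.05 * torch.randn_like(a)) ** 2).sum(dim=-1)

policy_gradient_step(torch.randn(64, 4), toy_loss)
```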
Conditional GAN-based End-to-End Systems

In order to solve the missing-gradient problem and lower the demand for a large amount of training data, a generative approach based on the conditional GAN has been proposed in [4]. As in Fig. 6b, end-to-end learning of a communication system is enabled without requiring prior information about the channel by modelling the conditional distribution of the channel. In Fig. 6b, the end-to-end pipeline consists of DNNs for the transmitter, the channel generator, and the receiver, which are trained iteratively. Since the conditional GAN learns to mimic the channel effects, it acts as a surrogate channel through which the gradients can pass, which enables the training of the transmitter. The conditioning information for the conditional GAN consists of the transmit signals from the transmitter along with the received pilot information used for estimating the channel. Therefore, the generated output distribution will be specific to the instantaneous channel and the transmit signals. As a result, the conditional GAN based end-to-end communication system can be applied to more realistic time-varying channels. The simulation results in [4] confirm the effectiveness of the conditional GAN based end-to-end communication system by showing performance similar to that of the Hamming (7,4) code with maximum-likelihood decoding (MLD).
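Building on the conditional GAN sketch given earlier, the following fragment illustrates the surrogate-channel idea: with a hypothetically pre-trained and frozen channel generator standing in for the real channel, the end-to-end loss can be backpropagated through it to update the transmitter and receiver. All network sizes and the MSE objective are assumptions made for illustration, not the setup of [4].

```python
import torch
import torch.nn as nn

# Hypothetical building blocks: transmitter, receiver, and a frozen surrogate channel generator.
transmitter = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
receiver = nn.Sequential(nn.Linear(2 + 2, 16), nn.ReLU(), nn.Linear(16, 4))
channel_G = nn.Sequential(nn.Linear(2 + 2 + 2, 16), nn.ReLU(), nn.Linear(16, 2))  # surrogate channel
for p in channel_G.parameters():
    p.requires_grad_(False)       # the GAN-trained generator is held fixed in this step

opt = torch.optim.Adam(list(transmitter.parameters()) + list(receiver.parameters()), lr=1e-3)

def train_step(bits, rx_pilots):
    """bits: source data; rx_pilots: received pilots (conditioning information)."""
    x = transmitter(bits)                                     # transmit signals
    z = torch.randn(bits.size(0), 2)                          # GAN noise input
    y = channel_G(torch.cat([x, rx_pilots, z], dim=-1))       # generated channel output
    est = receiver(torch.cat([y, rx_pilots], dim=-1))         # recovered data
    loss = nn.functional.mse_loss(est, bits)
    opt.zero_grad()
    loss.backward()                                           # gradients reach the transmitter via G
    opt.step()

train_step(torch.randint(0, 2, (32, 4)).float(), torch.randn(32, 2))
```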
Conclusions and Future Directions

We have demonstrated the great potential of DL in physical layer communications. By summarizing how to apply DL in communication systems, the following research directions have been identified to bring intelligent physical layer communications from theory to practice.

Can DL-based End-to-End Communications Beat the Traditional Ones? We have briefly introduced end-to-end communications. From the initial research results in [4] and [15], the performance of DL-based end-to-end communications is comparable with that of the traditional ones. However, it is not clear whether DL-based end-to-end communications will eventually outperform the traditional ones in terms of performance and complexity, or how much gain can be achieved. We expect answers to these questions soon.