
Supervised Deep Learning for MIMO Precoding

Aravind Ganesh Pathapati*, Nakka Chakradhar*, PNVSSK Havish*, Sai Ashish Somayajula*, Saidhiraj Amuru
Department of Electrical Engineering
Indian Institute of Technology - Hyderabad, India
Email: {ee16btech11026, ee16btech11022, ee16btech11023, ee16btech11043}@iith.ac.in, [email protected]

* Equal contribution; names are arranged in alphabetical order.

Abstract—In this paper, we aim to design an end-to-end deep learning architecture for a broadcast MIMO system with precoding at the transmitter. The objective is to transmit interference-free data streams to multiple users over a wireless channel. We propose end-to-end learning of communication systems modelled as a deep autoencoder network with a novel cost function to achieve this goal. This architecture enables joint optimization of the transmitter and receiver network weights over a wireless channel. We also introduce a way to precode the transmitter embeddings before transmission. An end-to-end training of the autoencoder framework of transmitter-receiver pairs is employed while training the proposed transmit-precoded MIMO system model. Several numerical evaluations over Rayleigh block-fading (RBF) channels with slow fading are presented to validate this approach. Specific training methods are suggested to improve performance over RBF channels.

Index Terms—MIMO, Precoding, Autoencoders, Deep Learning, Joint training

I. INTRODUCTION

In the last few years, machine learning (ML) and deep learning [1] have gained significant traction in solving problems in various domains where theory alone does not explain the correlations in the data. For decades, communication systems have relied on classical, state-of-the-art algorithms to encode and decode messages. However, there has been growing interest in applying ML to this field as well [2], [3]. With the advent of more powerful hardware, deep learning methods are resurfacing as a viable solution, since these computation-intensive algorithms can now be deployed in real-time applications.

The goal in a communication system is to identify a transmission/modulation scheme that minimizes the probability of decoding error at the receiver's end, given power constraints. In the recent past, methods were proposed in which transmitters and receivers are modelled as deep autoencoder networks [4]. Autoencoders have gained popularity in various domains, especially dimensionality reduction and data compression, due to their ability to learn an alternative representation of data (for instance, a lower-dimensional representation) with the required characteristics that conveys the original information [4]. In the context of wireless communications, autoencoders were proposed with the goal of introducing non-linearity for learning the optimal modulation scheme that attains the best performance (say, error rate) for a given communication setting. The work proposed in [7] was a breakthrough and a foundation for the application of deep learning to wireless communication systems. It demonstrated that deep autoencoders can learn an effective scheme at the transmitter to transmit signals over the channel and decode them back at the receiver. Given the success of autoencoders in single-input single-output (SISO) systems, in this paper we extend their application to MIMO systems with precoding.

This paper focuses on training an end-to-end model for MIMO systems. A few previous works aimed to replace a single block of the end-to-end communication system with a deep network [6], while others aimed at decoding linearly precoded messages with deep learning [8] or at modulation classification [5]. In contrast, we deploy a deep network for the entire system as an end-to-end model. The work in [12] closely aligns with one of the solutions we propose; however, it considered a point-to-point link with multiple antennas at the transmitter and receiver, whereas the focus of this paper is a broadcast setting with multiple receivers.

In this paper, we deploy a precoding layer at the transmitter for MIMO systems that produces results on par with, and sometimes better than, the classical precoding algorithms, given CSI. We generalise the algorithm to any number of transmit and receive antennas (N_t, N_r-MIMO). There is also no restriction on the message space, i.e., the total number of messages in a communication setting. The framework is likewise general in terms of the different messages sent over the various transmitters and a flexible number of receivers and transmitters, given channel state information (CSI). This framework can be extended to more complicated MIMO systems to design hybrid analog beamformers as well.

To attain the primary goal of training an end-to-end broadcast MIMO model with precoding at the transmitter side, we focus on training a deep autoencoder with a cross-entropy loss function and a suitable penalty to account for CSI. As expected, end-to-end training without CSI performs poorly compared to classical algorithms, which assume CSI availability. A workaround is to estimate the channel beforehand using a separate machine learning network and pilot symbols at the receivers, and to feed this channel back to the transmitter. Assuming the channel to be constant for a finite time, we train the end-to-end model with this CSI.
Evaluation of the proposed system on Rayleigh fading channels shows that it performs on par with, and sometimes better than, most classical precoding algorithms such as zero forcing, MMSE-based precoding, and non-linear precoding such as THP. We discuss the training algorithm and results in detail in Sections III, IV and V.

Notation: Matrices and column vectors are represented by bold upper-case and lower-case letters, respectively. ℜ(x) and ℑ(x) denote the real and imaginary parts of x.

II. PREVIOUS WORK

In [6], the channel estimation part of the communication model is replaced by a DNN, leaving the other parts intact. As discussed previously, we propose an end-to-end deep learning architecture that can learn a constellation, perform precoding, and learn how to decode the received message. In the literature, DL approaches were proposed to model communication systems in the single-transmitter, single-receiver scenario. Specifically, O'Shea and Hoydis [7] proposed autoencoders to model the transmitter and receiver. The transmitter encodes the messages, normalizes the average energy to unit energy, and transmits the result through the unknown channel. Upon reception, the receiver decodes the noisy message. The transmitters and receivers are trained together as an autoencoder, with the channel as an intermediate layer within the autoencoder. When the channel is an AWGN channel, the transmitter and receiver are trained simultaneously, end-to-end, through stochastic backpropagation; the networks are simple dense networks in this case. When the channel is an RBF channel, the networks are slightly tweaked: the transmitters remain the same, but the receivers include radio transformer networks [14] in the first part to estimate the channel from the received message.
III. MIMO SYSTEM MODEL

In the proposed deep learning based end-to-end communication system, the transmitter, precoder and receiver are implemented as deep neural networks, with parameters (both weights and biases) θ_T, θ_P and θ_R, respectively. In the aforementioned autoencoder-based approach for SISO systems, the transmitter network f^T_{θ_T} : M → C^{N_c} maps a message m from the set M = {1, ..., M} to a vector of complex symbols x ∈ C^{N_c}, where N_c is the number of complex channel uses. The receiver f^R_{θ_R} : C^{N_c} → {p ∈ R^M | Σ_{k=1}^{M} p^{(k)} = 1} generates a probability distribution from the received signal y, and decoding is achieved by choosing the symbol with maximum probability.

In the case of MIMO systems in a broadcast setting, the transmitter network is modified to handle multiple messages at once. To be precise, the transmitter f^T_{θ_T} : M^{N_r} → C^{N_r × N_c} maps an ordered set of N_r messages to N_r × N_c complex numbers, where N_r is the number of receivers and N_c is the number of channel uses. The receivers stay the same as discussed above, with the only change that each receiver considers only a slice of the complex matrix broadcast by the transmitter, i.e., the messages intended for itself. To be precise, the ith receiver f^{R_i}_{θ_R} : C^{N_c} → {p ∈ R^M | Σ_{k=1}^{M} p^{(k)} = 1} maps a slice y_i of the received broadcast message matrix to a probability distribution. Decoding at each receiver is done by selecting the symbol with maximum predicted probability.

A. Broadcast Transmitter Architecture

In broadcast systems, a single transmitter takes multiple messages, modulates and encodes them as necessary, and broadcasts the message matrix through the channel. In neural networks, this can be envisioned as a multi-input autoencoder, a modification of the network in [7]. The transmitter part of the autoencoder primarily consists of stacked dense layers with ELU activations to incorporate non-linearity. This transmitter network accepts a column vector of messages. Given N_r receivers, it takes [m_1, ..., m_{N_r}]^T as input and maps it to N_c × N_r complex numbers. In neural networks, it is common practice to handle a complex number as a vector of two real numbers (real and imaginary parts). Hence, the transmitter can be visualised as the mapping f^T_{θ_T} : M^{N_r} → R^{2 N_c N_r}.

B. Precoder Network

Once we obtain the encoded complex numbers from the transmitter, we explicitly pass the matrix through another deep network, which takes up the task of precoding. In essence, the precoder network is another set of fully connected layers with non-linear activations. It maps the incoming matrix to another state space of the same dimension, i.e., given the transmitter output X, the precoder performs the mapping G_{θ_P} : R^{2 N_c N_r} → R^{2 N_c N_r}. Once precoded, we normalize the average energy per symbol, i.e., each slice of the matrix, to unit energy. The same is done for all the classical methods with which we compare the deep learning model.

C. Channel Simulation

Once we obtain the embeddings for the messages from the transmitter and pass them through the precoder, the column vector is reshaped to a matrix X of shape [N_r × N_c × 2]. The channel then acts upon this matrix as Y = H G(X) + N, where H is the channel matrix of the Rayleigh fading channel, of shape N_r × N_t (here we assume N_t = N_r, but the model can easily be extended to other cases), sampled from a standard complex Gaussian distribution, and N is additive white Gaussian noise. Since H is complex, it has shape N_r × N_t × 2 in the network during computation. The AWGN is sampled from a Gaussian distribution whose variance is set by the SNR.

Fig. 1: End-to-end model architecture.

Note that the channel operation is also modelled as a neural network layer in the end-to-end model. To be more precise, consider a message vector m, and let X be the transmitter output obtained from m. As discussed before, the channel is modelled as a network layer that can handle complex matrix multiplication, so the output of the channel network is Y = H G(X) + N. Note that Y has the same shape as X. To incorporate channel information during backpropagation, we add a penalty term to the loss function that explicitly targets the precoder parameters during stochastic updates. This penalty term is of the form λ ‖H G(X) − X‖², where λ is a hyperparameter that determines the weight given to the penalty. This portion of the loss function is explained below.
D. Receiver Architecture

Receiver architectures in the broadcast setting are modelled using multilayer perceptrons, f^{R_i}_{θ_R} : C^{N_c} → {p ∈ R^M | Σ_{k=1}^{M} p^{(k)} = 1}, which accept a slice of the received signal matrix Y and map it to a probability distribution. This is realised by stacking dense layers with linear activations, i.e., f(z) = z, followed by a final layer of M nodes with a softmax activation, i.e., p̂_i = softmax(z)_i = e^{z_i} / Σ_j e^{z_j}, to output the probability of each message.
probability of each message. Nr such receivers decode Nr
messages simultaneously. The training of multiple receivers is All the architectures, as mentioned earlier, can be stacked
discussed in detail in the next section. one after the other to form an end-to-end trainable deep
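As an illustration, a minimal TensorFlow sketch of the loss in (2) could look as follows (function name, shapes and the default λ are our assumptions, not the authors' code):

```python
import tensorflow as tf

def end_to_end_loss(p_true, p_pred, x, hgx, penalty_weight=0.1):
    """p_true/p_pred: lists of Nr one-hot / softmax tensors, each [batch, M];
    x: transmitter output; hgx: H G(X); both [batch, Nr, Nc, 2]."""
    cce = tf.keras.losses.CategoricalCrossentropy()
    # Average categorical cross-entropy over the Nr receivers.
    avg_cce = tf.add_n([cce(t, p) for t, p in zip(p_true, p_pred)]) / len(p_true)
    # Penalty pushes the precoder towards H G(X) ~ X, i.e. towards
    # pre-inverting the channel; penalty_weight is the hyperparameter lambda.
    penalty = tf.reduce_mean(
        tf.reduce_sum(tf.square(hgx - x), axis=[1, 2, 3]))
    return avg_cce + penalty_weight * penalty
```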
IV. EXPERIMENTAL SETUP

All the architectures mentioned earlier can be stacked one after the other to form a single end-to-end trainable deep network. With this network, several experiments were run to test feasibility and performance. All networks were trained using TensorFlow.

For simplicity, we choose N_r ∈ {2, 3}. The message set consists of 16 symbols: we uniformly sample N_r integers from the interval [0, 16) (mimicking a 16-QAM type modulation) and normalize them to lie in [0, 1]. These messages are then fed to the transmitter as a column vector.

The transmitter is a neural network with two fully connected layers of 10 × N_r × N_c and 2 × N_r × N_c nodes, respectively, using ELU and linear activations. Its output, denoted X, is passed on to the precoder network for the precoding process.
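A minimal Keras sketch of this transmitter, with the layer sizes from the text (the encoding of the input message vector is our assumption):

```python
import tensorflow as tf

def build_transmitter(nr, nc):
    # Input: the column vector of Nr normalized messages in [0, 1].
    m = tf.keras.Input(shape=(nr,))
    h1 = tf.keras.layers.Dense(10 * nr * nc, activation="elu")(m)
    x = tf.keras.layers.Dense(2 * nr * nc, activation=None)(h1)  # linear
    return tf.keras.Model(m, x, name="transmitter")
```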

The transmitter processes a mini-batch of symbols sampled from the message space, and the resulting embeddings matrix X is passed through the precoder network. This precoder is implemented as two stacked dense layers with 5 × N_c and 2 × N_r × N_c nodes, respectively; the first layer has a tanh activation and the second a linear activation. The matrix X is flattened before being fed into the precoder network.

The outputs of the precoder network are then reshaped into a matrix of size [N_r × N_c × 2], denoted G(X). The matrix depth of two is maintained throughout to represent the complex nature of the embeddings. The matrix is then normalized so that the average energy per transmitted symbol is one.
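A minimal Keras sketch of this precoder and the unit-energy normalization (layer sizes and activations from the text; the exact normalization convention is an assumption):

```python
import tensorflow as tf

def build_precoder(nr, nc):
    x = tf.keras.Input(shape=(2 * nr * nc,))   # flattened transmitter output
    h1 = tf.keras.layers.Dense(5 * nc, activation="tanh")(x)
    gx = tf.keras.layers.Dense(2 * nr * nc, activation=None)(h1)
    gx = tf.keras.layers.Reshape((nr, nc, 2))(gx)

    def unit_energy(t):
        # Average energy per complex symbol, normalized to 1.
        energy = tf.reduce_mean(tf.reduce_sum(tf.square(t), axis=-1),
                                axis=[1, 2])
        return t / tf.sqrt(energy)[:, None, None, None]

    gx = tf.keras.layers.Lambda(unit_energy)(gx)
    return tf.keras.Model(x, gx, name="precoder")
```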
The normalised precoded messages are then sent through the channel simulation network. The channel network performs the operations described earlier in the channel simulation section and passes the matrix Y to the receivers for decoding. All the operations in the channel simulation respect the complex nature of the channel.
Once the matrix Y is received, each receiver network selects only the slice of the matrix relevant to it and feeds it to its dense layers. Instead of handling the received complex matrix as is, we flatten it to a column vector for further processing. Each receiver network consists of two fully connected layers with 5 × N_c and M nodes, respectively. The first layer uses ReLU activations; the last layer uses a softmax activation to produce probability distributions over the messages to be predicted. As this is broadcast MIMO for multiple users, N_r such receivers are placed in parallel to receive and decode the N_r messages. Note that the parameters of one receiver model are independent of another. The parameters θ_{R_i} of the ith receiver are optimized using the cross-entropy loss CCE(p_i, p̂_i) between the one-hot representation of the ith message and the softmax output at the ith receiver.
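A minimal Keras sketch of a single receiver with the layer sizes from the text; N_r independent copies of such a model are instantiated:

```python
import tensorflow as tf

def build_receiver(nc, m_classes, index):
    # Input: the receiver's own slice of Y, flattened to 2*Nc real values.
    y_slice = tf.keras.Input(shape=(2 * nc,))
    h1 = tf.keras.layers.Dense(5 * nc, activation="relu")(y_slice)
    p_hat = tf.keras.layers.Dense(m_classes, activation="softmax")(h1)
    return tf.keras.Model(y_slice, p_hat, name=f"receiver_{index}")
```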
The transmitter's parameters θ_T are optimized using the average of all the cross-entropy losses. In other words, during backpropagation in the end-to-end training of the autoencoder, the transmitter trains on the average of the gradients from all the receivers:

dL/dθ_T = (d/dθ_T) (1/N_r) Σ_{i=1}^{N_r} CCE(p_i, p̂_i)

This parallel optimization of the receivers occurs naturally in the end-to-end training setup: TensorFlow compiles all the receiver models into an aggregate model with input Y and multiple outputs, and at the transmitter side it automatically averages the gradients obtained from all the receivers. The whole end-to-end model depicted in Fig. 1 is trained with standard backpropagation, using the Adam optimizer with a learning rate of 5e-4.
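A minimal sketch of how the pieces could be wired into one multi-output Keras model; the wiring and the fixed 20 dB training SNR are our assumptions, and `channel_layer` refers to the function sketched in Section III-C:

```python
import tensorflow as tf

def build_end_to_end(transmitter, precoder, channel_layer, receivers, nr, nc):
    m = tf.keras.Input(shape=(nr,))           # Nr messages
    h = tf.keras.Input(shape=(nr, nr, 2))     # CSI (Nt = Nr assumed)
    gx = precoder(transmitter(m))             # [batch, Nr, Nc, 2]
    y = tf.keras.layers.Lambda(
        lambda t: channel_layer(t[0], t[1], snr_db=20.0))([gx, h])
    outputs = []
    for i, rx in enumerate(receivers):
        # Each receiver decodes only its own slice of the broadcast matrix.
        y_i = tf.keras.layers.Lambda(lambda t, k=i: t[:, k])(y)
        outputs.append(rx(tf.keras.layers.Flatten()(y_i)))
    model = tf.keras.Model([m, h], outputs)
    # One CCE per output; Keras combines them (equal up to a constant
    # factor to the average in the text). The penalty term of (2) could
    # additionally be attached via model.add_loss.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
                  loss=["categorical_crossentropy"] * len(receivers))
    return model
```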
V. RESULTS

In this section, we go over the results for the autoencoder method, in which there is a common transmitter network and one receiver network per receiver.

As mentioned in the previous section, the transmitter network is combined with all the available receiver networks; the receiver networks train independently, while the transmitter trains on the average gradient coming from the receivers. The simulation setup is as follows:

• Message alphabet size is 16.
• Number of channel uses is 7.
• MIMO models: 2 × 2 and 3 × 3.
• All models were trained at an SNR of 20 dB and tested at other SNR values.
• The minibatch size of messages is 50,000.
• The results were averaged over 10,000 different realizations of the channel matrix H and the messages.
• The generated results are compared with existing precoding strategies: zero-forcing precoding, MMSE precoding [9], and a non-linear precoding mechanism, Tomlinson-Harashima precoding (THP) [10]. Perfect CSI is assumed for all these baseline schemes.

Due to the stochastic nature of neural networks, we generally do not have a guaranteed approach to a given problem. So, as a starting point, we first looked at how an end-to-end system performs when trained without explicitly providing CSI. The network was trained on the CCE loss function given by (1). The motivation is that the neural network might arrive at a good encoding-decoding strategy that also accounts for CSI using only an autoencoder with a conventional loss function such as CCE. Figure 2 compares the bit error rates (BERs) of this method with those of ZFP, MMSEP, and THP.

Fig. 2: BER vs SNR for cases a) end-to-end training with cost function (1); b) feeding H as input to the transmitter network with cost function (1).

From Fig. 2, it is seen that the neural network approach performs worse than the existing schemes in terms of BER. Note that the classical methods use the exact value of the CSI to calculate their transmission strategies. This poor performance of the deep learning method is expected because, without CSI, the transmitter cannot eliminate the interference between the two receivers. As an initial step of improvement,
we gave the channel matrix as an input to the network, expecting that this would provide sufficient CSI. This was not the case: the results of this approach were the same as those in Fig. 2. Since providing CSI as an input to the neural network yielded no improvement, we modified the loss function of the network as given by (2), which penalizes the weights of the precoder network.

Fig. 3: 2×2 MIMO BER vs SNR with CSI input and cost function (2).

Fig. 4: 3×3 MIMO BER vs SNR with CSI input and cost function (2).

Figures 3 and 4 show the resulting average bit error rates when the network is trained using the loss function in (2). The results in Fig. 3 are better than those in Fig. 2. In the experimental setting of Fig. 3, we used the penalty term in the loss function, which incorporates CSI into the training. The loss function in (2) helps reduce the BER and also helps the precoder network counter the channel effects. The better performance of Fig. 3 over Fig. 2 is attributed to this penalty term, which helps overcome the inter-receiver interference in the broadcast setting. Fig. 3 is for the 2×2 MIMO setting and Fig. 4 for the 3×3 setting; the two figures follow a similar trend. We observe that at lower SNRs the autoencoder performs similarly to the existing schemes, but at higher SNRs (above 20 dB) it performs better. The performance of the deep learning method appears to approach that of a centralized setting in which the transmitter jointly encodes the messages for both data streams and the receivers jointly perform maximum-likelihood detection.

Note that in all the results presented, the channel matrix remains the same while the network is trained and tested. This implies that whenever H changes, the model has to be retrained; this open problem is discussed in the last section. To compare the autoencoder approach with the existing methods, we train-tested the autoencoder model for 10,000 different realisations of the channel matrix and averaged the results to obtain the average bit error rate.
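A minimal sketch of this evaluation protocol; the helper `train_and_eval` is hypothetical and stands for training on a fixed H at 20 dB and measuring the BER at a given test SNR:

```python
import numpy as np

def average_ber(n_realizations, snr_grid, train_and_eval):
    """Average BER over many Rayleigh channel realizations (2x2 assumed)."""
    bers = np.zeros(len(snr_grid))
    for _ in range(n_realizations):
        # Standard complex Gaussian channel draw.
        h = (np.random.randn(2, 2) + 1j * np.random.randn(2, 2)) / np.sqrt(2)
        bers += np.array([train_and_eval(h, snr) for snr in snr_grid])
    return bers / n_realizations
```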
In a 2×2 MIMO setting, there are 256 unique message combinations for an alphabet of size 16. According to the transmitter architecture, the dimension of the flattened transmitter output vector is 2 × N_r × N_c, which is 28. We used t-SNE [13] to project this high-dimensional vector into 2D space to get an intuition of what constellation the network is trying to learn. Figure 5 shows the constellation diagram of the transmitter encodings projected into 2D space for the 2×2 MIMO setting; it is a scatter plot of the encodings of all 256 possible message pairs. We can see a well-scattered pattern of constellation points distributed over the 2D plane. The good performance of the autoencoder approach for a fixed channel can be attributed to this type of spacing, which ultimately minimizes the error rate.
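A minimal sketch of this projection, assuming `transmitter` is the trained transmitter model from the sketches above and messages are normalized to [0, 1] as in the setup section:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# All 256 message pairs for the 2x2 setting, normalized to [0, 1].
msgs = np.array([[i, j] for i in range(16) for j in range(16)]) / 15.0
encodings = transmitter.predict(msgs)        # [256, 2*Nr*Nc] = [256, 28]
emb2d = TSNE(n_components=2).fit_transform(encodings)
plt.scatter(emb2d[:, 0], emb2d[:, 1], s=8)
plt.title("t-SNE of transmitter encodings (256 message pairs)")
plt.show()
```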

Fig. 5: Constellation diagram projected into 2D space using t-SNE.

VI. CONCLUSION

This paper focuses on a MIMO system in a broadcast setting modeled using deep learning methods, specifically as an autoencoder system. We implemented a deep autoencoder network with perfect CSI. The DL approach performs as well as classical precoding strategies, such as zero forcing, MMSE and THP, at lower SNRs and even better at higher SNRs. However, providing CSI remains an open problem: in this work, we retrained the system every time the channel changed. One direction for future work, which we are currently pursuing, is to improve the Tx and Rx architectures to accommodate channel changes in a more automated fashion instead of manual retraining.

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
[2] C. Zhang, P. Patras, and H. Haddadi, "Deep learning in mobile and wireless networking: A survey," IEEE Commun. Surv. Tut., vol. 21, no. 3, pp. 2224–2287, 2019.
[3] A. Zappone, M. Di Renzo, and M. Debbah, "Wireless networks design in the era of deep learning: Model-based, AI-based, or both?," arXiv preprint arXiv:1902.02647, Feb. 2019.
[4] D. E. Rumelhart et al., "Learning internal representations by error propagation," in Parallel Distributed Processing, vol. 1: Foundations, MIT Press, Cambridge, MA, 1986.
[5] T. J. O'Shea et al., "Convolutional radio modulation recognition networks," in Proc. Int. Conf. Eng. Appl. Neural Netw., pp. 213–226, 2016.
[6] N. Shlezinger et al., "ViterbiNet: A deep learning based Viterbi algorithm for symbol detection," IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3319–3331, May 2020.
[7] T. J. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Trans. Cognitive Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
[8] E. Nachmani, Y. Be'ery, and D. Burshtein, "Learning to decode linear codes using deep learning," in Proc. Allerton Conf. Commun., Control, Computing, Monticello, IL, pp. 341–346, Oct. 2016.
[9] M. Esslaoui and M. Essaaidi, "Performance of multiuser MIMO-OFDM downlink system with ZF-BF and MMSE-BF linear precoding," Intl. J. Innovation and Applied Studies, vol. 3, no. 4, pp. 946–952, Aug. 2013.
[10] X. Lu and R. C. de Lamare, "Study of Tomlinson-Harashima precoding strategies for physical-layer security in wireless networks," arXiv preprint arXiv:1610.07034, Oct. 2016.
[11] F. A. Aoudia and J. Hoydis, "End-to-end learning of communications systems without a channel model," in Proc. 52nd Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, pp. 298–303, 2018.
[12] T. J. O'Shea et al., "Deep learning based MIMO communications," arXiv preprint arXiv:1707.07980, 2017.
[13] L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
[14] T. J. O'Shea et al., "Radio transformer networks: Attention models for learning to synchronize in wireless systems," in Proc. 50th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, pp. 662–666, 2016.
