Machine Learning-Based Channel Estimation in Massive MIMO With Channel Aging
Abstract—To support the ever-increasing number of devices in massive multiple-input multiple-output systems, an excessive amount of overhead is required for conventional orthogonal pilot-based channel estimation (CE) schemes. To relax this stringent constraint, we design a machine learning (ML)-based time-division duplex scheme in which channel state information (CSI) can be obtained by leveraging the temporal channel correlation. The proposed ML-based predictors involve a pattern extractor and a CSI predictor, which can be implemented via either a convolutional neural network (CNN) with an autoregressive (AR) predictor or an autoregressive network with exogenous inputs recurrent neural network (NARX-RNN), respectively. Numerical results demonstrate that the ML-based predictors can remarkably improve the prediction quality, and the optimal CE overhead is provided for practical reference.

This work was supported by the RAEng/The Leverhulme Trust Senior Research Fellowship LTSRF1718\14\2.

I. INTRODUCTION

With the exponential growth of devices, conventional channel estimation (CE) using orthogonal pilots is clearly inadequate given the limited overhead resources. Meanwhile, in practice, channel state information (CSI) is correlated over time [1], a phenomenon known as channel aging. By leveraging this intrinsic phenomenon, the CE overhead can potentially be reduced substantially through rigorous CSI prediction.

Channel aging is the variation of the channel caused by user movement, and its impact has been characterized in prior literature [2, 3]. Paper [3] points out that the performance degradation caused by this phenomenon can be partly compensated by applying channel prediction, which implies that this practical impairment can be learned and exploited for estimating CSI. An effective way to model an aging channel is as an autoregressive (AR) stochastic process whose parameters are computed based on the channel correlation matching property among adjacent coherence intervals [4]. However, according to the Levinson-Durbin recursion used for computing the model parameters, the model order is bounded by the number of previously available CSI samples.

Recently, machine learning (ML)-based non-linear methods have been successfully applied in wireless communications [5], which motivates us to adopt relevant techniques to forecast CSI. Since CSI forecasting is a typical time-series learning problem, a class of problems studied extensively in the field of financial analysis, a recurrent neural network (RNN) is a naturally suitable NN for exploring the hidden pattern within CSI variations. Moreover, in massive multiple-input multiple-output (mMIMO) scenarios, the CSI series from each antenna at the base station (BS) has the same autocorrelation pattern for a particular terminal. Leveraging this property, and by mapping multiple CSI series into a matrix, we can apply a technique from the field of image recognition, i.e., the convolutional NN (CNN), to detect the pattern of CSI variation; with the aging pattern as prior knowledge, the prediction accuracy can be substantially improved.

In particular, we aim to reduce the CE overhead via CSI prediction by taking advantage of the autocorrelation across CSI series. We first propose an ML-based time-division duplex (TDD) scheme in which CSI is obtained via an ML-based predictor instead of a conventional pilot-based channel estimator. Then, two ML-based structures are designed to improve CSI prediction, namely, a CNN combined with an AR predictor (CNN-AR) and a CNN combined with an autoregressive network with exogenous inputs (NARX) RNN (CNN-RNN). The main idea is to use the CNN to identify the channel-aging pattern, and to adopt the AR predictor or the NARX-RNN to forecast CSI. Numerical results demonstrate that CNN-AR outperforms the other architectures, including CNN-RNN, in terms of prediction accuracy, and provides the optimal CE overhead with respect to accuracy requirements.

II. SYSTEM MODEL

A TDD single-cell multi-user mMIMO system is considered, where a BS with N antennas serves K single-antenna users. We assume that the channel is static during each coherence interval, but it does not change independently from one interval to the next. More precisely, there is a correlation across channel coherence intervals. This is reasonable because the scattering environment shares a high degree of similarity across several intervals [6].

The N × 1 channel vector between the BS and the kth user at the lth coherence interval is modeled as

    g_k[l] = h_k[l] √β_k,    (1)

where β_k represents the large-scale fading and h_k[l] is the small-scale fading. The overall channel from the K users to the BS can be represented in matrix form as

    G[l] = H[l] B^{1/2},    (2)

where B is a diagonal matrix whose kth diagonal element is β_k, and H[l] = [h_1[l], . . . , h_K[l]] ∈ C^{N×K}.

A. ML-Based TDD

The frame structure in conventional TDD mainly consists of CE, uplink (UL), and downlink (DL) phases, in which the
978-1-5386-6528-2/19/$31.00 ©2019 IEEE
2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)
channels estimated during the CE phase are further used for UL and DL transmission. Different from conventional TDD, the proposed ML-based TDD scheme increases the resources for data transmission by reducing the CE overhead in the frame structure, and CSI is obtained using an ML technique that exploits the correlation among adjacent intervals. The ML-based TDD scheme contains two types of blocks, namely, the head block (HB) and the learning-based block (LB), shown in Fig. 1. The following assumptions are made in the ML-based TDD scheme:
• A HB consists of V conventional TDD coherence intervals;
• A LB consists of P ML-based coherence intervals (without CE) and J (J < V) conventional TDD coherence intervals.
• In a HB, the channels of the V intervals are estimated using the minimum mean square error (MMSE) estimator. After a HB, CSI is predicted via the ML-based predictors for P intervals, and is then updated for the following J intervals via the MMSE estimator to improve the prediction accuracy for the subsequent LB.

[Fig. 1: frame structures of conventional TDD (CE, UL, DL in every coherence interval) and the proposed ML-based TDD, in which head blocks (HB) alternate with learning-based blocks (LB) containing P learning-based and J conventional intervals.]

shifts with sampling duration T_s and the number of samples in a coherence interval ν.

In this paper, we assume the same autocorrelation among all channels from a particular user to the BS antennas. Hence, given the desired ACF as in (4) for l > 0, we model the small-scale fading series as [4]

    h_k[l] = − Σ_{q=1}^{Q} a_q h_k[l − q] + ω[l],    (5)

where {a_q}_{q=1}^{Q} are the AR coefficients, which are evaluated via the Levinson-Durbin recursion [4]. As the Levinson-Durbin recursion is a well-known algorithm, we omit the details to save space.

Remark 1: Given a desired ACF, the fitting accuracy of the AR model improves with higher order Q. However, according to the Levinson-Durbin recursion, Q is upper bounded by the amount of data in previous CSI samples, which implies that the performance of channel prediction via the AR estimator is limited by the number of coherence intervals V in a HB for the proposed ML-based TDD scheme.

Intuitively, the small-scale fading vector in (5) for a particular user follows the Gaussian distribution with zero mean and the same variance. Denote by ḣ_k the small-scale fading in a typical interval from the kth user to a typical antenna at the BS; its variance can be calculated via the Green's function [7]

    σ_{ḣ_k}² = Σ_{j=1}^{∞} G_j² σ_ω²,    (7)

where

    G_j = 1 for j = 0,
    G_j = Σ_{q=1}^{j} a_q G_{j−q} for 0 < j ≤ Q,
    G_j = Σ_{q=1}^{Q} a_q G_{j−q} for j > Q.
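As a concrete illustration of the correlation-matching step behind (5), the minimal sketch below fits AR coefficients to a Jakes-type ACF, r[l] = J0(2π f_n l) with f_n the normalized Doppler. It is a sketch under assumptions: the paper uses the Levinson-Durbin recursion, while here the equivalent Yule-Walker Toeplitz system is solved directly with SciPy, and f_n = 0.1 and Q = 4 are illustrative values.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.special import j0


def ar_coeffs_from_acf(acf, Q):
    """Correlation matching for the AR model in (5): solve the
    Yule-Walker system Toeplitz(r[0:Q]) c = r[1:Q+1], then flip the
    sign to match the convention h[l] = -sum_q a_q h[l-q] + w[l]."""
    c = solve_toeplitz(acf[:Q], acf[1 : Q + 1])
    return -c


# Illustrative desired ACF: Jakes model r[l] = J0(2*pi*fn*l),
# with fn = 0.1 the normalized Doppler used in the numerical results.
fn = 0.1
lags = np.arange(16)
acf = j0(2 * np.pi * fn * lags)

a = ar_coeffs_from_acf(acf, Q=4)  # AR coefficients {a_q}, q = 1..Q
```

Note that Q here cannot exceed the number of estimated CSI samples available in a HB, which is exactly the limitation stated in Remark 1.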
Fig. 2. (a) CNN-AR architecture, in which the CNN comprises an operator, two convolution layers (5 × 5) with two max-pooling layers (2 × 2), and a full-connect layer for pattern extraction, followed by an AR predictor whose coefficients are pre-computed. (b) CNN-RNN architecture, in which the CNN has the same structure as in CNN-AR, and the NARX-RNN comprises an operator with D delays and a refine unit for CSI recovery.

The received noisy channel vector from the kth user at the lth interval is …; the channel from the kth user to the BS is distributed as g_k[l] ∼ CN(0, β̇_k I_N), and the MMSE estimate of g_k[l] follows

    ĝ_k^mmse[l] ∼ CN(0, β̇_k γ_k^mmse I_N),    (11)

where γ_k^mmse = β̇_k / (β̇_k + µ) with µ = σ_n² / (p_p K), and the estimation error e_k^mmse follows CN(0, (1 − γ_k^mmse) β̇_k I_N).
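To make the statistics around (11) concrete, the short Monte-Carlo sketch below checks that the scalar linear MMSE estimate ĝ = γ(g + n), with γ = β̇/(β̇ + µ), has variance β̇γ and error variance (1 − γ)β̇. The numeric values of β̇ and µ are illustrative assumptions (in the paper, µ = σ_n²/(p_p K)).

```python
import numpy as np

rng = np.random.default_rng(0)
beta, mu = 1.0, 0.25            # illustrative large-scale fading and effective noise level
gamma = beta / (beta + mu)      # MMSE quality factor gamma as defined below (11)

n_samp = 200_000
# Complex Gaussian channel g ~ CN(0, beta) and observation noise w ~ CN(0, mu)
g = np.sqrt(beta / 2) * (rng.standard_normal(n_samp) + 1j * rng.standard_normal(n_samp))
w = np.sqrt(mu / 2) * (rng.standard_normal(n_samp) + 1j * rng.standard_normal(n_samp))

g_hat = gamma * (g + w)         # linear MMSE estimate of g
err = g - g_hat                 # estimation error

print(np.var(g_hat))            # ≈ beta * gamma, matching (11)
print(np.var(err))              # ≈ (1 - gamma) * beta
```

The two printed variances agree with the closed forms because var(g + w) = β̇ + µ, so γ²(β̇ + µ) = β̇γ, and the orthogonality of the MMSE error gives the remainder (1 − γ)β̇.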
III. ML-BASED CHANNEL FORECASTING APPROACHES

We aim to implement multi-step prediction of CSI to minimize the CE overhead. In this section, two types of NN architectures, i.e., CNN-AR and CNN-RNN, are discussed for CSI forecasting. The idea behind the two ML-based architectures is identical: a time-series predictor collaborates with a CNN that is used to extract the ACF pattern.

A. CNN-AR Approach

CNNs have proved to deliver satisfactory performance in image classification problems [8]. Their key feature is that they apply different functional units alternately, e.g., convolution layers, pooling layers, and full-connection layers. More importantly, CNNs treat feature extraction and classification jointly; in particular, feature extraction is implemented by the convolution layers and classification is approached by the full-connection layers. As the shared weights in the convolution layers and the weights in the full-connection layers are trained together, the total classification error of a well-designed CNN can be significantly minimized.

The mechanism of CNNs inspires us to adopt such an architecture to extract the ACF pattern. As the N CSI series for a particular user vary according to the same ACF, by mapping multiple CSI series into a matrix, the input of the CNN is

    op(G̈_k) = [op(ĝ_k^mmse[1]), . . . , op(ĝ_k^mmse[V])].    (12)

This can be thought of as 2D image data, and the corresponding ACF is treated as the label λ. The operator op(·) is a designed manipulation that maps a complex-valued CSI vector into a 2N-dimensional real-valued vector, i.e., op(g_k[l]) = [Re{g_k[l]}^T, Im{g_k[l]}^T]^T. By classifying the ACF pattern from op(G̈_k), we are able to regenerate the channel series using a pre-trained CSI predictor without real-time calculation.

We choose adaptive moment estimation (ADAM) as the optimizer, and use the minimum square error (MSE) as the loss function, which is defined as

    C_cnn = (1/2) Σ_{m=1}^{M} Σ_{l_P=1}^{L_P} (λ_{l_P}^m − λ̂_{l_P}^m)²,    (13)

where M represents the amount of training data, L_P represents the total number of ACF patterns, λ_{l_P}^m represents the l_P-th dimension of the pattern label for the mth input data, and λ̂_{l_P}^m denotes its estimate.

The procedure of the CNN-AR scheme is described in Fig. 2(a). Given G̈_k as input, the CNN transforms the complex matrix into a real-valued matrix and identifies the CSI ACF pattern. Then, the system loads the pre-computed AR coefficients of the corresponding aging pattern, and predicts the CSI for the subsequent interval as

    ĝ_k^cnn[l] = − Σ_{q=1}^{Q} a_q ĝ_k^mmse[l − q].    (14)

According to the proposed ML-based TDD scheme, for the first P intervals in a LB, the NN output of the current interval is used as the input to forecast the CSI for the next interval. Mathematically speaking,

    ĝ_k^cnn[l + l′] = − Σ_{q=l′+1}^{Q} a_q ĝ_k^mmse[l + l′ − q] − Σ_{q′=1}^{l′} a_{q′} ĝ_k^cnn[l + l′ − q′],  l′ ∈ P,    (15)

until the next conventional coherence interval.

Note that the given CNN structure is a simple NN which can only distinguish dozens of ACF patterns with acceptable accuracy. As the ACF is dominated by the Doppler shift, which has hundreds of patterns, an engineering implementation of such an architecture should be much deeper. In this paper, we aim to emphasize the feasibility of our scheme, and simplify the system structure for ease of training.

B. CNN-RNN Approach

As the CNN in the CNN-RNN structure is identical to that in CNN-AR, we only introduce the CSI predictor, i.e., the NARX-RNN, in this part.

The general form of the NARX-RNN is commonly described as

    f[l] = f(x[l], f[l − 1], f[l − 2], . . . , θ),    (16)

where a one-step prediction f[l] depends on the previous several outputs, the input x[l], and some parameters θ. Such an architecture is implemented by introducing delays in the mechanism, where the output has direct connections to the
Fig. 4. Convergence of the prediction NMSE among AR predictors, CNN-AR and CNN-RNN with respect to P. Results are shown for J = 4, U = 10, and fn = 0.1.

Fig. 5. Optimal φ with respect to P under different NMSE requirements. Results are shown as (P, NMSE, φ) with fn = 0.1 and Ocon = 0.3.