
2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

Machine Learning-Based Channel Estimation in Massive MIMO with Channel Aging

Jide Yuan, Hien Quoc Ngo, and Michail Matthaiou
Institute of Electronics, Communications and Information Technology (ECIT), Queen's University Belfast, U.K.
e-mail: {y.jide, hien.ngo, m.matthaiou}@qub.ac.uk

Abstract—To support the ever-increasing number of devices in massive multiple-input multiple-output systems, an excessive amount of overhead is required for conventional orthogonal pilot-based channel estimation (CE) schemes. To relax this stringent constraint, we design a machine learning (ML)-based time-division duplex scheme in which channel state information (CSI) can be obtained by leveraging the temporal channel correlation. The proposed ML-based predictors involve a pattern extractor and a CSI predictor, implemented via either a convolutional neural network (CNN) combined with an autoregressive (AR) predictor, or an autoregressive network with exogenous inputs recurrent neural network (NARX-RNN). Numerical results demonstrate that ML-based predictors can remarkably improve the prediction quality, and the optimal CE overhead is provided for practical reference.

(This work was supported by the RAEng/The Leverhulme Trust Senior Research Fellowship LTSRF1718\14\2.)

I. INTRODUCTION

With the exponential growth of devices, conventional channel estimation (CE) using orthogonal pilots is clearly inadequate given the limited overhead resources. Meanwhile, in practice, channel state information (CSI) is correlated over time [1], a phenomenon known as channel aging. Leveraging this intrinsic phenomenon, the CE overhead can potentially be reduced substantially through rigorous CSI prediction.

Channel aging is the variation of the channel caused by user movement, the impact of which has been characterized in prior literature [2, 3]. Paper [3] points out that the performance degradation caused by this phenomenon can be partly compensated by applying channel prediction, which implies that this practical impairment can be learned and used for estimating CSI. An effective way to model an aging channel is as an autoregressive (AR) stochastic process whose parameters are computed based on the channel correlation matching property among adjacent coherence intervals [4]. However, according to the Levinson-Durbin recursion, which is used for computing the model parameters, the model order is bounded by the number of previous CSI samples available.

Recently, machine learning (ML)-based non-linear methods have been successfully applied in wireless communications [5], which motivates us to adopt relevant techniques to forecast CSI. Since CSI forecasting is a typical time-series learning problem, which has been studied extensively in the field of financial analysis, a recurrent neural network (RNN) is a perfectly suitable NN for exploring the hidden pattern within CSI variations. Moreover, in massive multiple-input multiple-output (mMIMO) scenarios, the CSI series from each antenna at the base station (BS) has the same autocorrelation pattern for a particular terminal. Leveraging this property, and by mapping multiple CSI series into a matrix, we are able to apply a technique from the field of image recognition, i.e., a convolutional NN (CNN), to detect the pattern of CSI variation; with the aging pattern as prior knowledge, the prediction accuracy can be substantially improved.

In particular, we aim to reduce the CE overhead via CSI prediction by taking advantage of the autocorrelation across CSI series. We first provide an ML-based time division duplex (TDD) scheme in which CSI is obtained via an ML-based predictor instead of a conventional pilot-based channel estimator. Then, two ML-based structures are designed to improve the CSI prediction, namely, a CNN combined with an AR predictor (CNN-AR) and a CNN combined with an autoregressive network with exogenous inputs (NARX) RNN (CNN-RNN). The main idea is to use the CNN to identify the channel aging pattern, and to adopt the AR predictor or NARX-RNN to forecast the CSI. Numerical results demonstrate that CNN-AR outperforms other architectures, including CNN-RNN, in terms of prediction accuracy, and provides the optimal CE overhead with respect to accuracy requirements.

II. SYSTEM MODEL

A TDD single-cell multi-user mMIMO system is considered, where a BS with N antennas serves K single-antenna users. We assume that the channel is static during each coherence interval, but it does not change independently from one interval to the next. More precisely, there is a correlation over the channel coherence intervals. This is reasonable because the scattering environment shares a high degree of similarity across several intervals [6].

The N × 1 channel vector between the BS and the kth user at the lth coherence interval is modeled as

  g_k[l] = \sqrt{β_k} h_k[l],   (1)

where β_k represents the large-scale fading and h_k[l] is the small-scale fading. The overall channel from the K users to the BS can be represented in matrix form as

  G[l] = H[l] B^{1/2},   (2)

where B is a diagonal matrix whose kth diagonal element is β_k, and H[l] = [h_1[l], ..., h_K[l]] ∈ C^{N×K}.

A. ML-Based TDD

The frame structure in conventional TDD mainly consists of CE, uplink (UL), and downlink (DL) phases.
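As a concrete illustration of the channel model (1)-(2), the following NumPy sketch draws one realization of G[l]; the values of N, K, and the large-scale coefficients β_k are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 128, 8                          # BS antennas, users (illustrative values)
beta = rng.uniform(0.1, 1.0, size=K)   # hypothetical large-scale fading coefficients

# Small-scale fading H[l]: i.i.d. CN(0, 1) entries, as in (2)
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

# G[l] = H[l] B^{1/2}, with B = diag(beta_1, ..., beta_K)
G = H @ np.diag(np.sqrt(beta))

# Column k equals g_k[l] = sqrt(beta_k) h_k[l] from (1)
assert np.allclose(G[:, 0], np.sqrt(beta[0]) * H[:, 0])
```

Multiplying by B^{1/2} on the right scales each column (user) by its own large-scale gain, which is exactly the per-user model in (1).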
978-1-5386-6528-2/19/$31.00 ©2019 IEEE

The channels estimated during the CE phase are then used for UL and DL transmission. Different from conventional TDD, the proposed ML-based TDD scheme increases the resources for data transmission by reducing the CE overhead in the frame structure; the CSI is instead obtained using an ML technique that exploits the correlation among adjacent intervals. The ML-based TDD scheme contains two types of blocks, namely, the head block (HB) and the learning-based block (LB), shown in Fig. 1. The following assumptions are made in the ML-based TDD scheme:
• A HB consists of V conventional TDD coherence intervals; a LB consists of P ML-based coherence intervals (without CE) and J (J < V) conventional TDD coherence intervals.
• In a HB, the channels of V intervals are estimated using the minimum mean square error (MMSE) estimator. After a HB, the CSI is predicted via ML-based predictors for P intervals, and is then updated for the following J intervals via the MMSE estimator to improve the prediction accuracy for the subsequent LB.

Fig. 1. Conventional TDD versus ML-based TDD. In the learning-based block (LB), the CE overhead is removed from the frame structure for P intervals due to the adoption of ML-based CSI prediction.

B. Channel Aging Model

In general, the aging property is mainly caused by the movement of the users, and this feature can be approximately characterized via the second-order statistics of the channel, i.e., the autocorrelation function (ACF) [4].

We assume that the propagation path experiences two-dimensional isotropic scattering, whose corresponding normalized continuous-time ACF at the BS is

  R(t) = J_0(2π f_d t),   (3)

where J_0(·) is the zeroth-order Bessel function of the first kind, and f_d is the maximum Doppler frequency given by f_d = v f_c / c, with v the velocity of the user, c the speed of light, and f_c the carrier frequency. Although this formula indicates that the channel impulse response varies continuously, we note that the variation is nearly imperceptible over a period of dozens of channel samples. Therefore, we consider the discrete-time ACF of the fading channel coefficients

  R[l] = J_0(2π f_n |l|),   (4)

where |l| is the delay in terms of the number of coherence intervals, and f_n = ν T_s f_d is the normalized Doppler shift, with sampling duration T_s and ν samples per coherence interval.

In this paper, we assume the same autocorrelation among all channels from a particular user to the BS antennas. Hence, given the desired ACF (4) for l > 0, we model the small-scale fading series as [4]

  h_k[l] = -Σ_{q=1}^{Q} a_q h_k[l-q] + ω[l],   (5)

where ω[l] is an uncorrelated complex white Gaussian noise vector with zero mean and variance

  σ_ω^2 = R[0] + Σ_{q=1}^{Q} a_q R[-q],   (6)

and {a_q}_{q=1}^{Q} are the AR coefficients, which are evaluated via the Levinson-Durbin recursion [4]. As the Levinson-Durbin recursion is a well-known algorithm, we skip the details to save space.

Remark 1: Given a desired ACF, the fitting accuracy of the AR model improves with higher order Q. However, according to the Levinson-Durbin recursion, Q is upper bounded by the amount of data from previous CSI samples, which implies that the performance of channel prediction via the AR estimator is limited by the number of coherence intervals V in a HB for the proposed ML-based TDD scheme.

Intuitively, the small-scale fading vector in (5) for a particular user follows a Gaussian distribution with zero mean and the same variance across intervals. Denote by ḣ_k the small-scale fading in a typical interval from the kth user to a typical antenna at the BS; its variance can be calculated via Green's function [7]

  σ_{ḣ_k}^2 = Σ_{j=1}^{∞} G_j^2 σ_ω^2,   (7)

where

  G_j = 1 for j = 0;  G_j = Σ_{q=1}^{j} a_q G_{j-q} for j ≤ Q;  G_j = Σ_{q=1}^{Q} a_q G_{j-q} for j > Q,

i.e., h_k[l] ~ CN(0, σ_{ḣ_k}^2 I_N), ∀l.

C. MMSE Estimation

We assume that in a conventional coherence interval orthogonal pilots are used, and the channel is estimated with the MMSE estimator. For ease of analysis, we suppose that the pilot length equals the number of users, i.e., Ψ = [ψ_1^T, ..., ψ_K^T]^T ∈ C^{K×K}, where ψ_k is the pilot of user k = 1, ..., K, with ΨΨ^H = I_K. The operations (·)^T and (·)^H denote transpose and conjugate transpose, respectively. The users transmit pilots with the same power p_p, and the received training signal at the BS is

  Y_p[l] = \sqrt{K p_p} G[l] Ψ + N[l],   (8)

where N[l] is an additive white Gaussian noise matrix whose elements have variance σ_n^2. Correlating Y_p[l] with the pilot matrix Ψ, the BS obtains

  R_p[l] = (1/\sqrt{K p_p}) Y_p[l] Ψ^H.   (9)
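Returning to the aging model of Sec. II-B: the AR coefficients in (5)-(6) come from matching the Jakes ACF (4), i.e., from solving the Yule-Walker equations, which is what the Levinson-Durbin recursion does. A minimal sketch using SciPy's Toeplitz solver in place of an explicit Levinson-Durbin implementation; the values of f_n and Q are illustrative, not the paper's.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.special import j0

fn, Q = 0.2, 4                        # normalized Doppler and AR order (illustrative)
lags = np.arange(Q + 1)
r = j0(2 * np.pi * fn * lags)         # desired discrete ACF R[l] = J0(2*pi*fn*|l|), per (4)

# Yule-Walker step: solve the symmetric Toeplitz system for the
# forward-prediction coefficients phi, with h[l] ~ sum_q phi_q h[l-q]
phi = solve_toeplitz((r[:Q], r[:Q]), r[1:])
a = -phi                              # sign convention of (5): h[l] = -sum_q a_q h[l-q] + w[l]

# Innovation variance from (6); R is even, so R[-q] = R[q]
sigma_w2 = r[0] + np.sum(a * r[1:])
```

A smaller σ_ω^2 relative to R[0] = 1 indicates that the AR(Q) model explains most of the channel's temporal variation, consistent with Remark 1's point that accuracy improves with Q.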


The received noisy channel vector from the kth user at the lth interval is then

  r_{p,k}[l] = g_k[l] + (1/\sqrt{K p_p}) N[l] ψ_k^H.   (10)

Recalling the channel model in (1), the channel vector from the kth user to the BS is distributed as g_k[l] ~ CN(0, β̇_k I_N) according to (7), with β̇_k = β_k σ_{ḣ_k}^2. Thus, the MMSE estimate of g_k[l] follows

  ĝ_k^{mmse}[l] ~ CN(0, β̇_k γ_k^{mmse} I_N),   (11)

where γ_k^{mmse} = β̇_k / (β̇_k + μ) with μ = σ_n^2 / (p_p K), and the estimation error e_k^{mmse} follows CN(0, (1 - γ_k^{mmse}) β̇_k I_N).

Fig. 2. (a) CNN-AR architecture, in which the CNN comprises an operator, two convolution layers, two max-pooling layers, and a fully connected layer for extraction, and an AR predictor whose coefficients are pre-computed. (b) CNN-RNN architecture, in which the CNN has the same structure as in CNN-AR, and the NARX-RNN comprises an operator with D delays and a refine unit for CSI recovery.

III. ML-BASED CHANNEL FORECASTING APPROACHES

We aim to implement multi-step prediction for CSI to minimize the CE overhead. In this section, two types of NN architectures, i.e., CNN-AR and CNN-RNN, are discussed for CSI forecasting. The idea behind the two ML-based architectures is identical; that is, a time-series predictor collaborates with a CNN which is used to extract the ACF pattern.

A. CNN-AR Approach

CNNs have proved to deliver satisfactory performance in image classification problems [8]. Their key feature is that they apply different functional units alternately, e.g., convolution layers, pooling layers, and fully connected layers. More importantly, CNNs treat feature extraction and classification jointly; in particular, feature extraction is implemented by the convolution layers and classification is handled by the fully connected layers. As the shared weights in the convolution layers and the weights in the fully connected layers are trained together, the total classification error of a well-designed CNN can be significantly minimized.

The mechanism of CNNs inspires us to adopt such an architecture to extract the ACF pattern. As the N CSI series for a particular user vary according to the same ACF, by mapping multiple CSI series into a matrix, the input of the CNN is

  op(G̈_k) = [op(ĝ_k^{mmse}[1]), ..., op(ĝ_k^{mmse}[V])].   (12)

This can be viewed as 2D image data, with the corresponding ACF as the label λ. The operator op(·) is a designed manipulation that maps a complex-valued CSI vector into a 2N-dimensional real-valued vector, i.e., op(g_k[l]) = [Re{g_k[l]}^T, Im{g_k[l]}^T]^T. By classifying the ACF pattern from op(G̈_k), we are able to regenerate the channel series using a pre-trained CSI predictor without real-time computation.

We choose adaptive moment estimation (ADAM) as the optimizer, and use the mean square error (MSE) as the loss function, defined as

  C_cnn = (1/2) Σ_{m=1}^{M} Σ_{l_P=1}^{L_P} (λ_{l_P}^m - λ̂_{l_P}^m)^2,   (13)

where M is the amount of training data, L_P is the total number of ACF patterns, λ_{l_P}^m is the l_P th dimension of the pattern label for the mth input sample, and λ̂_{l_P}^m is its estimate.

The procedure of the CNN-AR scheme is described in Fig. 2(a). Given G̈_k as input, the CNN transforms the complex matrix into a real-valued matrix and identifies the CSI ACF pattern. Then, the system loads the pre-computed AR coefficients of the corresponding aging pattern, and predicts the CSI for the subsequent interval as

  ĝ_k^{cnn}[l] = -Σ_{q=1}^{Q} a_q ĝ_k^{mmse}[l-q].   (14)

According to the proposed ML-based TDD scheme, for the first P intervals in a LB, the NN output of the current interval is used as the input to forecast the CSI of the next interval. Mathematically speaking,

  ĝ_k^{cnn}[l+l'] = -Σ_{q=l'+1}^{Q} a_q ĝ_k^{mmse}[l+l'-q] - Σ_{q'=1}^{l'} a_{q'} ĝ_k^{cnn}[l+l'-q'],  l' ∈ {1, ..., P},   (15)

until the next conventional coherence interval.

Note that the given CNN structure is a simple NN which can only distinguish dozens of ACF patterns with acceptable accuracy. As the ACF is dominated by the Doppler shift, which has hundreds of patterns, an engineering implementation of this architecture should be much deeper. In this paper, we aim to demonstrate the feasibility of our scheme, and we simplify the system structure for ease of training.

B. CNN-RNN Approach

As the CNN in the CNN-RNN structure is identical to that in CNN-AR, we only introduce the CSI predictor, i.e., the NARX-RNN, in this part.

The general form of the NARX-RNN is commonly described as

  f[l] = f(x[l], f[l-1], f[l-2], ..., θ),   (16)

where a one-step prediction of f[l] depends on the previous several outputs, the input x[l], and some parameters θ. Such an architecture is implemented by introducing delays through which the output has direct connections to the past.
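The recursion (14)-(15) simply runs the fitted AR model forward, feeding predictions back in once the MMSE estimates run out. A per-antenna scalar sketch, with a hypothetical order-2 model and history (none of the values come from the paper):

```python
import numpy as np

def ar_forecast(a, g_hist, P):
    """Multi-step AR prediction per (14)-(15): h[l] = -sum_q a[q-1] * h[l-q].

    a      : AR coefficients a_1..a_Q from the Levinson-Durbin fit
    g_hist : the Q most recent MMSE channel estimates, oldest first
    P      : number of intervals to predict (no pilots available)
    """
    hist = list(g_hist)              # sliding window of the last Q values
    preds = []
    for _ in range(P):
        # Most recent value is hist[-1], i.e. lag q = 1, so reverse for q = 1..Q
        nxt = -np.dot(a, hist[::-1])
        preds.append(nxt)
        hist = hist[1:] + [nxt]      # feed the prediction back in, per (15)
    return np.array(preds)

# Hypothetical order-2 model: h[l] = 1.6 h[l-1] - 0.7 h[l-2] + w[l]
a = np.array([-1.6, 0.7])
preds = ar_forecast(a, g_hist=[1.0, 0.9], P=3)
```

The sliding window holds a mix of MMSE estimates and earlier predictions, which is exactly how (15) splits its two sums between ĝ^mmse and ĝ^cnn terms.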

Authorized licensed use limited to: PES University Bengaluru. Downloaded on March 14,2024 at 16:11:21 UTC from IEEE Xplore. Restrictions apply.
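Stepping back to the estimation chain of Sec. II-C, the steps (8)-(11) can be sketched end to end as follows; the unitary DFT pilot matrix and all scalar values are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, pp, sigma_n2 = 64, 4, 1.0, 1.0       # illustrative values, not from the paper
beta_dot = rng.uniform(0.2, 1.0, size=K)   # effective gains beta_k * sigma_{hk}^2

# True channels g_k ~ CN(0, beta_dot_k I_N) and unitary pilots (Psi Psi^H = I_K)
G = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) * np.sqrt(beta_dot / 2)
Psi = np.fft.fft(np.eye(K)) / np.sqrt(K)   # DFT-based orthogonal pilot matrix

# Received training signal (8), then de-spreading (9): Rp = G + noise of variance mu
Noise = np.sqrt(sigma_n2 / 2) * (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)))
Yp = np.sqrt(K * pp) * G @ Psi + Noise
Rp = Yp @ Psi.conj().T / np.sqrt(K * pp)   # column k is r_{p,k} of (10)

# MMSE scaling (11): shrink each column by gamma_k = beta_dot_k / (beta_dot_k + mu)
mu = sigma_n2 / (pp * K)
gamma = beta_dot / (beta_dot + mu)
G_hat = Rp * gamma
```

Because γ_k < 1, the MMSE estimate shrinks the de-spread observation toward zero, trading a small bias for a lower overall error than the raw r_{p,k}.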
2019 IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

In this paper, we adopt a widely used NARX RNN form, specifically given in [9],

  f[l] = tanh( W[0] x[l] + Σ_{d=1}^{D} W[d] f[l-d] + b ),   (17)

where D is the maximum number of delays, and the weight matrices W[d] ∈ R^{2N×2N}, W[0] ∈ R^{2N×2N} and bias vector b ∈ R^{2N×1} are the parameters trained in the NN.

As there is no input from the MMSE estimator during the first P intervals in a LB, to fit our problem we make a minor revision to (17). Taking the channel of the kth user as an example, the NARX RNN is described as

  op(ĝ_k^{rnn}[l]) = tanh( W[0] op(ĝ_k^{mmse}[l-1]) + Σ_{d=1}^{D} W[d] op(ĝ_k^{mmse}[l-d]) + b ),   (18)

where ĝ_k^{rnn}[l] is the NARX-RNN prediction. The corresponding refine unit for transforming the output from real values back into complex values is given by

  ru(op(g_k[l]))_n = (op(g_k[l]))_n + i (op(g_k[l]))_{n+N},

where (op(g_k[l]))_n is the nth element of op(g_k[l]), and i = \sqrt{-1}.

Consistent with typical RNNs, the training of this network is based on minimizing the sum-of-squared-error cost function

  C_rnn = (1/2) op(ĝ_k^{rnn}[l] - g_k[l])^H op(ĝ_k^{rnn}[l] - g_k[l]).   (19)

The weight matrix W[0] is updated via its gradient

  ΔW[0] = η ∇_{W[0]} C_rnn,   (20)

where η is the learning rate and ∇_{W[0]} is the Jacobian whose (i, j)th element is ∂/∂w[0]_{i,j}, with w[0]_{i,j} being the (i, j)th element of W[0]. By assuming that the weights at different time instances are independent, the gradient can be expanded over l - d time steps via the chain rule:

  ∇_{W[0]} C = Σ_{n=1}^{N} (ĝ_k^{rnn}[l] - g_k[l])^H ∇_{ĝ_k^{mmse}[l]} ĝ_{k,n}^{rnn}[l] · Σ_{d=1}^{D} ∇_{W[d]} ĝ_k^{mmse}[l],   (21)

where ĝ_{k,n}^{rnn} represents the estimated CSI from the kth user to the nth BS antenna. This training methodology is the backpropagation-through-time (BPTT) algorithm, and is detailed in [10].

The procedure of CNN-RNN is described in Fig. 2(b). At the beginning, the NARX-RNN loads the pre-trained parameters according to the ACF pattern received from the CNN, and uses ĝ_k^{mmse} as input to predict the CSI for the next interval. In the subsequent intervals, as in CNN-AR, the NN output of the current interval is used as input to predict the CSI, and we repeat this procedure for P intervals.

Note that the NARX-RNN also suffers from the vanishing-gradient and long-term-dependency problems [10]. However, this drawback does not cause a major issue in our setting, since channel series only have a strong relation within adjacent intervals.

Fig. 3. Comparison of prediction NMSE among AR predictors, CNN-AR and CNN-RNN with respect to the normalized Doppler shift fn. Results are shown for J = 4 and U = 10.

IV. NUMERICAL RESULTS

In our simulations, the BS deploys 128 antennas, and K users are randomly distributed in a 1 km² area. We also set a guard zone of 100 meters for each user, i.e., the distance between any user and the BS is no less than 100 m. The large-scale fading β_k is modeled as a function of the user distance d_k, and is given by

  β_k(d_k) = 30.2 + 23.5 log_{10}(d_k).   (22)

Regarding the CE overhead, we consider the ratio of the pilot length to the number of samples ν in a coherence interval as our metric, defined by

  O_con = K/ν   (23)

for a conventional TDD system, and

  O_ML = Kφ/ν   (24)

for the ML-based TDD system, where

  φ ≜ (JU + V) / (PU + JU + V),   (25)

with U the number of LBs.

The NMSE is chosen to evaluate the prediction performance, defined as

  NMSE[l] = (1/K) Σ_{k=1}^{K} E{ ||ĝ_k[l] - g_k[l]||_2^2 / ||g_k[l]||_2^2 }   (26)

for the lth step prediction. The important simulation parameters are shown in Table I.

TABLE I
SYSTEM PARAMETERS
Number of ACF patterns L_P: 10
Number of intervals in HB V: 8
Transmit power p_p: 0 dBm
Background noise power σ_n^2: -174 dBm/Hz
Carrier frequency: 2 GHz

We first verify the accuracy of the CSI prediction for the proposed ML-based architecture, and choose the AR estimator as the benchmark.
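The revised NARX recurrence (18) is a single tanh layer over delayed real-valued inputs, followed by the refine unit. A forward-pass sketch with random stand-ins for the trained weights (all dimensions and values are illustrative, not trained parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 4, 3                                  # antennas and delay taps (illustrative)

# Stand-ins for the trained parameters W[0..D] and b of (17)-(18)
W = [0.1 * rng.standard_normal((2 * N, 2 * N)) for _ in range(D + 1)]
b = 0.1 * rng.standard_normal((2 * N, 1))

def op(g):
    """op(.) of (12): stack real and imaginary parts into a 2N-dim real vector."""
    return np.vstack([g.real, g.imag])

def ru(x):
    """Refine unit: fold the 2N real outputs back into N complex entries."""
    return x[:N] + 1j * x[N:]

# Past MMSE estimates g_mmse[l-1], ..., g_mmse[l-D]; est[d-1] plays g_mmse[l-d]
est = [rng.standard_normal((N, 1)) + 1j * rng.standard_normal((N, 1)) for _ in range(D)]

# One step of (18): tanh over the current input plus the D delayed terms
z = W[0] @ op(est[0]) + sum(W[d] @ op(est[d - 1]) for d in range(1, D + 1)) + b
g_pred = ru(np.tanh(z))                      # complex one-step NARX-RNN prediction
```

During the P pilot-free intervals, op(g_pred) would replace the oldest entry of the delay line, mirroring the feedback loop described for CNN-RNN above.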


Fig. 4. Convergence of the prediction NMSE among AR predictors, CNN-AR and CNN-RNN with respect to P. Results are shown for J = 4, U = 10, and fn = 0.1.

Fig. 5. Optimal φ with respect to P under different NMSE requirements. Results are shown as (P, NMSE, φ) with fn = 0.1 and Ocon = 0.3.
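The overhead savings discussed below follow directly from (23)-(25): the saving of the ML-based scheme relative to conventional TDD is 1 - φ. A quick check of the P = 5 case with the paper's J = 4, U = 10, V = 8:

```python
# Overhead ratio phi of (25); the saving relative to conventional TDD (23)-(24) is 1 - phi
def phi(P, J=4, U=10, V=8):
    return (J * U + V) / (P * U + J * U + V)

saving_p5 = 1 - phi(5)   # P = 5: phi = 48/98, so just over half the CE overhead is saved
```

This matches the P = 5 claim below; larger savings at other operating points depend on the (P, NMSE) pairs read off Fig. 5.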

To illustrate the performance improvement, Fig. 3 compares the NMSE of the different predictors as a function of the normalized Doppler shift f_n. It is evident that the CNN-AR structure outperforms the other predictors in all situations. Compared with simple AR predictors, significant gains can be observed, owing to the fact that the pre-computed AR coefficients are much more precise than those obtained by real-time computation. Moreover, the performance of CNN-RNN is slightly superior to that of the AR predictor, which indicates that RNNs indeed support functionalities similar to those provided by AR predictors. Compared with the one-step prediction of CNN-RNN, the accuracy of the second-step prediction improves remarkably for small f_n. More importantly, for large f_n all structures perform poorly. The reason is that the independence of the CSI across intervals increases with larger Doppler shifts, which implies that the proposed ML-based TDD scheme is not suitable for very-high-mobility scenarios.

Fig. 4 shows the average NMSE over 10 LBs against the number of ML-based intervals P in a LB. Both ML-based structures clearly outperform the AR predictors, while CNN-AR further yields at least 1.5 dB gain in every prediction step. The reason is two-fold: first, the channel series is modeled strictly according to its ACF, so when the CNN extracts the aging pattern correctly, the coefficients loaded into the AR predictor are highly accurate; second, the designed NARX-RNN may not be powerful enough to explore the hidden features within the CSI series, in which case other time-series architectures, such as the long short-term memory RNN, should be considered. It is worth noting that the L_P used in the simulations is small. In practice, the number of ACF patterns can be in the hundreds, which requires extending the ML-based architecture to a much deeper and larger structure for recognition. To the best of our knowledge, there is no general criterion for designing the NN size, and the choice of parameters that depend on L_P remains an implementation-level issue.

Finally, Fig. 5 illustrates the tradeoff between CE overhead and prediction accuracy with the CNN-AR predictor for ML-based TDD. First, the CE overhead can be sharply reduced by adopting the ML-based TDD scheme. For example, to achieve -18.5 dB NMSE in the P = 1 case, the ML-based TDD scheme saves 77% of the overhead; and given P = 5, the ML-based TDD scheme saves more than half of the overhead while achieving an NMSE below -10 dB. The figure also illustrates the limits of the proposed ML-based TDD scheme: a strict prediction requirement is not achievable for multi-step prediction.

V. CONCLUSION

In this paper, we designed an ML-based TDD scheme, as well as the corresponding ML-based architectures, to estimate the channels in massive MIMO systems under channel aging effects. Combining a CNN with an AR predictor or a NARX-RNN, the proposed architecture achieves significant gains in prediction quality, and a remarkable tradeoff between prediction quality and CE overhead by leveraging the ACF pattern.

REFERENCES
[1] N. Palleit and T. Weber, "Time prediction of non flat fading channels," in Proc. IEEE ICASSP, May 2011, pp. 2752-2755.
[2] A. K. Papazafeiropoulos, "Impact of general channel aging conditions on the downlink performance of massive MIMO," IEEE Trans. Veh. Technol., vol. 66, no. 2, pp. 1428-1442, Feb. 2017.
[3] C. Kong, C. Zhong, A. K. Papazafeiropoulos, M. Matthaiou, and Z. Zhang, "Sum-rate and power scaling of massive MIMO systems with channel aging," IEEE Trans. Commun., vol. 63, no. 12, pp. 4879-4893, Dec. 2015.
[4] K. E. Baddour and N. C. Beaulieu, "Autoregressive modeling for fading channel simulation," IEEE Trans. Wireless Commun., vol. 4, no. 4, pp. 1650-1662, July 2005.
[5] T. Wang, C. Wen, H. Wang, F. Gao, T. Jiang, and S. Jin, "Deep learning for wireless physical layer: Opportunities and challenges," China Communications, vol. 14, no. 11, pp. 92-111, Nov. 2017.
[6] K. T. Truong and R. W. Heath, "Effects of channel aging in massive MIMO systems," Journal of Communications and Networks, vol. 15, no. 4, pp. 338-351, Aug. 2013.
[7] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control. John Wiley & Sons, 2015.
[8] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in Proc. IEEE CVPR, Jun. 2016, pp. 1646-1654.
[9] R. DiPietro, N. Navab, and G. D. Hager, "Revisiting NARX recurrent neural networks for long-term dependencies," CoRR, vol. abs/1702.07805, 2017. [Online]. Available: http://arxiv.org/abs/1702.07805
[10] T. Lin, B. G. Horne, P. Tino, and C. L. Giles, "Learning long-term dependencies in NARX recurrent neural networks," IEEE Trans. Neural Netw., vol. 7, no. 6, pp. 1329-1338, Nov. 1996.
