IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 20, NO. 5, MAY 2021, p. 3049

Wideband Channel Estimation With a Generative Adversarial Network

Eren Balevi and Jeffrey G. Andrews, Fellow, IEEE

Abstract: Communication at high carrier frequencies such as millimeter wave (mmWave) and terahertz (THz) requires channel estimation for very large bandwidths at low SNR. Hence, allocating an orthogonal pilot tone for each coherence bandwidth leads to an excessive number of pilots. We leverage generative adversarial networks (GANs) to accurately estimate frequency selective channels with few pilots at low SNR. The proposed estimator first learns to produce channel samples from the true but unknown channel distribution via training the generative network, and then uses this trained network as a prior to estimate the current channel by optimizing the network's input vector in light of the current received signal. Our results show that at an SNR of −5 dB, even if a transceiver with one-bit phase shifters is employed, our design achieves the same channel estimation error as an LS estimator with SNR = 20 dB or the LMMSE estimator at 2.5 dB, both with fully digital architectures. Additionally, the GAN-based estimator reduces the required number of pilots by about 70% without significantly increasing the estimation error and required SNR. We also show that the generative network does not appear to require retraining even if the number of clusters and rays change considerably.

Index Terms: Frequency selective channel estimation, GAN, MIMO, terahertz and millimeter wave communication.

Manuscript received August 13, 2020; revised December 4, 2020; accepted December 21, 2020. Date of publication January 6, 2021; date of current version May 10, 2021. The associate editor coordinating the review of this article and approving it for publication was C. Huang. (Corresponding author: Eren Balevi.)
The authors are with the Wireless Networking and Communications Group, Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2020.3047100.
Digital Object Identifier 10.1109/TWC.2020.3047100

I. INTRODUCTION

Millimeter wave (mmWave) and terahertz (THz) communication offer large untapped bandwidths. Low signal-to-noise-ratio (SNR) (below 0 dB) channel estimation at these frequencies and bandwidths is desirable before beam alignment is completed, because exhaustively searching all narrow beams without estimating the channel brings exponentially increasing sample and computational complexity. However, there are nontrivial challenges regarding channel estimation stemming from the propagation physics, the necessity of using a great many antenna elements for high gain beamforming, and the hybrid transceiver architectures.

The contribution of this paper is to adapt powerful deep generative models for frequency selective mmWave and THz channel estimation as an alternative to known sparsity-based compressed sensing algorithms, which have been used for narrowband channels [1]–[3], frequency-selective channels [4]–[6] and low-resolution channels [7]. Deep generative models offer an appealing approach to exploiting sparsity, as they can use knowledge of a finite number of signals from a class to learn a basis for the whole class. Furthermore, these models enable us to solve optimization problems with a simple and fast gradient descent based method. As an additional benefit, generative models can exploit the overall cross-correlations among the frequency, time and spatial domains, which have been traditionally ignored to simplify the estimator [8]. Among the two prominent deep generative models, namely variational autoencoders (VAEs) [9] and generative adversarial networks (GANs) [10], we utilize a GAN in this paper, since GANs can very effectively compress the signals to a low dimensional manifold by leveraging the channel structures. Exploiting this property enables us to reduce the number of required pilots for accurate channel estimation.

A. Motivation and Related Work

High frequency bands incur high propagation losses for terrestrial communication, and hence a large number of small antenna elements is needed to attain a large beamforming gain. Conventional estimators require the number of pilots to be at least equal to the number of transmit antennas to avoid an ill-posed problem. Thus, to reduce the number of pilots, the existing high bandwidth channel estimators have been centered around compressed sensing tools motivated by the sparsity of mmWave channels [11]. The same approach was also used for sub-6 GHz massive MIMO channels [12]–[15]. Unfortunately, it is very hard (or impossible) to find the basis that would yield the sparsest representation. Also, the reconstruction phase is complex and slow for compressed sensing algorithms, which require the solution of an optimization problem to find the locations and values of the sparse coefficients. This restricts their usage to channels with fairly long coherence intervals.

Deep learning has been recently utilized as an alternative for high-dimensional channel estimation. Specifically, [16] uses convolutional neural networks (CNNs) to perform interpolation and denoising in 2-dimensional OFDM channels, [17] incorporates a special CNN as a denoiser to the approximate message passing (AMP) algorithm for beamspace channels, and [18] adapts CNNs for 3-dimensional channels to exploit the correlations in frequency, time and space. To avoid the training complexity of CNNs for channel estimation, [19], [20] proposed an untrained neural network that can precede or follow a least-squares (LS) estimator for OFDM
1536-1276 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on January 04,2025 at 09:02:39 UTC from IEEE Xplore. Restrictions apply.

and MIMO-OFDM channels, respectively. Combining an LS estimator with neural networks was also proposed in [21], [22]. There are some other studies that consider deep learning to tackle the detrimental effects of quantization for channel estimation [23], [24].

What distinguishes this paper from the prior techniques is that we design a GAN to learn to produce channel samples according to its distribution, and then use this knowledge as a priori information to estimate the actual current channel. This is a quite different approach from using GANs for channel modeling [25], [26]. Furthermore, as opposed to AMP-based channel estimators, our GAN approach does not require us to know or model the channel distribution. Instead, the GAN learns to produce samples that statistically are very close to the true but unknown channel distribution. The closest papers to our work are our recent papers [27], [28], which use a similar GAN-based channel estimation architecture to reduce the number of pilots for single stream narrowband (frequency flat) massive MIMO channel estimation, with very tight ($\lambda/10$) antenna spacing contributing extra spatial correlation. In contrast, in the current paper we introduce a frequency selective channel and utilize a more standard planar array with $\lambda/2$ spaced antennas. Additionally, we consider one-bit quantized phase shifters to further decrease the power consumption and hardware costs for such a large antenna array. We also study the generalization capability both through analysis and experiments.

B. Contributions

The main contribution of this paper is to propose and study a novel GAN-based channel estimation algorithm for wideband frequency selective channels. Although in this paper the modulation and demodulation is based on OFDM, the proposed approach can be adapted to single-carrier frequency domain equalization (SC-FDE) systems as long as the channel is estimated in frequency domain. We will demonstrate that our GAN-based framework can estimate channels at very low SNR with a reduced number of pilots for hybrid beamforming architectures when the channel estimation is formulated as an inverse problem. In addition to the novel architecture, our contributions are both theoretical and empirical.

Theoretically, there are two main contributions. First, the GAN framework requires sub-Gaussian measurements to meet theoretical guarantees [29]. In the channel estimation case, these measurements are determined by the pilots and the digital and analog precoders/combiners. We prove that when the pilots are chosen as zero mean bounded i.i.d. random variables, the sub-Gaussian requirement is indeed met for channel estimation even if there are constraints due to phase shifters and total transmission power. Thus the corresponding guarantees hold. Second, we investigate the generalization capability of the proposed estimator for channels with a different number of clusters and rays than the channel used for training the GAN. Our technical approach is to apply theoretical principles from reinforcement learning.

Our empirical results demonstrate that the major challenges (hybrid transceivers, low SNRs and insufficient pilots) can be tackled with our proposed estimator. Specifically, we benchmark our technique versus the performance of conventional channel estimators for fully digital transceivers. We find that our technique at an ultra-low SNR of −5 dB matches the performance of LS estimation at 20 dB and linear minimum mean square error (LMMSE) estimation at 2.5 dB. Furthermore, it is shown that GANs provide a lower channel estimation error than the traditional CNNs that are not trained with adversary loss, e.g., ResNet, due to exploiting the high channel correlations much more efficiently. Additionally, our estimator allows a significant reduction in pilot tones (more than 50%) without any substantial performance loss, and yields lower estimation error than the Orthogonal Matching Pursuit (OMP) algorithm in this regime.

II. SYSTEM MODEL AND PROBLEM STATEMENT

We consider single user communication to estimate the frequency selective channel over a large number of antennas via pilot symbols. However, all the ideas proposed in this paper are equivalently applicable to multi-user communication if orthogonal pilots are allocated to each user and there is no inter-beam interference¹. In the case of large antenna arrays, having a dedicated RF chain per antenna is too costly in terms of hardware and power consumption. Thus, the number of RF chains is reduced by processing the signals both in the digital and analog domain. This architecture is illustrated in Fig. 1. Here, $N_s$ data streams are precoded digitally at each subcarrier. Then, the precoded signal is OFDM modulated for the $N_t^{\mathrm{RF}}$ RF chains, processed with an analog precoder (or phase shifters) and transmitted over the $N_t$ transmit antennas. Similarly, the receiver has an analog combiner that converts the $N_r$ dimensional received signal into an $N_r^{\mathrm{RF}} \times 1$ vector. The resultant signal is then OFDM demodulated and combined with the digital combiner at each subcarrier.

A. Hybrid Transceivers

The pilot tone $\mathbf{p}[n,k]$, with the first and second axes being the time and frequency index, is an $N_s \times 1$ vector such that $E[\mathbf{p}[n,k]\mathbf{p}[n,k]^H] = \frac{1}{N_s}\mathbf{I}_{N_s}$. This signal is processed with the $N_t^{\mathrm{RF}} \times N_s$ dimensional digital precoder matrix $\mathbf{F}_{\mathrm{BB}}[n,k]$ as

$$\mathbf{s}[n,k] = \mathbf{F}_{\mathrm{BB}}[n,k]\,\mathbf{p}[n,k] \tag{1}$$

where $\mathbf{s}[n,k] = [s_1[n,k], s_2[n,k], \cdots, s_{N_t^{\mathrm{RF}}}[n,k]]^T$ for $k = 0, 1, \cdots, N_f - 1$, i.e., there are $N_f$ subcarriers and $s_q[n,k]$ corresponds to the pilot for the $q$th RF chain on the $k$th subcarrier. In accordance with (1), the time domain samples become

$$\mathbf{u}[n] = (\mathbf{F}^H \otimes \mathbf{I}_{N_t^{\mathrm{RF}}}) \underbrace{\begin{pmatrix} \mathbf{s}[n,0] \\ \vdots \\ \mathbf{s}[n,N_f-1] \end{pmatrix}}_{\mathbf{s}[n]} \tag{2}$$

¹ Note that the proposed estimator is robust to inter-beam interference as long as the interference has a distribution whose tails are exponentially bounded as has been recently proven in [30]. On the other hand, for heavy-tailed interference novel reconstruction methods are needed instead of minimizing the Euclidean distance.
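
As a rough numerical illustration of (1) and (2), the sketch below stacks the per-subcarrier pilots and applies $(\mathbf{F}^H \otimes \mathbf{I}_{N_t^{\mathrm{RF}}})$, then checks that the result equals an IDFT taken across subcarriers on each RF chain. The dimensions, the ±1 pilot alphabet, the identity digital precoder and all variable names are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
Nf, Nt_rf, Ns = 8, 2, 2            # subcarriers, transmit RF chains, streams (illustrative)

# Unitary DFT matrix F, so F^H is the unitary IDFT matrix.
F = np.fft.fft(np.eye(Nf)) / np.sqrt(Nf)

# Pilots p[n, k]: i.i.d. +-1/sqrt(Ns) entries, so E[p p^H] = I / Ns (assumed alphabet).
P = rng.choice([-1.0, 1.0], size=(Nf, Ns)) / np.sqrt(Ns)

# Digital precoder set to the identity, so (1) gives s[n, k] = p[n, k].
F_bb = np.eye(Nt_rf, Ns)
S = P @ F_bb.T                     # row k holds s[n, k]^T

# Equation (2): stack s[n, 0], ..., s[n, Nf-1] and apply (F^H kron I).
s_stacked = S.reshape(-1)
u = np.kron(F.conj().T, np.eye(Nt_rf)) @ s_stacked

# The same samples from an IDFT over the subcarrier axis of each RF chain.
u_ifft = (np.sqrt(Nf) * np.fft.ifft(S, axis=0)).reshape(-1)
```

The Kronecker form is convenient for the derivation, while the per-chain IDFT is how one would normally compute the same samples in practice.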


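Fig. 1. The communication system model utilized for channel estimation, in which the pilots are passed through digital and analog precoders and combiners.

where $\mathbf{F}^H$ is the $N_f \times N_f$ IDFT matrix and $\otimes$ denotes the Kronecker product. Transmitting the pilots after the $N_t \times N_t^{\mathrm{RF}}$ analog precoder² $\mathbf{F}_{\mathrm{RF}}$ over a frequency-selective channel yields the received signal

$$\mathbf{r}[n] = \sqrt{\rho}\,\mathbf{C}(\mathbf{I}_{N_f} \otimes \mathbf{F}_{\mathrm{RF}})\mathbf{u}[n] + \mathbf{v}[n] \tag{3}$$

where $\rho$ is the average received power,

$$\mathbf{C} = \begin{pmatrix} \mathbf{C}_0 & \mathbf{C}_{N_f-1} & \mathbf{C}_{N_f-2} & \cdots & \mathbf{C}_1 \\ \mathbf{C}_1 & \mathbf{C}_0 & \mathbf{C}_{N_f-1} & \cdots & \mathbf{C}_2 \\ \mathbf{C}_2 & \mathbf{C}_1 & \mathbf{C}_0 & \cdots & \mathbf{C}_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{C}_{N_f-1} & \mathbf{C}_{N_f-2} & \mathbf{C}_{N_f-3} & \cdots & \mathbf{C}_0 \end{pmatrix} \tag{4}$$

where $\mathbf{C}_l$ is an $N_r \times N_t$ channel matrix. We assume that there are $L_c$ channel taps in the time domain. Hence, the matrix $\mathbf{C}_l$ is non-zero only for $l = 0, 1, \cdots, L_c - 1$ and it is a zero matrix for $l = L_c, L_c + 1, \cdots, N_f - 1$. The noise vector $\mathbf{v}[n]$ is zero mean i.i.d. Gaussian with covariance matrix $\sigma_n^2 \mathbf{I}_{N_f N_r}$. Combining the received signal with the phase shifters, leading to the $N_r \times N_r^{\mathrm{RF}}$ matrix $\mathbf{W}_{\mathrm{RF}}$, followed by the DFT matrix $\mathbf{F}$ for OFDM demodulation gives

$$\mathbf{y}[n] = \sqrt{\rho}\,(\mathbf{F} \otimes \mathbf{I}_{N_r^{\mathrm{RF}}})(\mathbf{I}_{N_f} \otimes \mathbf{W}_{\mathrm{RF}}^H)\mathbf{C}(\mathbf{I}_{N_f} \otimes \mathbf{F}_{\mathrm{RF}})(\mathbf{F}^H \otimes \mathbf{I}_{N_t^{\mathrm{RF}}})\mathbf{s}[n] + \mathbf{w}[n] \tag{5}$$

where $\mathbf{w}[n] = (\mathbf{F} \otimes \mathbf{I}_{N_r^{\mathrm{RF}}})(\mathbf{I}_{N_f} \otimes \mathbf{W}_{\mathrm{RF}}^H)\mathbf{v}[n]$. Due to the mixed-product property of the Kronecker product, i.e., $(\mathbf{V} \otimes \mathbf{Y})(\mathbf{X} \otimes \mathbf{Z}) = (\mathbf{V}\mathbf{X}) \otimes (\mathbf{Y}\mathbf{Z})$, it follows that

$$(\mathbf{F} \otimes \mathbf{I}_{N_r^{\mathrm{RF}}})(\mathbf{I}_{N_f} \otimes \mathbf{W}_{\mathrm{RF}}^H) = (\mathbf{I}_{N_f} \otimes \mathbf{W}_{\mathrm{RF}}^H)(\mathbf{F} \otimes \mathbf{I}_{N_r}). \tag{6}$$

Substituting (6) into (5) and doing the same for the transmitter lead to

$$\mathbf{y}[n] = \sqrt{\rho}\,(\mathbf{I}_{N_f} \otimes \mathbf{W}_{\mathrm{RF}}^H)(\mathbf{F} \otimes \mathbf{I}_{N_r})\mathbf{C}(\mathbf{F}^H \otimes \mathbf{I}_{N_t})(\mathbf{I}_{N_f} \otimes \mathbf{F}_{\mathrm{RF}})\mathbf{s}[n] + \mathbf{w}[n]. \tag{7}$$

Since $\mathbf{C}$ is a block circulant matrix as given in (4), it is block-diagonalized by its left and right multiplying terms in (7) to

$$\mathbf{H} = (\mathbf{F} \otimes \mathbf{I}_{N_r})\mathbf{C}(\mathbf{F}^H \otimes \mathbf{I}_{N_t}) = \begin{pmatrix} \mathbf{H}_0 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_1 & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}_{N_f-1} \end{pmatrix} \tag{8}$$

such that

$$\mathbf{H}_k = \sum_{l=0}^{L_c-1} \mathbf{C}_l\, e^{-j2\pi kl/N_f} \tag{9}$$

for $k = 0, 1, \cdots, N_f - 1$. Hence,

$$\mathbf{y}[n] = \sqrt{\rho}\,(\mathbf{I}_{N_f} \otimes \mathbf{W}_{\mathrm{RF}}^H)\mathbf{H}(\mathbf{I}_{N_f} \otimes \mathbf{F}_{\mathrm{RF}})\mathbf{s}[n] + \mathbf{w}[n]. \tag{10}$$

Vectorizing the channel matrix $\mathbf{H}$ in (10) yields

$$\mathbf{y}[n] = \sqrt{\rho}\,\underbrace{\left(\mathbf{s}[n]^T(\mathbf{I}_{N_f} \otimes \mathbf{F}_{\mathrm{RF}}^T) \otimes (\mathbf{I}_{N_f} \otimes \mathbf{W}_{\mathrm{RF}}^H)\right)}_{\mathbf{A}[n]}\mathbf{h} + \mathbf{w}[n] \tag{11}$$

where $\mathbf{h} = \mathrm{vec}(\mathbf{H})$. We assume that there are $N_p$ symbols in one frame and the channel is constant throughout this frame. Thus,

$$\mathbf{y} = \begin{pmatrix} \mathbf{y}[n] \\ \vdots \\ \mathbf{y}[n+N_p-1] \end{pmatrix} = \sqrt{\rho}\,\underbrace{(\mathbf{I}_{N_p} \otimes \mathbf{A}[n])}_{\mathbf{A}}\underbrace{\begin{pmatrix} \mathbf{h} \\ \vdots \\ \mathbf{h} \end{pmatrix}}_{\bar{\mathbf{h}}} + \mathbf{w} \tag{12}$$

where $\mathbf{w}$ is obtained by concatenating the time domain samples as was done for $\mathbf{y}$. Here, $\mathbf{A}$ refers to the measurement matrix, and $\mathbf{h}$ (or its stacked form $\bar{\mathbf{h}}$) is the target signal that we aim to estimate.

Optimizing the beamformer and combiner matrices in (12) enhances the received SNR, but unfortunately there is no way to optimally set these without knowing the channel. Hence, we consider an arbitrary scenario such that the digital beamformer and combiner are set to the identity matrix, i.e., $\mathbf{F}_{\mathrm{BB}}[n,k] = \mathbf{I}_{N_t^{\mathrm{RF}} \times N_s}$ and $\mathbf{W}_{\mathrm{BB}}[n,k] = \mathbf{I}_{N_r^{\mathrm{RF}} \times N_s}$. Furthermore, the phase shifters are adjusted with one-bit quantized angles to further reduce the power consumption of transceivers. This means that $[\mathbf{F}_{\mathrm{RF}}]_{i,j} = \frac{1}{\sqrt{N_t}} e^{j\theta_{i,j}}$ and $[\mathbf{W}_{\mathrm{RF}}]_{i,j} = \frac{1}{\sqrt{N_r}} e^{j\phi_{i,j}}$, in which $\theta_{i,j}, \phi_{i,j} \in \mathcal{A}$, where $\mathcal{A} = \{0, \pi\}$, and $[\mathbf{F}_{\mathrm{RF}}]_{i,j}$ and $[\mathbf{W}_{\mathrm{RF}}]_{i,j}$ are the $(i,j)$th elements of $\mathbf{F}_{\mathrm{RF}}$ and $\mathbf{W}_{\mathrm{RF}}$, respectively.

B. OFDM Channel Estimation With Multiple Antennas

The optimum channel estimator for (12) is found via the maximum a posteriori (MAP) optimization, which is equivalent to

$$\hat{\mathbf{h}}_{\mathrm{MAP}} = \arg\max_{\mathbf{h}}\; \log(P(\mathbf{y}|\mathbf{h})) + \log(P(\mathbf{h})). \tag{13}$$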
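
The chain from (4) to (11) can be checked numerically on a small example. The sketch below builds the block circulant matrix of (4), confirms that (8) is block diagonal with the blocks given by (9), and then verifies that the vectorized form (11) reproduces the noiseless part of (10) when one-bit phase shifters are used. All sizes, the random taps and pilots, $\rho = 1$ and the variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
Nf, Nr, Nt, Lc = 4, 4, 6, 2        # subcarriers, rx/tx antennas, channel taps (illustrative)
Nr_rf, Nt_rf = 2, 2                # RF chains (illustrative)

def crandn(*shape):
    """Complex Gaussian samples used as placeholder data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Time-domain taps C_l; taps with l >= Lc are zero.
taps = np.zeros((Nf, Nr, Nt), dtype=complex)
taps[:Lc] = crandn(Lc, Nr, Nt)

# Block circulant matrix of (4): block (i, j) is C_{(i - j) mod Nf}.
C = np.block([[taps[(i - j) % Nf] for j in range(Nf)] for i in range(Nf)])

# Equation (8): H = (F kron I_Nr) C (F^H kron I_Nt) with the unitary DFT matrix F.
F = np.fft.fft(np.eye(Nf)) / np.sqrt(Nf)
H = np.kron(F, np.eye(Nr)) @ C @ np.kron(F.conj().T, np.eye(Nt))

# Equation (9): H_k is the DFT of the taps over the tap index l.
H_k = np.fft.fft(taps, axis=0)
H_blocks = H.reshape(Nf, Nr, Nf, Nt)
diag_ok = all(np.allclose(H_blocks[k, :, k, :], H_k[k]) for k in range(Nf))
off_ok = all(np.allclose(H_blocks[i, :, j, :], 0)
             for i in range(Nf) for j in range(Nf) if i != j)

# One-bit phase shifters: angles drawn from {0, pi}, constant modulus entries.
F_rf = np.exp(1j * rng.choice([0.0, np.pi], size=(Nt, Nt_rf))) / np.sqrt(Nt)
W_rf = np.exp(1j * rng.choice([0.0, np.pi], size=(Nr, Nr_rf))) / np.sqrt(Nr)

# Noiseless (10) with rho = 1, and the vectorized form (11) with h = vec(H).
s = crandn(Nf * Nt_rf)
I_f = np.eye(Nf)
y_10 = np.kron(I_f, W_rf.conj().T) @ H @ np.kron(I_f, F_rf) @ s
A_n = np.kron((s @ np.kron(I_f, F_rf.T))[None, :], np.kron(I_f, W_rf.conj().T))
h = H.reshape(-1, order="F")       # column-major vec(H)
y_11 = A_n @ h
```

Note that (11) uses a plain transpose on $\mathbf{s}[n]$ and $\mathbf{F}_{\mathrm{RF}}$, which follows from the identity $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{b}) = (\mathbf{b}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X})$ with column-major vectorization.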

² Since it is not practical to change the analog precoder and combiner at each symbol time, we omit the index $n$.

The main challenges for MAP estimation are the prohibitive computational complexity, since the coefficients of $\mathbf{h}$ are


mixed in $\mathbf{y}$ and this makes the calculation of $P(\mathbf{y}|\mathbf{h})$ quite complex, and the need for channel distribution. As a special case, when $P(\mathbf{y}|\mathbf{h})$ is Gaussian, $\hat{\mathbf{h}}_{\mathrm{MAP}}$ becomes equivalent to the LMMSE estimator

$$\hat{\mathbf{h}}_{\mathrm{LMMSE}} = \mathbf{R}_h(\mathbf{R}_h + \rho^{-1}\boldsymbol{\Gamma})^{-1}\hat{\mathbf{h}}_{\mathrm{LS}} \tag{14}$$

where

$$\mathbf{R}_h = E[\mathbf{h}\mathbf{h}^H],$$

$$\boldsymbol{\Gamma} = (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H E[\mathbf{w}\mathbf{w}^H]\mathbf{A}(\mathbf{A}^H\mathbf{A})^{-1},$$

and

$$\hat{\mathbf{h}}_{\mathrm{LS}} = \frac{1}{\sqrt{\rho}}(\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H\mathbf{y}. \tag{15}$$

Note that $\boldsymbol{\Gamma}$ becomes an identity matrix in the case of digital transceivers. However, (14) is still computationally expensive due to the matrix inversions. Also, $\mathbf{A}^H\mathbf{A}$ becomes non-invertible if there are not sufficient pilots. The AMP algorithm can near optimally solve (13) with reasonable complexity if $P(\mathbf{h})$ is known [31]. However, it is unrealistic to assume a known $P(\mathbf{h})$. Modeling $\mathbf{h}$ with Gaussian mixtures whose parameters are found with Expectation-Maximization algorithm can be a method if the elements of $\mathbf{h}$ are independent [32]. However, the entries of $\mathbf{h}$ are correlated in wireless channels. Finding a sparsifying basis for the channel in (12) and using OMP and Basis Pursuit Denoising (or Lasso) for channel estimation lead to a high performance loss [28].

For multiple antenna OFDM channel estimation, we use a fundamentally different approach. Our key idea is to design a GAN that learns to produce plausible channel samples instead of finding or modeling the highly complex channel distribution. This is done offline, and then in the online phase we inject these channel samples into the estimator. This yields the following optimization problem

$$\hat{\mathbf{h}}_{\mathrm{GAN}} = \arg\min_{\mathbf{h}}\; \|\mathbf{y} - \sqrt{\rho}\,\mathbf{A}\mathbf{h}\|_2^2 + r(\mathbf{h}) \tag{16}$$

where

$$r(\mathbf{h}) = \begin{cases} 0, & \text{if } \mathbf{h} \text{ is producible by the GAN} \\ \infty, & \text{otherwise.} \end{cases} \tag{17}$$

Note that (17) injects the a priori knowledge due to the trained GAN into the estimator, which means that among the many possible candidates, the estimate is the one that can be produced by the GAN³.

C. Channel Estimation and Image Reconstruction Differences for GANs

GANs have already been used to solve inverse problems in image processing [29]. This raises the natural question of what is novel about using a GAN for channel estimation. The first answer is that the structure of the measurement matrix $\mathbf{A}$ in (12) is very different for the two applications. Furthermore, the signal structures of natural images and channels are distinct. To illustrate, in Fig. 2 we visualize a sample image from the CelebA dataset by cropping a 64 × 16 portion of it from the center and compare it with (i) a channel realization from a generic geometric channel model to show the spatial correlations and (ii) a channel realization from TDL-E channel model to show the frequency and time domain correlations. For ease of exposition, all signals are shaped to a 64 × 16 matrix. Another difference is that a good performance metric for channel estimation is the Euclidean distance or SNR, due to Gaussian noise and the dispassionate nature of symbol demodulation in the presence of such noise. On the other hand, image quality is perceptual and qualitative, and Euclidean distance and SNR are known to be poor measures of image quality [33], [34]. Indeed, a major feature of a GAN is that it can produce an image that is far from the target image under a quantitative measure like Euclidean distance, but very close in a perceptual sense.

III. CHANNEL ESTIMATION WITH GENERATIVE ADVERSARIAL NETWORKS

In this part, we present the GAN-based estimator for frequency-selective channels. Before going into the details of how the GAN works for channel estimation, the basics of GANs and our architecture are briefly summarized. Then, we explain how to solve the optimization problem in (16) at an algorithmic level.

A. GAN Architecture

A GAN is composed of a generator $G_{\hat{\theta}_g}: \mathbb{R}^d \to \mathbb{R}^n$, in which $d \ll n$, and a discriminator $D_{\hat{\theta}_d}: \mathbb{R}^n \to \{-1, +1\}$, where $\hat{\theta}_g$ and $\hat{\theta}_d$ are the parameters of the generator and discriminator neural networks. As explained in [10], the discriminator is first trained both with the true samples in the dataset that are labeled as valid and the fake samples produced by the generator that are labeled as fake. In what follows, the generator is trained to enhance the quality of fake samples to fool the discriminator so that the fake samples are classified as valid. Model selection for the generator and discriminator networks is of key importance to facilitate the training. After extensive model exploration, [35] developed a class of CNN architecture called Deep Convolutional GAN (DCGAN), and empirically showed that using a DCGAN for the generator and discriminator significantly alleviates the instability problems in training.

In our DCGAN model, the first layer of the generator network processes the low dimensional vector with a fully connected layer and ReLU activation function, and then reshapes it into a 3-dimensional vector. This layer is then followed by four hidden layers, each of which is composed of upsampling, 2-dimensional convolution with 4 × 4 filters and 1 × 1 stride, batch normalization and ReLU activation function. It is worth noting that upsampling repeats the rows and columns so as to have the same dimensions with the channel matrix at the generator output. The output layer only involves a 2-dimensional convolution with a linear activation function. The discriminator network of our DCGAN has a 2-dimensional convolution with 3 × 3 filters and 2 × 2 stride for the input that has leaky ReLU activation function and dropout

³ We note that producible means the channel is either on the range space of the generator or close to it, with the meaning of "close" quantified in (22).


Fig. 2. Visualisation of (a) a 64 × 16 cropped image in CelebA dataset, (b) a channel realization for 64 transmit and 16 receive antennas from geometric
channel model, and (c) a channel realization for 64 subcarriers and 16 OFDM symbols from TDL-E model.
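
For reference, the conventional baselines in (14) and (15) can be written in a few lines. The sketch below is only a toy Monte Carlo comparison under assumed conditions: a random complex Gaussian measurement matrix, an exponentially correlated $\mathbf{R}_h$, white noise, and 0 dB SNR. None of these settings, nor the variable names, are taken from the paper's simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, trials = 8, 16, 2000         # unknowns, measurements, Monte Carlo runs (illustrative)
rho, sigma2 = 1.0, 1.0             # received power and noise variance, i.e. 0 dB

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Assumed channel covariance R_h with exponential correlation between entries.
idx = np.arange(n)
R_h = 0.9 ** np.abs(idx[:, None] - idx[None, :])

# Placeholder measurement matrix and simulated observations y = sqrt(rho) A h + w.
A = crandn(m, n)
H_true = np.linalg.cholesky(R_h) @ crandn(n, trials)   # columns are channel draws
Y = np.sqrt(rho) * A @ H_true + np.sqrt(sigma2) * crandn(m, trials)

# Equation (15): least squares estimate.
P = np.linalg.solve(A.conj().T @ A, A.conj().T)         # (A^H A)^{-1} A^H
H_ls = P @ Y / np.sqrt(rho)

# Gamma as defined below (14), then the LMMSE estimate of (14).
Gamma = P @ (sigma2 * np.eye(m)) @ P.conj().T
H_lmmse = R_h @ np.linalg.solve(R_h + Gamma / rho, H_ls)

mse_ls = np.mean(np.sum(np.abs(H_ls - H_true) ** 2, axis=0))
mse_lmmse = np.mean(np.sum(np.abs(H_lmmse - H_true) ** 2, axis=0))
```

Both estimators need $\mathbf{A}^H\mathbf{A}$ to be invertible, which is why this toy setup uses more measurements than unknowns; the LMMSE step additionally needs $\mathbf{R}_h$, which is the kind of prior knowledge the GAN-based estimator is meant to avoid.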

with 0.25. This is followed by three hidden layers, each of which has 2-dimensional convolution with 3 × 3 filters and 2 × 2 stride, batch normalization and leaky ReLU activation function, and dropout with 0.25. The slope of the leak is 0.2 for all leaky ReLU functions. The output has a fully connected layer, and its dimension is 1 to classify the channel realizations as either valid or fake.

Consecutively repeating the process of training $D_{\hat{\theta}_d}(\cdot)$ $k$ times and training $G_{\hat{\theta}_g}(\cdot)$ one time to keep the discriminator near its optimal solution theoretically achieves the global optimum point of

$$\min_G \max_D\; E_{\mathbf{h}}[\log D_{\hat{\theta}_d}(\mathbf{h})] + E_{\mathbf{z}}[\log(1 - D_{\hat{\theta}_d}(G_{\hat{\theta}_g}(\mathbf{z})))], \tag{18}$$

where $\mathbf{h}$ represents the training channel samples and $\mathbf{z}$ is the generator input sampled from a distribution $P(\mathbf{z})$. The loss function in (18), which corresponds to minimizing Jensen–Shannon (JS) divergence between the generator distribution and the empirical distribution of the training channel samples, has been used since the seminal paper [10]. However, it is practically delicate to train a GAN with (18), since the optimum value of $k$ is not known, and the theoretical guarantee is valid only for infinite capacity neural networks. To avoid these issues, based on the idea of replacing JS divergence with Wasserstein distance in training, [36] proposes to replace (18) with

$$\min_G \max_D\; E_{\mathbf{h}}[D_{\hat{\theta}_d}(\mathbf{h})] - E_{\mathbf{z}}[D_{\hat{\theta}_d}(G_{\hat{\theta}_g}(\mathbf{z}))]. \tag{19}$$

A GAN whose generator and discriminator are trained according to (19) is known as a Wasserstein GAN. Note that for a Wasserstein GAN, the Lipschitz condition must be satisfied, and we ensured this with weight clipping. We summarize the overall process of training a Wasserstein GAN according to (19) in Algorithm 1, pointing the interested readers to [36] for more details.

B. Frequency Selective Channel Estimator

To solve the channel estimation problem in (16), we first train a Wasserstein GAN offline, then extract its generator network $G_{\hat{\theta}_g}$ and iteratively optimize the input of this generative network online. This entire process is presented in Fig. 3 and detailed next.

1) Offline Phase: We envision that there will be datasets in future standards for different environments, e.g., for urban macro (UMa), urban micro (UMi), indoor-open office, just like the existing empirical channel models, and the neural network parameters are trained with these datasets. However, since


there is no such dataset yet, in this paper the parameters of the GAN are trained offline by generating samples from a channel model as explained in Algorithm 1. After training, the generative part is taken and used as in Fig. 3. We note that the parameters of the generative network are never retrained until there are significant changes in the channel statistics. The quantification of "significant changes" is a topic for future research, but we will provide an analysis and simulation results to better understand this.

2) Online Phase: After offline training, the input of the generator network is periodically optimized according to the received signal in (12) to minimize

$$\mathbf{z}^* = \arg\min_{\mathbf{z}}\; \|\mathbf{y} - \sqrt{\rho}\,\mathbf{A}G_{\hat{\theta}_g}(\mathbf{z})\|_2^2 \tag{20}$$

once per coherence time interval. The loss function in (20) is different from [28], which adds the squared $\ell_2$ norm of $\mathbf{z}$ to (20), multiplied with a hyperparameter, as a regularization term. We note that $\mathbf{z}$ is initialized with Gaussian i.i.d. random variables and the output of the generative network for the optimized input corresponds to the channel estimate, i.e.,

$$\hat{\mathbf{h}}_{\mathrm{GAN}} = G_{\hat{\theta}_g}(\mathbf{z}^*). \tag{21}$$

There are many methods to solve (20), e.g., standard gradient descent or its variants. These steps are summarized in Algorithm 2.

C. Measurement Matrix

It was proven in [29] that the framework in Fig. 3 reconstructs the signal with some bounded error when the measurement matrix has a sub-Gaussian distribution⁴. More specifically, when a ReLU generative network is utilized for $G_{\hat{\theta}_g}$, the channel estimation error is bounded by [29]

$$\|\hat{\mathbf{h}}_{\mathrm{GAN}} - \mathbf{h}\|_2 \leq 6 \min_{\mathbf{z}^* \in \mathbb{R}^k} \|G_{\hat{\theta}_g}(\mathbf{z}^*) - \mathbf{h}\|_2 + 3\|\mathbf{w}\|_2 + 2\epsilon \tag{22}$$

with probability $1 - e^{-\Omega(\Upsilon)}$ where $\Upsilon = N_f N_r^{\mathrm{RF}} N_p$. The first term in the right-hand side (RHS) of (22) corresponds to the Euclidean distance between the current channel sample and the closest channel that can be produced by the trained generator, which is called representation error. The second term is the channel noise. The last term comes from gradient descent not necessarily converging to the global optimum and we denote it as optimization error. Hence, our GAN-based framework estimates the channel at worst with the given bound in (22) when $\mathbf{A}$ is sub-Gaussian. Unfortunately, we cannot easily say that the measurement matrix $\mathbf{A}$ has a sub-Gaussian distribution, because $\mathbf{A}$ is composed of the pilots, and digital and analog precoders/combiners with some constraints. Specifically, the analog precoder and combiner matrices must satisfy the constant modulus constraint, i.e., the squared magnitude of each of their elements must have the same constant value. Furthermore, there is a total transmission power constraint. We now prove that $\mathbf{A}$ is nevertheless sub-Gaussian for a general class of pilots.

Theorem 1: If the pilot symbols are zero mean bounded i.i.d. random variables, then the measurement matrix $\mathbf{A}$ in (12) has sub-Gaussian entries for a given total transmission power.

Proof: See Appendix.

Thus, our channel estimator algorithm for hybrid transceivers provides a theoretical guarantee that the estimation error cannot be worse than the sum of the three error terms in (22). The first term in (22) does not diminish with high SNR

⁴ A random variable $x$ has a sub-Gaussian distribution if its tail decays at least as fast as the tails of a Gaussian, or more formally $P(|x| > t) \leq Ce^{-vt^2}$ for every $t > 0$, and positive constants $C$ and $v$.

Fig. 3. Channel estimation by initializing the input $\mathbf{z}$ of the generative network, which is extracted from the pretrained GAN, with some random vector, e.g., with a realization of a Gaussian vector and optimizing $\mathbf{z}$ in light of the observation $\mathbf{y}$.

Algorithm 1 Wasserstein GAN for Generating Channels
1: while $\hat{\theta}_g$ and $\hat{\theta}_d$ have not converged do
2:   for $t = 1, 2, \cdots, k$ do
3:     Sample $\{\mathbf{h}_i\}_{i=1}^m$, a batch from the real channel realizations
4:     Sample $\{\mathbf{z}_i\}_{i=1}^m \sim P(\mathbf{z})$, a batch from the input prior to generate channels
5:     Calculate $\nabla\hat{\theta}_d \leftarrow \nabla_{\hat{\theta}_d} \frac{1}{m}\sum_{i=1}^m \left[D_{\hat{\theta}_d}(\mathbf{h}_i) - D_{\hat{\theta}_d}(G_{\hat{\theta}_g}(\mathbf{z}_i))\right]$
6:     Update $\hat{\theta}_d$ with any gradient ascent-based method, e.g., $\hat{\theta}_d \leftarrow \hat{\theta}_d + \alpha\nabla\hat{\theta}_d$
7:     $\hat{\theta}_d \leftarrow \mathrm{clip}(\hat{\theta}_d, -\mathrm{clip}, \mathrm{clip})$
8:   end for
9:   Sample $\{\mathbf{z}_i\}_{i=1}^m \sim P(\mathbf{z})$, a batch from the input prior to generate channels
10:  Calculate $\nabla\hat{\theta}_g \leftarrow -\nabla_{\hat{\theta}_g} \frac{1}{m}\sum_{i=1}^m D_{\hat{\theta}_d}(G_{\hat{\theta}_g}(\mathbf{z}_i))$
11:  Update $\hat{\theta}_g$ with any gradient descent-based method, e.g., $\hat{\theta}_g \leftarrow \hat{\theta}_g - \alpha\nabla\hat{\theta}_g$
12: end while

Algorithm 2 GAN-Based Channel Estimation
Input: Gaussian i.i.d. noise $\mathbf{z}$
Output: $\hat{\mathbf{h}}_{\mathrm{GAN}}$
Offline Phase:
1: Train the Wasserstein GAN as explained in Algorithm 1
2: Extract the trained generator $G_{\hat{\theta}_g}$
Online Phase:
3: for each coherence time interval do
4:   Given the noisy received signal $\mathbf{y}$ solve (20)
5:   Obtain the channel estimate as in (21)
6: end for
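
To make the update order in Algorithm 1 concrete, here is a deliberately tiny sketch with hand-written gradients. It is not the paper's DCGAN: the "channel" is a scalar drawn around an assumed mean, the generator is the hypothetical affine map $G(z) = az + b$, and the critic is the linear map $D(x) = wx$ with weight clipping. A linear critic can only pull the generator's mean toward the data mean, so the example shows the mechanics of the loop rather than realistic channel generation.

```python
import numpy as np

def train_toy_wgan(mu=2.0, iters=2000, k=5, alpha=0.05, clip=0.1, m=64, seed=0):
    """Toy Wasserstein GAN loop following the step order of Algorithm 1."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0                # generator G(z) = a * z + b (assumed form)
    w = 0.0                        # critic D(x) = w * x (assumed form)
    for _ in range(iters):
        for _ in range(k):
            h = mu + 0.5 * rng.standard_normal(m)      # batch of "real" samples
            z = rng.standard_normal(m)                 # batch from the input prior
            g = a * z + b
            # Gradient of mean(D(h)) - mean(D(G(z))) with respect to w.
            grad_w = np.mean(h) - np.mean(g)
            # Gradient ascent on the critic, then weight clipping.
            w = float(np.clip(w + alpha * grad_w, -clip, clip))
        z = rng.standard_normal(m)
        # Generator loss is -mean(D(G(z))) = -w * mean(a * z + b).
        grad_a = -w * np.mean(z)
        grad_b = -w
        # Gradient descent on the generator parameters.
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b, w

a, b, w = train_toy_wgan()
```

With these assumed settings the generator offset `b` drifts toward the data mean `mu` and then oscillates around it, while the clipping keeps the critic weight bounded.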

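
The online phase of Algorithm 2 amounts to gradient descent on the generator input, as in (20) and (21). The sketch below illustrates this with several simplifying assumptions: the signals are real-valued, the "trained" generator is replaced by a fixed random two-layer ReLU network, the measurement matrix has i.i.d. ±1 entries, and a simple step-halving rule is used so that every accepted step lowers the loss. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, m = 4, 32, 16                # latent size, channel size, measurements (illustrative)
rho, sigma = 1.0, 0.05

# Hypothetical stand-in for a trained generator: a fixed two-layer ReLU network.
W1 = rng.standard_normal((16, d)) / np.sqrt(d)
W2 = rng.standard_normal((n, 16)) / np.sqrt(16)

def generator(z):
    return W2 @ np.maximum(W1 @ z, 0.0)

def loss_and_grad(z, y, A):
    """Loss of (20) and its gradient with respect to z (chain rule through the ReLU)."""
    pre = W1 @ z
    r = y - np.sqrt(rho) * A @ (W2 @ np.maximum(pre, 0.0))
    back = W2.T @ (A.T @ r)
    grad = -2.0 * np.sqrt(rho) * W1.T @ (back * (pre > 0))
    return float(r @ r), grad

def estimate(y, A, steps=200, mu0=0.1):
    z = rng.standard_normal(d)     # Gaussian i.i.d. initialization of the input
    loss, grad = loss_and_grad(z, y, A)
    history = [loss]
    for _ in range(steps):
        mu = mu0
        while mu > 1e-8:           # halve the step until the loss decreases
            z_new = z - mu * grad
            new_loss, new_grad = loss_and_grad(z_new, y, A)
            if new_loss < loss:
                z, loss, grad = z_new, new_loss, new_grad
                break
            mu *= 0.5
        history.append(loss)
    return generator(z), history   # channel estimate as in (21)

# Fewer measurements than unknowns (m < n), bounded zero-mean measurement entries.
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
z_true = rng.standard_normal(d)
y = np.sqrt(rho) * A @ generator(z_true) + sigma * rng.standard_normal(m)
h_hat, history = estimate(y, A)
```

Because the problem is non-convex, a single random start may stop at a local minimum; the sketch only guarantees that the loss does not increase from one iteration to the next.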

although the noise power $\|w\|_2^2$ goes to 0 when the SNR goes to infinity. This leads to a residual error even if $\epsilon$ is negligible, which becomes more prominent at high SNRs. On the other hand, if there is a perfectly trained generative network that can produce all the samples in a channel distribution, then this residual error becomes 0.

IV. GENERALIZATION CAPABILITY

The training channel samples used for the GAN and the test channels used to measure the performance of the proposed estimator should ideally have the same structure. However, the structures in wireless channels can constantly change depending on various factors. This implies that the test channels can have different structures, so the ability to gracefully handle novel data, i.e., the generalization capability of our estimator, has to be investigated. This, however, is an involved subject. In particular, determining the channel conditions that require retraining is a complicated problem, since this also depends on the GAN architecture, i.e., some architectures can be more robust to distributional changes.⁵ Thus, we start by studying a relatively simple case, namely the number of clusters and rays, which are a linear function of the channel.

⁵ To avoid online retraining, receivers can also store different sets of GAN parameters after identifying the scenarios that require retraining, and pick the most appropriate set of parameters.

The proposed method in Fig. 3 searches for the best estimate in the channel manifold acquired with the trained generative network of the GAN, i.e.,

$$z_{n+1} = z_n - \mu_n \nabla_{z_n} \| y - \sqrt{\rho}\, A G_{\hat{\theta}_g}(z_n) \|_2^2, \tag{23}$$

where $\mu_n$ is the step size at the $n$th iteration. For ease of analysis, we assume that the parameters $\hat{\theta}_g$ are trained offline with samples that come from

$$H_k = \sqrt{\frac{N_r N_t}{\gamma}} \sum_{i=1}^{N_{\mathrm{cl}}^{(\mathrm{off})}} \sum_{l=1}^{N_{\mathrm{ray}}^{(\mathrm{off})}} \alpha_{i,l}\, a_r(\phi^r_{i,l}, \theta^r_{i,l})\, a_t(\phi^t_{i,l}, \theta^t_{i,l})^H \tag{24}$$

for $k = 0, 1, \cdots, N_f - 1$, where $N_{\mathrm{cl}}^{(\mathrm{off})}$ is the number of clusters, each of which has $N_{\mathrm{ray}}^{(\mathrm{off})}$ paths. Here, $\alpha_{i,l}$ is the complex path gain of the $l$th ray in the $i$th cluster, and $\phi^r_{i,l}, \theta^r_{i,l}, \phi^t_{i,l}, \theta^t_{i,l}$ are the azimuth and elevation angles of arrival and departure, respectively. For notational simplicity, we assume that the impact of the pulse shape $\sum_{b=0}^{L_c-1} p(bT_s - \tau_{i,l})\, e^{-j 2\pi k b / N_f}$, where $T_s$ is the symbol period and $\tau_{i,l}$ is the corresponding delay, is included within $\alpha_{i,l}$. The vectors $a_r(\phi^r_{i,l}, \theta^r_{i,l})$ and $a_t(\phi^t_{i,l}, \theta^t_{i,l})$ are the normalized receive and transmit antenna array responses, which cover the relative angle of arrival and departure shift of each ray. Also, $\gamma$ is the normalization constant that ensures $E[\|H_k\|_F^2] = N_r N_t$.

To understand the generalization capability, we need to assess the impact of the channel statistics on (23). In this regard, we assume that the online channel has a different number of clusters and rays than the offline samples on which the GAN was trained. More precisely, we assume that the GAN is trained offline with $N_{\mathrm{cl}}^{(\mathrm{off})}$ clusters and $N_{\mathrm{ray}}^{(\mathrm{off})}$ rays, and this yields the parameters $\hat{\theta}_g$ that define the distribution $G_{\hat{\theta}_g}$. We further assume that the GAN parameters would become $\theta_g$, leading to the distribution $G_{\theta_g}$, if it were trained with $N_{\mathrm{cl}}^{(\mathrm{on})}$ and $N_{\mathrm{ray}}^{(\mathrm{on})}$, the parameters of (24) for the online channel. We next show that these two distributions belong to the same distribution family.

Lemma 1: $G_{\hat{\theta}_g}$ and $G_{\theta_g}$ each have a sub-Gaussian distribution.

Proof: Assume that the $\alpha_{i,l}$ in (24) are independent random variables whose support lies in $[-B_{i,l}, B_{i,l}]$, and that the GAN is perfectly trained so that it learns the channel distribution. Then, using Hoeffding's inequality [37] given in (40) for (24), the tail of $G_{\hat{\theta}_g}$ is specified as

$$P(G_{\hat{\theta}_g} \ge t) \le \exp\left( - \frac{t^2}{2 \sum_{i=1}^{N_{\mathrm{cl}}^{(\mathrm{off})}} \sum_{l=1}^{N_{\mathrm{ray}}^{(\mathrm{off})}} \|a_{i,l}\|_F^2} \right) \tag{25}$$

where $a_{i,l} = \sqrt{N_r N_t / \gamma}\; a_r(\phi^r_{i,l}, \theta^r_{i,l})\, a_t(\phi^t_{i,l}, \theta^t_{i,l})^H$. Similarly, the tail of $G_{\theta_g}$ becomes

$$P(G_{\theta_g} \ge t) \le \exp\left( - \frac{t^2}{2 \sum_{i=1}^{N_{\mathrm{cl}}^{(\mathrm{on})}} \sum_{l=1}^{N_{\mathrm{ray}}^{(\mathrm{on})}} \|a_{i,l}\|_F^2} \right). \tag{26}$$

From (25) and (26), $G_{\hat{\theta}_g}$ and $G_{\theta_g}$ are sub-Gaussian. □

Lemma 1 is used to show that the gradient, with respect to the generator input parameters, of the measurement error

$$J_n(\theta_g, \hat{\theta}_g) = \| y - \sqrt{\rho}\, A G_{\hat{\theta}_g}(z_n) \|_2^2 \tag{27}$$

is unbiased even if the statistics of the test channel samples differ from the training samples in terms of the number of clusters and rays. To make this point clear, let h be in the range space of the hypothetically online-trained GAN for some input vector z. This means that the received signal in (12) can be written as

$$y = \sqrt{\rho}\, A G_{\theta_g}(z) + w. \tag{28}$$

Thus, differentiating (27) with respect to the generative input yields

$$\nabla_{z_n} J_n(\theta_g, \hat{\theta}_g) = \nabla_{z_n} y^H y - \nabla_{z_n} 2\sqrt{\rho}\, y^H A G_{\hat{\theta}_g}(z_n) + \nabla_{z_n} \rho\, G_{\hat{\theta}_g}(z_n)^H A^H A G_{\hat{\theta}_g}(z_n). \tag{29}$$

Since both $G_{\hat{\theta}_g}$ and $G_{\theta_g}$ are sub-Gaussian as shown in Lemma 1, the right-hand sides of (25) and (26) converge to a Dirac delta function $\delta(t)$ when $\gamma \to \infty$, and hence for $\gamma \to \infty$ we can observe that

$$E\left[ \nabla_{z_n} J_n(\theta_g, \hat{\theta}_g) \right] = \nabla_{z_n} \| y - \sqrt{\rho}\, A G_{\theta_g}(z_n) \|_2^2. \tag{30}$$

Although these are limiting results for very high SNR ($\gamma \to \infty$) and i.i.d. channel taps, our empirical results in Section V-C indicate a similar behavior for practical SNRs and channel models.

The importance of (30) is related to our empirical observations in Section V-C that having mismatched numbers of clusters and rays between the online and offline phases does not appear to degrade the performance. This can be explained with the policy gradient concept in reinforcement learning, since we can model our sequential decision making in the z-space in (23) as a stateless (one-step) reinforcement task by taking an action $G_{\hat{\theta}_g}(z_k)$ at each step and receiving the related penalty in (27). Specifically, if we consider the trained generator G as our policy, and the generator input z as the policy parameters, then the problem in (23) becomes equivalent to updating the policy parameters according to a scalar performance metric.
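As an illustration of the latent-space update in (23), the following is a minimal NumPy sketch under several assumptions that are not from the paper: the trained generator is replaced by a toy real-valued two-layer map `G(z) = W2 tanh(W1 z)` with random weights, `A` is a random ±1 matrix, and the step size $\mu_n$ is chosen by simple backtracking. All function and variable names are hypothetical. Only the structure of the update, descending the scalar penalty $\|y - \sqrt{\rho} A G(z)\|_2^2$ with respect to z while the generator weights stay fixed, mirrors (23).

```python
import numpy as np

def generator(z, W1, W2):
    """Toy stand-in for a trained generator: G(z) = W2 @ tanh(W1 @ z)."""
    return W2 @ np.tanh(W1 @ z)

def generator_jacobian(z, W1, W2):
    """Jacobian dG/dz of the toy generator, shape (dim_h, dim_z)."""
    s = 1.0 - np.tanh(W1 @ z) ** 2           # derivative of tanh at W1 @ z
    return W2 @ (s[:, None] * W1)

def loss(z, y, A, rho, W1, W2):
    """Scalar penalty ||y - sqrt(rho) A G(z)||^2 (real-valued toy case)."""
    r = y - np.sqrt(rho) * A @ generator(z, W1, W2)
    return float(r @ r)

def grad(z, y, A, rho, W1, W2):
    """Gradient of the penalty w.r.t. z via the chain rule."""
    r = y - np.sqrt(rho) * A @ generator(z, W1, W2)
    return -2.0 * np.sqrt(rho) * generator_jacobian(z, W1, W2).T @ (A.T @ r)

def estimate_latent(y, A, rho, W1, W2, z0, n_iter=200, mu0=1.0):
    """Gradient descent on z with a backtracking step size mu_n."""
    z = z0.copy()
    history = [loss(z, y, A, rho, W1, W2)]
    for _ in range(n_iter):
        g = grad(z, y, A, rho, W1, W2)
        mu = mu0
        z_new = z - mu * g
        new_loss = loss(z_new, y, A, rho, W1, W2)
        # Halve the step until the penalty no longer increases.
        while new_loss > history[-1] and mu > 1e-12:
            mu *= 0.5
            z_new = z - mu * g
            new_loss = loss(z_new, y, A, rho, W1, W2)
        if new_loss > history[-1]:
            break                             # no descent step found
        z = z_new
        history.append(new_loss)
    return z, history

# Toy problem sizes (assumed for illustration; only dim_z = 15 echoes the paper).
rng = np.random.default_rng(0)
dim_z, dim_hid, dim_h, n_meas = 15, 32, 64, 32
W1 = rng.standard_normal((dim_hid, dim_z)) / np.sqrt(dim_z)
W2 = rng.standard_normal((dim_h, dim_hid)) / np.sqrt(dim_hid)
A = rng.choice([-1.0, 1.0], size=(n_meas, dim_h)) / np.sqrt(n_meas)
rho = 1.0
z_true = rng.standard_normal(dim_z)
y = np.sqrt(rho) * A @ generator(z_true, W1, W2) + 0.01 * rng.standard_normal(n_meas)

z_hat, history = estimate_latent(y, A, rho, W1, W2, z0=rng.standard_normal(dim_z))
print(f"penalty: {history[0]:.4f} -> {history[-1]:.4f} after {len(history) - 1} steps")
```

The backtracking rule is only one convenient way to pick $\mu_n$ so that the penalty never increases; a fixed or decaying step size would fit (23) equally well. The channel estimate in this toy setting would be `generator(z_hat, W1, W2)`.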
From policy gradient methods, it is known that using an estimate of the true gradient performs as well as the ground-truth gradient if the expected value of the gradient estimate approximates the true gradient [38], which is ensured in our case via (30). Furthermore, with more exploration we can even enhance the channel estimates, because this enables us to converge to a better local minimum [38]. More precisely, since the GAN generator supports vector arithmetic with some error e [35], defining $z = z_n + \Delta z_n$ yields

$$y = \sqrt{\rho}\, A \left( G_{\theta_g}(z_n) + G_{\theta_g}(\Delta z_n) + e \right) + w. \tag{31}$$

An analogy for (31) is that if $G_{\theta_g}(z_n)$ is a man's face without glasses and $G_{\theta_g}(\Delta z_n)$ is glasses, then $G_{\theta_g}(z_n + \Delta z_n)$ is a clean image of a man with glasses, whereas $G_{\theta_g}(z_n) + G_{\theta_g}(\Delta z_n)$ is a noisy image of a man with glasses. Since the latter is also a meaningful image, there is empirical evidence that the error e is bounded; otherwise it would become a nonsense image [35]. Based on this, we further assume that $z_n$ and $\Delta z_n$ are independent. Therefore, if $\theta_g = \hat{\theta}_g$, then (29) becomes 0 in the first local minimum, since $\nabla_{z_k} G_{\hat{\theta}_g}(z_k) = 0$, assuming that $\nabla_{z_k} e$ is negligible. This means that the GAN input in (23) converges to the first local minimum. On the other hand, if $\theta_g \ne \hat{\theta}_g$, (29) does not become 0 even if $\nabla_{z_k} G_{\hat{\theta}_g}(z_k) = 0$. This enables us to explore the landscape of the generator more.

V. NUMERICAL RESULTS

The proposed frequency selective estimator is assessed for downlink channel estimation, in which a base station with a 64-antenna uniform rectangular array (URA) transmits to a receiver with a 16-antenna URA over 64 subcarriers. The spacing among antenna elements is taken as λ/2 at both the transmitter and receiver unless otherwise stated. In the simulations, a single RF chain is utilized at the receiver as an example. This heavily reduces the power consumption at the expense of decreasing the number of measurements, i.e., the number of rows in A. To be more precise, the power consumption per RF chain at mmWave frequencies is about 300 mW, which is expected to be even higher for THz communication [39]. Hence, the power consumption for the fully digital receiver becomes more than 4 W, whereas it is only around 300 mW for our receiver. Since the transmitter RF chains do not affect the number of measurements, their number is taken as $N_s = N_t^{RF}$, but surprisingly further reduction is possible for the proposed estimator, as will be discussed. Phase shifters at both the transmitter and receiver have one-bit quantized angles as explained in Section II-A. Before giving the performance results, we first explain the GAN training process.

A. GAN Training

To train the GAN, we generated 5000 complex channel realizations with dimension (64 × 16 × 64 × 16), which corresponds to (transmit antenna × receive antenna × subcarriers × OFDM symbols), via MATLAB using the "5D toolbox" from the TDL-E channel model that supports up to 0.1 THz [40]. Although it is not obligatory to train the 4-dimensional array with a single GAN, e.g., there can be $N_t$ parallel GANs, each of which can be trained with the same (16 × 64 × 16) samples, the alignment of the channels in the frequency, time and spatial axes to the height, width and depth of the CNNs of the generator and discriminator heavily affects the performance. This is because a high correlation is needed for the height and width due to the upsampling in the generator. Since the correlation in the spatial domain becomes relatively lower than in the other axes when the antenna spacing is λ/2 for moderate delay and Doppler spread, the frequency, time and space axes of the channels are aligned as height, width and depth, respectively. Furthermore, the real and imaginary parts of the channels are split and stacked in the spatial axis.

The parameters are optimized with the training samples via the RMSprop optimizer, a variant of gradient descent, with a learning rate of 0.00005 for 3000 epochs and a batch size of 200. To understand the impact of this setting, we compare the channel estimation error for different GAN configurations by generating 100 test channels. Unlike traditional neural networks that are not trained with an adversarial loss, it is not clear how to measure the training and validation error of a GAN, since the quality of training is perceptual. For this reason, we choose the channel estimation error as the performance metric. Furthermore, throughout the simulations the generator input z is set to a 15 × 1 vector. As presented in Fig. 4, the number of epochs and the training data size have a major effect on the normalized mean square error (NMSE), whereas the batch size has a relatively small impact.

Fig. 4. The impact of the number of epochs, training data size and batch size while training the Wasserstein GAN on the channel estimation error, which are abbreviated as e, d, and b on the plot.

B. Performance Results

We first assume that both offline and online channel realizations come from the TDL-E model [40]. Furthermore, in this case orthogonal pilots are sent at each transmit antenna by placing a pilot tone in each coherence bandwidth and time. With this setting, the performance of the GAN-based channel estimation for the hybrid architecture defined above is compared with


(i) the standard LS and the near-optimum LMMSE⁶ channel estimators that operate on fully digital transceivers; and (ii) a supervised learning method, which maps y to the channel estimate $\hat{h}_{SL}$ through a standard ResNet model [41] after being trained with 5000 pairs of {y, h}. The only differences between our approach and [41] are the linear activation function at the output layer and the input and output dimensions: our input is the stacked real and imaginary parts of y, and the output is $\hat{h}_{SL}$. The performance metric is the NMSE

$$\mathrm{NMSE} = E\left[ \frac{\|h - \hat{h}\|_2^2}{\|h\|_2^2} \right], \tag{32}$$

in which the expected value is over the underlying probability distribution of the channel h, and $\hat{h}$ refers to either $\hat{h}_{LMMSE}$, $\hat{h}_{LS}$, $\hat{h}_{GAN}$ defined in (14), (15), (21) or $\hat{h}_{SL}$.

⁶ Near-optimum LMMSE refers to the case when the noise is Gaussian, but the channel covariance matrix is estimated, i.e., not known.

Considering the worse performance of LS and the high complexity of LMMSE stemming from matrix inversion and channel covariance matrix estimation, the GAN estimator seems intriguing for high frequency channel estimation, as can be seen in Fig. 5(a) and 5(b). Promisingly, our estimator can tackle the negative impacts of the hybrid model and achieve a performance close to that of fully digital transceivers. Specifically, the proposed design results in very low error at low SNRs, which is the only feasible regime for mmWave and THz channel estimation. To illustrate, with our estimator at −5 dB SNR we can achieve the fully digital transceiver performance of the LS estimator at 20 dB SNR and of the LMMSE estimator at 2.5 dB SNR when the delay spread is 10 ns. Note that although the LMMSE estimator is optimum for Gaussian noise when the channel covariance matrix is known, our performance is better than LMMSE at low SNRs because of the estimation errors in the channel covariance matrix. Furthermore, the residual error term in our estimator due to the representation and optimization errors, which are explained in Section III-C and become more prominent at high SNRs, results in an almost flat NMSE beyond some SNR. When the delay spread is increased to 100 ns, the performance of our estimator slightly decreases, since this yields lower correlations and hence less structure. Although our estimator at high SNRs is not as efficient as at low SNRs, this is not a problem, since the probability of observing moderate or high SNRs is extremely low due to the high propagation losses and the lack of beamforming gain.

Fig. 5. The comparison of the GAN-based channel estimation for hybrid architecture with the LS and LMMSE estimators for fully digital architecture, and a supervised learning method.

Another important point regarding Fig. 5 is that our GAN-based estimator outperforms the supervised learning model even if there is plenty of labeled data and a powerful CNN architecture. Specifically, to make a fair comparison, we trained the GAN and the ResNet with 5000 clean channel samples via an RMSprop optimizer with a learning rate of 0.00005 and a batch size of 200, and used the same number of pilots. Since the ResNet architecture has many more parameters than the DCGAN architecture (33 million vs. 2 million), we trained the GAN for 3000 epochs and measured the elapsed time for channel estimation. Then, we found the number of epochs that corresponds to this amount of time for the ResNet-based channel estimator, and plotted the results accordingly. We note that even if the ResNet is trained for 3000 epochs, we empirically observed that its performance is still considerably worse than the GAN for low delay spread. On the other hand, for high delay spread, 3000 epochs brings the ResNet-based estimator close to the GAN performance. This implies that CNNs trained with conventional loss functions are not as powerful as a GAN in exploiting the channel correlations, as can be observed in Fig. 5(a) and 5(b).

For mmWave and THz communication, sending orthogonal pilots for each coherence bandwidth and time at each transmit antenna leads to an excessive number of pilots due to the large number of antennas and the large bandwidth. In the next simulation, we address this problem by observing the estimation error of the GAN-based framework when the number of pilots is drastically reduced. Since the lack of pilots creates an ill-posed problem, and thus the LS and LMMSE estimators are undefined, one has little choice but to use compressed sensing algorithms. Thus, we use the generic OMP algorithm as a benchmark after sparsifying the channel with a DFT basis, as opposed to the GAN algorithm that uses the original (unsparsified) channels. The OMP algorithm has a better performance-complexity tradeoff with respect to LASSO and does not require knowledge of the channel distribution, as opposed to message-passing algorithms [28]. However, as compared to the OMP algorithm, there is a clear performance gain of the

Fig. 6. The performance of the proposed estimator and OMP with respect to η, where η is the ratio of the number of used pilots to the full pilot case, which corresponds to using an orthogonal pilot tone for each coherence bandwidth and time at each transmit antenna.

Fig. 7. The generalization capability of the GAN for different number of clusters when it was trained with 20 clusters.

proposed estimator as can be seen in Fig. 6. More importantly, irrespective of the SNR, the number of pilots can be reduced by up to 70% with respect to the case of sending orthogonal pilots for each coherence bandwidth and time at each transmit antenna, with only around 1 dB loss in NMSE. This gives us the flexibility of decreasing the number of pilots in the frequency, time, or spatial domains, or in a combination of these.
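To make a statement such as "around 1 dB loss in NMSE" concrete, the sketch below estimates the NMSE of (32) by averaging over channel realizations and converts it to dB. The helper names (`nmse`, `to_db`) and the synthetic error levels are assumptions chosen purely for illustration; they are not the paper's data or results.

```python
import numpy as np

def nmse(h_true, h_est):
    """Empirical NMSE of (32): mean over realizations of ||h - h_hat||^2 / ||h||^2.

    Each row is one (possibly complex) vectorized channel realization.
    """
    h_true = np.atleast_2d(h_true)
    h_est = np.atleast_2d(h_est)
    err = np.sum(np.abs(h_true - h_est) ** 2, axis=1)
    ref = np.sum(np.abs(h_true) ** 2, axis=1)
    return float(np.mean(err / ref))

def to_db(x):
    """Convert a linear power ratio to decibels."""
    return 10.0 * np.log10(x)

# Synthetic example: two estimators whose error powers differ by a factor 10**0.1.
rng = np.random.default_rng(0)
shape = (500, 64)
h = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
e = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h_full = h + np.sqrt(0.01) * e                    # hypothetical "full pilot" estimate
h_reduced = h + np.sqrt(0.01 * 10 ** 0.1) * e     # hypothetical "reduced pilot" estimate

gap_db = to_db(nmse(h, h_reduced)) - to_db(nmse(h, h_full))
print(f"NMSE gap: {gap_db:.3f} dB")
```

Because both synthetic estimates reuse the same error pattern `e`, the gap here equals the chosen power ratio by construction; with independently obtained estimates it would only hold on average.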
Interestingly, this can also be interpreted as decreasing the number of transmit RF chains or sparsifying the connections between RF chains and antenna elements when the impact of s[n] is observed for A in (12). To illustrate, halving the number of pilots can be interpreted as setting $N_t^{RF} = N_s/2$.
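For readers unfamiliar with the OMP benchmark mentioned above, the following is a minimal sketch of generic orthogonal matching pursuit applied to a channel that is sparse in a DFT basis. It is a textbook version written for illustration, not the authors' implementation: the problem sizes, the random measurement matrix `Phi`, the noiseless measurements, and the assumption that the sparsity level is known are all hypothetical.

```python
import numpy as np

def omp(A, y, sparsity):
    """Generic orthogonal matching pursuit.

    Greedily selects `sparsity` columns of A, refitting the selected
    coefficients by least squares after each selection.
    """
    col_norms = np.linalg.norm(A, axis=0)
    residual = y.copy()
    support = []
    coef = np.zeros(0, dtype=complex)
    for _ in range(sparsity):
        # Correlate the residual with every (normalized) column.
        corr = np.abs(A.conj().T @ residual) / col_norms
        if support:
            corr[support] = -1.0              # never reselect a column
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = coef
    return x, sorted(support)

# Toy setup: channel h = F @ x with x sparse and F a unitary DFT matrix.
rng = np.random.default_rng(1)
n, m, k = 64, 48, 3
F = np.fft.fft(np.eye(n)) / np.sqrt(n)
true_support = rng.choice(n, size=k, replace=False)
x = np.zeros(n, dtype=complex)
x[true_support] = np.array([2.0, -1.5j, 1.0 + 1.0j])
h = F @ x

# Hypothetical measurement matrix with fewer rows than unknowns (noiseless).
Phi = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
y = Phi @ h

x_hat, support_hat = omp(Phi @ F, y, sparsity=k)
h_hat = F @ x_hat
rel_err = np.linalg.norm(h_hat - h) / np.linalg.norm(h)
print("recovered support:", support_hat, "relative error:", rel_err)
```

In contrast to this sketch, the GAN-based estimator described in the text does not need the channel to be expressed in a sparsifying basis and operates on the original channels.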
Fig. 8. The generalization capability of the GAN for different number of rays when it was trained with 2 rays.

C. Generalization Capability

The statistics of the channel can vary with time. Thus, it is not always practical to assume that offline and online channels come from the same model. On the other hand, assuming that the offline channel comes from one model, e.g., TDL-E, and the online channel from another, e.g., TDL-A, makes the experiment hard to control and makes it hard to understand the impact of the underlying channel parameters. As a starting point, we focus on the impact of the number of clusters and rays, and verify our analysis in Section IV. We consider one subcarrier, e.g., $H_k$ in (9), and model it as a geometric channel as given in (24). The effect of the number of clusters is first observed by training the GAN with 20 clusters, each of which has 2 rays, for Gaussian distributed angles of arrival and departure with a variance of 5°. As shown in Fig. 7, keeping all the parameters the same except for the number of clusters in the online channel realizations, the channel estimation error does not become worse when there are 10, 15 or 25 clusters. In particular, the worst performance occurs for 20 clusters, i.e., the training configuration, and this is consistent with our analysis in Section IV. Next, the same simulation is repeated for the number of rays in a cluster. In this case, we again use 20 clusters such that each cluster has 1, 2, 4 or 8 rays. The variance of the angle spread is still 5° for both arrival and departure beams. As can be seen in Fig. 8, although the GAN was trained with 2 rays, having 1, 4 or 8 rays per cluster even reduces the channel estimation error, similar to the previous case. Another important outcome is that although Fig. 7 and Fig. 8 are obtained for one subcarrier with a very small antenna spacing of λ/10, chosen for experimental purposes to have more structure, their NMSE is much higher than in Fig. 5 and Fig. 6, which enjoy frequency correlations despite an antenna spacing of λ/2.

D. Computational Complexity

Relying on the generalization capability of neural networks and assuming that the channel statistics vary slowly, the training complexity of the generative parameters in the offline environment becomes negligible since it is amortized over a long time period. Thus, we focus our analysis on the inference complexity that comes from (i) optimizing the generator input $z^*$ at each epoch using gradient descent as given in (23) and (ii) estimating the channel through the matrix-vector products as in (21). For ease of analysis, we first consider the complexity between a single transmit and receive antenna, and then generalize this to all antennas. For the first step, in the forward propagation the first layer of the generative network brings a complexity of $O(N_f N_p d)$, where d is the dimension of z, and dominates the complexity of the other convolutional layers, which have fewer parameters and smaller matrix multiplications. Then, to compute the gradient in (23),


which is equal to

$$\nabla_{z_n} \| y - \sqrt{\rho}\, A G_{\hat{\theta}_g}(z_n) \|_2^2 = -2\sqrt{\rho}\, \nabla_{z_n} G_{\hat{\theta}_g}(z_n)^T A^T \left( y - \sqrt{\rho}\, A G_{\hat{\theta}_g}(z_n) \right), \tag{33}$$

the output of the generative network is multiplied with the block diagonal measurement matrix A that has $N_p$ blocks, and this yields a complexity of $O(N_p N_f^2)$, which is the same as multiplying $A^T$ with $(y - \sqrt{\rho}\, A G_{\hat{\theta}_g}(z_n))$. The backward propagation for $\nabla_{z_n} G_{\hat{\theta}_g}(z_n)$ has nearly twice the execution time of the forward propagation in CNNs [42], and the computational complexity of the second step only covers the forward propagation of the first step. Thus, the overall complexity becomes $O(N_t N_r N_p N_f^2)$. This is much less than that of the LS, LMMSE and OMP estimators. Specifically, the LS and LMMSE estimators require computing $(A^H A)^{-1}$, as can be seen in (14) and (15), and this yields a complexity of $O(N_t^3 N_r^3 N_p N_f^3)$. The complexity of OMP is related to the dimension of the target signal h and is $O(N_t^3 N_r^3 N_f^3)$ [32].

VI. CONCLUSION

This paper demonstrates how to leverage GANs for effective frequency selective channel estimation in a mmWave or a THz channel: that is, a low SNR channel with a large number of spatial elements. Promisingly, the proposed GAN-based channel estimation works well for a hybrid architecture with one-bit quantized phase angles, and can even outperform estimators designed for fully digital receivers. Additionally, the proposed estimator enables a substantial reduction in the number of pilot tones. Regarding its generalization capability, we demonstrate that changes in the number of clusters and rays in multipath channels can be inherently handled without retraining the generative network. As future work, the impact of other channel parameters such as power delay spread, Doppler spread and the angles of arrival and departure can be analyzed. Furthermore, using a different generative model such as a VAE instead of a GAN can be another extension.

APPENDIX
PROOF OF THEOREM 1

The measurement matrix due to a single subcarrier becomes

$$A^{(\mathrm{sub})} = I_{N_p} \otimes \left( p[n]^T F_{BB}[n]^T F_{RF}^T \otimes W_{RF}^H \right), \tag{34}$$

in which the frequency index k is dropped for simplicity. As can be observed, the statistics of A in (12) are the same as those of $A^{(\mathrm{sub})}$ in (34). Thus, without any loss of generality we find the distribution of $A^{(\mathrm{sub})}$. Stacking the block diagonal matrices of (34) yields a more compact expression

$$\tilde{A}^{(\mathrm{sub})} = \mathbf{P}^T F_{BB}[n]^T F_{RF}^T \otimes W_{RF}^H \tag{35}$$

where

$$\mathbf{P}^T = \begin{pmatrix} p[n]^T \\ \vdots \\ p[n + N_p - 1]^T \end{pmatrix}.$$

Since it is not practical to change $F_{RF}$ and $W_{RF}$ for each channel realization, we assume that they are given and fixed. Considering the distribution of (35) due to an element of $W_{RF}^H$ results in

$$\tilde{A} = w_{RF}\, \mathbf{P}^T F_{BB}[n]^T F_{RF}^T. \tag{36}$$

Notice that $P(\tilde{A}) = P(\tilde{A}^{(\mathrm{sub})})$, since $\mathbf{P}^T F_{BB}[n]^T F_{RF}^T$ is repeated for each element of $W_{RF}^H$.

The $(m, n)$th element of $\tilde{A}$ can then be written as

$$\tilde{a}_{m,n} = w_{RF} \sum_{j=1}^{N_t^{RF}} \sum_{i=1}^{N_s} p^T_{m,i}\, f^T_{i,j}\, \tilde{f}^T_{j,n} \tag{37}$$

where $p^T_{m,i}$ is the $(m, i)$th element of $\mathbf{P}^T$, $f^T_{i,j}$ is the $(i, j)$th element of $F_{BB}[n]^T$ and $\tilde{f}^T_{j,n}$ is the $(j, n)$th element of $F_{RF}^T$. Due to the constant modulus constraint, $|w_{RF}|^2 = 1/N_r$ and

$$\sum_{j=1}^{N_t^{RF}} \sum_{i=1}^{N_s} |f^T_{i,j} \tilde{f}^T_{j,n}|^2 = \frac{\|F_{BB}[n]\|_F^2}{N_t}. \tag{38}$$

Since there is a total transmission power constraint and the Frobenius norm is submultiplicative, $\|F_{RF} F_{BB}[n]\|_F^2 = N_s \le \|F_{RF}\|_F^2 \|F_{BB}[n]\|_F^2$. Hence,

$$\sum_{j=1}^{N_t^{RF}} \sum_{i=1}^{N_s} |f^T_{i,j} \tilde{f}^T_{j,n}|^2 \ge \frac{N_s}{N_t N_t^{RF}}. \tag{39}$$

According to Hoeffding's inequality, for zero-mean bounded i.i.d. random variables $X_1, \cdots, X_N$ and $b = (b_1, \cdots, b_N) \in \mathbb{R}^N$, we have

$$P\left( \sum_{i=1}^{N} X_i b_i \ge t \right) \le \exp\left( - \frac{t^2}{2 \|b\|_2^2} \right) \tag{40}$$

for $t \ge 0$. Replacing $X_i$ with $p^T_{m,i}$ and $b_i$ with $w_{RF} \sum_{j=1}^{N_t^{RF}} f^T_{i,j} \tilde{f}^T_{j,n}$ in (37) yields

$$P(\tilde{a}_{m,n} \ge t) \le \exp\left( - \frac{t^2 N_t N_t^{RF} N_r}{2 N_s} \right). \tag{41}$$

Thus, $\tilde{a}_{m,n}$ has a sub-Gaussian distribution. Since (41) holds for all $m = 1, 2, \cdots, N_p$ and $n = 1, 2, \cdots, N_t$, this completes the proof.

REFERENCES

[1] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, Oct. 2014.
[2] A. Alkhateeb, G. Leus, and R. W. Heath, “Compressed sensing based multi-user millimeter wave systems: How many measurements are needed?” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2015, pp. 2909–2913.
[3] R. Mendez-Rial, C. Rusu, N. Gonzalez-Prelcic, A. Alkhateeb, and R. W. Heath, “Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?” IEEE Access, vol. 4, pp. 247–267, Jan. 2016.
[4] P. Schniter and A. Sayeed, “Channel estimation and precoder design for millimeter-wave communications: The sparse way,” in Proc. 48th Asilomar Conf. Signals, Syst. Comput., Nov. 2014, pp. 273–277.
[5] Z. Gao, C. Hu, L. Dai, and Z. Wang, “Channel estimation for millimeter-wave massive MIMO with hybrid precoding over frequency-selective fading channels,” IEEE Commun. Lett., vol. 20, no. 6, pp. 1259–1262, Jun. 2016.
[6] K. Venugopal, A. Alkhateeb, N. Gonzalez Prelcic, and R. W. Heath, “Channel estimation for hybrid architecture-based wideband millimeter wave systems,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 1996–2009, Sep. 2017.


[7] J. Mo, P. Schniter, and R. W. Heath, “Channel estimation in broadband millimeter wave MIMO systems with few-bit ADCs,” IEEE Trans. Signal Process., vol. 66, no. 5, pp. 1141–1154, Mar. 2018.
[8] Y. Li, “Simplified channel estimation for OFDM systems with multiple transmit antennas,” IEEE Trans. Wireless Commun., vol. 1, no. 1, pp. 67–75, Jan. 2002.
[9] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. Int. Conf. Learn. Represent. (ICLR), May 2013, pp. 1–14.
[10] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., Dec. 2014, pp. 2672–2680.
[11] T. S. Rappaport et al., “Millimeter wave mobile communications for 5G cellular: It will work!,” IEEE Access, vol. 1, pp. 335–349, May 2013.
[12] Z. Gao, L. Dai, W. Dai, B. Shim, and Z. Wang, “Structured compressive sensing-based spatio-temporal joint channel estimation for FDD massive MIMO,” IEEE Trans. Commun., vol. 64, no. 2, pp. 601–617, Feb. 2016.
[13] S. L. H. Nguyen and A. Ghrayeb, “Compressive sensing-based channel estimation for massive multiuser MIMO systems,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Apr. 2013, pp. 2890–2895.
[14] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169–6183, Dec. 2015.
[15] X. Lin, S. Wu, L. Kuang, Z. Ni, X. Meng, and C. Jiang, “Estimation of sparse massive MIMO-OFDM channels with approximately common support,” IEEE Commun. Lett., vol. 21, no. 5, pp. 1179–1182, May 2017.
[16] M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, “Deep learning-based channel estimation,” IEEE Commun. Lett., vol. 23, no. 4, pp. 652–655, Apr. 2019.
[17] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
[18] P. Dong, H. Zhang, G. Ye Li, I. S. Gaspar, and N. NaderiAlizadeh, “Deep CNN-based channel estimation for mmWave massive MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 5, pp. 989–1000, Sep. 2019.
[19] E. Balevi and J. G. Andrews, “Deep learning-based channel estimation for high-dimensional signals,” 2019, arXiv:1904.09346. [Online]. Available: http://arxiv.org/abs/1904.09346
[20] E. Balevi, A. Doshi, and J. G. Andrews, “Massive MIMO channel estimation with an untrained deep neural network,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2079–2090, Mar. 2020.
[21] Y. Yang, F. Gao, X. Ma, and S. Zhang, “Deep learning-based channel estimation for doubly selective fading channels,” IEEE Access, vol. 7, pp. 36579–36589, Apr. 2019.
[22] X. Ru, L. Wei, and Y. Xu, “Model-driven channel estimation for OFDM systems based on image super-resolution network,” 2019, arXiv:1911.13106. [Online]. Available: http://arxiv.org/abs/1911.13106
[23] S. Gao, P. Dong, Z. Pan, and G. Y. Li, “Deep learning based channel estimation for massive MIMO with mixed-resolution ADCs,” IEEE Commun. Lett., vol. 23, no. 11, pp. 1989–1993, Nov. 2019.
[24] E. Balevi and J. G. Andrews, “One-bit OFDM receivers via deep learning,” IEEE Trans. Commun., vol. 67, no. 6, pp. 4326–4336, Jun. 2019.
[25] T. J. O’Shea, T. Roy, and N. West, “Approximating the void: Learning stochastic channel models from observation with variational generative adversarial networks,” in Proc. Int. Conf. Comput., Netw. Commun. (ICNC), Feb. 2019, pp. 681–686.
[26] H. Ye, G. Y. Li, B.-H.-F. Juang, and K. Sivanesan, “Channel agnostic end-to-end learning based communication systems with conditional GAN,” in Proc. IEEE Globecom Workshops (GC Wkshps), Dec. 2018, pp. 1–5.
[27] A. Doshi, E. Balevi, and J. G. Andrews, “Compressed representation of high dimensional channels using deep generative networks,” in Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), May 2020, pp. 1–5.
[28] E. Balevi, A. Doshi, A. Jalal, A. Dimakis, and J. G. Andrews, “High dimensional channel estimation using deep generative networks,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 18–30, Jan. 2021.
[29] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative models,” in Proc. Int. Conf. Mach. Learn. (ICML), Aug. 2017, pp. 1–24.
[30] A. Jalal, L. Liu, A. G. Dimakis, and C. Caramanis, “Robust compressed sensing of generative models,” 2020, arXiv:2006.09461. [Online]. Available: http://arxiv.org/abs/2006.09461
[31] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci. USA, vol. 106, no. 45, pp. 18914–18919, Nov. 2009.
[32] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013.
[33] C. Doersch, “Tutorial on variational autoencoders,” 2016, arXiv:1606.05908. [Online]. Available: http://arxiv.org/abs/1606.05908
[34] L.-H. Chen, C. G. Bampis, Z. Li, A. Norkin, and A. C. Bovik, “Perceptually optimizing deep image compression,” 2020, arXiv:2007.02711. [Online]. Available: http://arxiv.org/abs/2007.02711
[35] J. Li, J. Jia, and D. Xu, “Unsupervised representation learning of image-based plant disease with deep convolutional generative adversarial networks,” in Proc. 37th Chin. Control Conf. (CCC), Jul. 2018, pp. 9159–9163.
[36] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn., vol. 70, Aug. 2017, pp. 214–223.
[37] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge, U.K.: Cambridge Univ. Press, 2018.
[38] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
[39] S. Rangan, T. Rappaport, E. Erkip, Z. Latinovic, M. R. Akdeniz, and Y. Liu, “Perceptually optimizing deep image compression,” in Proc. IEEE Commun. Theory Workshop, Jun. 2013, pp. 1–25.
[40] 5G; Study on Channel Model for Frequencies From 0.5 to 100 GHz (Rel 14), document 3GPP TS 38.901, 3GPP FTP Server, 2017.
[41] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[42] K. He and J. Sun, “Convolutional neural networks at constrained time cost,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 5353–5360.

Eren Balevi received the B.S., M.S., and Ph.D. degrees in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey, in 2008, 2010, and 2016, respectively. He is currently a Post-Doctoral Research Scholar with the Department of Electrical and Computer Engineering, The University of Texas at Austin. His current research interests include the intersection between machine learning and communication theory. He is also interested in the general areas of 5G and beyond wireless systems, fog/edge networking, and molecular communications.

Jeffrey G. Andrews (Fellow, IEEE) received the B.S. degree (Hons.) in engineering from the Harvey Mudd College, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University. He is currently the Cockrell Family Endowed Chair of engineering with The University of Texas at Austin. He developed CDMA systems at Qualcomm, and has served as a consultant to Samsung, Nokia, Qualcomm, Apple, Verizon, AT&T, Intel, Microsoft, Sprint, and NASA. He is the coauthor of the books Fundamentals of WiMAX (Prentice-Hall, 2007) and Fundamentals of LTE (Prentice-Hall, 2010). He is an ISI Highly Cited Researcher and has been a co-recipient of 15 paper awards, including the 2016 IEEE Communications Society & Information Theory Society Joint Paper Award, the 2014 IEEE Stephen O. Rice Prize, the 2014 and 2018 IEEE Leonard G. Abraham Prize, the 2011 and 2016 IEEE Heinrich Hertz Prize, and the 2010 IEEE ComSoc Best Tutorial Paper Award. He received the 2015 Terman Award, the NSF CAREER Award, and the 2019 IEEE Kiyo Tomiyasu technical field award. He is also the founding Chair of the Steering Committee for the IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY, amongst other IEEE leadership roles. He was the Editor-in-Chief of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS from 2014 to 2016 and the Chair of the IEEE Communications Society Emerging Technologies Committee from 2018 to 2019.
