Wideband Channel Estimation With a Generative Adversarial Network
Abstract: Communication at high carrier frequencies such as millimeter wave (mmWave) and terahertz (THz) requires channel estimation for very large bandwidths at low SNR. Hence, allocating an orthogonal pilot tone for each coherence bandwidth leads to an excessive number of pilots. We leverage generative adversarial networks (GANs) to accurately estimate frequency selective channels with few pilots at low SNR. The proposed estimator first learns to produce channel samples from the true but unknown channel distribution via training the generative network, and then uses this trained network as a prior to estimate the current channel by optimizing the network's input vector in light of the current received signal. Our results show that at an SNR of −5 dB, even if a transceiver with one-bit phase shifters is employed, our design achieves the same channel estimation error as an LS estimator with SNR = 20 dB or the LMMSE estimator at 2.5 dB, both with fully digital architectures. Additionally, the GAN-based estimator reduces the required number of pilots by about 70% without significantly increasing the estimation error and required SNR. We also show that the generative network does not appear to require retraining even if the number of clusters and rays change considerably.

Index Terms: Frequency selective channel estimation, GAN, MIMO, terahertz and millimeter wave communication.

Manuscript received August 13, 2020; revised December 4, 2020; accepted December 21, 2020. Date of publication January 6, 2021; date of current version May 10, 2021. The associate editor coordinating the review of this article and approving it for publication was C. Huang. (Corresponding author: Eren Balevi.)
The authors are with the Wireless Networking and Communications Group, Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2020.3047100.
Digital Object Identifier 10.1109/TWC.2020.3047100

I. INTRODUCTION

Millimeter wave (mmWave) and terahertz (THz) communication offer large untapped bandwidths. Low signal-to-noise-ratio (SNR) ( 0 dB) channel estimation at these frequencies and bandwidths is desirable before beam alignment is completed, because exhaustively searching all narrow beams without estimating the channel brings exponentially increasing sample and computational complexity. However, there are nontrivial challenges regarding channel estimation stemming from the propagation physics, the necessity of using a great many antenna elements for high gain beamforming, and the hybrid transceiver architectures.

The contribution of this paper is to adapt powerful deep generative models for frequency selective mmWave and THz channel estimation as an alternative to known sparsity-based compressed sensing algorithms, which have been used for narrowband channels [1]–[3], frequency-selective channels [4]–[6] and low-resolution channels [7]. Deep generative models offer an appealing approach to exploiting sparsity, as they can use knowledge of a finite number of signals from a class to learn a basis for the whole class. Furthermore, these models enable us to solve optimization problems with a simple and fast gradient descent based method. As an additional benefit, generative models can exploit the overall cross-correlations among the frequency, time and spatial domains, which have been traditionally ignored to simplify the estimator [8]. Among the two prominent deep generative models, namely variational autoencoders (VAEs) [9] and generative adversarial networks (GANs) [10], we utilize a GAN in this paper, since GANs can very effectively compress the signals to a low dimensional manifold by leveraging the channel structures. Exploiting this property enables us to reduce the number of required pilots for accurate channel estimation.

A. Motivation and Related Work

High frequency bands incur high propagation losses for terrestrial communication, and hence a large number of small antenna elements is needed to attain a large beamforming gain. Conventional estimators require the number of pilots to be at least equal to the number of transmit antennas to avoid an ill-posed problem. Thus, to reduce the number of pilots, the existing high bandwidth channel estimators have been centered around compressed sensing tools motivated by the sparsity of mmWave channels [11]. The same approach was also used for sub-6 GHz massive MIMO channels [12]–[15]. Unfortunately, it is very hard (or impossible) to find the basis that would yield the sparsest representation. Also, the reconstruction phase is complex and slow for compressed sensing algorithms, which require the solution of an optimization problem to find the locations and values of the sparse coefficients. This restricts their usage to channels with fairly long coherence intervals.

Deep learning has been recently utilized as an alternative for high-dimensional channel estimation. Specifically, [16] uses convolutional neural networks (CNNs) to perform interpolation and denoising in 2-dimensional OFDM channels, [17] incorporates a special CNN as a denoiser into the approximate message passing (AMP) algorithm for beamspace channels, and [18] adapts CNNs for 3-dimensional channels to exploit the correlations in frequency, time and space. To avoid the training complexity of CNNs for channel estimation, [19], [20] proposed an untrained neural network that can precede or follow a least-squares (LS) estimator for OFDM
1536-1276 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on January 04,2025 at 09:02:39 UTC from IEEE Xplore. Restrictions apply.
3050 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 20, NO. 5, MAY 2021
and MIMO-OFDM channels, respectively. Combining an LS estimator with neural networks was also proposed in [21], [22]. There are some other studies that consider deep learning to tackle the detrimental effects of quantization for channel estimation [23], [24].

What distinguishes this paper from the prior techniques is that we design a GAN to learn to produce channel samples according to its distribution, and then use this knowledge as a priori information to estimate the actual current channel. This is quite a different approach from using GANs for channel modeling [25], [26]. Furthermore, as opposed to AMP-based channel estimators, our GAN approach does not require us to know or model the channel distribution. Instead, the GAN learns to produce samples that are statistically very close to the true but unknown channel distribution. The closest papers to our work are our recent papers [27], [28], which use a similar GAN-based channel estimation architecture to reduce the number of pilots for single stream narrowband (frequency flat) massive MIMO channel estimation, with very tight ($\lambda/10$) antenna spacing contributing extra spatial correlation. In contrast, in the current paper we introduce a frequency selective channel and utilize a more standard planar array with $\lambda/2$ spaced antennas. Additionally, we consider one-bit quantized phase shifters to further decrease the power consumption and hardware costs for such a large antenna array. We also study the generalization capability both through analysis and experiments.

B. Contributions

The main contribution of this paper is to propose and study a novel GAN-based channel estimation algorithm for wideband frequency selective channels. Although in this paper the modulation and demodulation are based on OFDM, the proposed approach can be adapted to single-carrier frequency domain equalization (SC-FDE) systems as long as the channel is estimated in the frequency domain. We will demonstrate that our GAN-based framework can estimate channels at very low SNR with a reduced number of pilots for hybrid beamforming architectures when the channel estimation is formulated as an inverse problem. In addition to the novel architecture, our contributions are both theoretical and empirical.

Theoretically, there are two main contributions. First, the GAN framework requires sub-Gaussian measurements to meet theoretical guarantees [29]. In the channel estimation case, these measurements are determined by the pilots and the digital and analog precoders/combiners. We prove that when the pilots are chosen as zero mean bounded i.i.d. random variables, the sub-Gaussian requirement is indeed met for channel estimation even if there are constraints due to phase shifters and total transmission power. Thus the corresponding guarantees hold. Second, we investigate the generalization capability of the proposed estimator for channels with a different number of clusters and rays than the channel used for training the GAN. Our technical approach is to apply theoretical principles from reinforcement learning.

Our empirical results demonstrate that the major challenges (hybrid transceivers, low SNRs and insufficient pilots) can be tackled with our proposed estimator. Specifically, we benchmark our technique against the performance of conventional channel estimators for fully digital transceivers. We find that our technique at an ultra-low SNR of −5 dB matches the performance of LS estimation at 20 dB and linear minimum mean square error (LMMSE) estimation at 2.5 dB. Furthermore, it is shown that GANs provide a lower channel estimation error than traditional CNNs that are not trained with an adversarial loss, e.g., ResNet, because GANs exploit the high channel correlations much more efficiently. Additionally, our estimator allows a significant reduction in pilot tones (more than 50%) without any substantial performance loss, and yields lower estimation error than the Orthogonal Matching Pursuit (OMP) algorithm in this regime.

II. SYSTEM MODEL AND PROBLEM STATEMENT

We consider single user communication to estimate the frequency selective channel over a large number of antennas via pilot symbols. However, all the ideas proposed in this paper are equally applicable to multi-user communication if orthogonal pilots are allocated to each user and there is no inter-beam interference$^1$. In the case of large antenna arrays, having a dedicated RF chain per antenna is too costly in terms of hardware and power consumption. Thus, the number of RF chains is reduced by processing the signals both in the digital and analog domain. This architecture is illustrated in Fig. 1. Here, $N_s$ data streams are precoded digitally at each subcarrier. Then, the precoded signal is OFDM modulated for the $N_t^{\mathrm{RF}}$ RF chains, processed with an analog precoder (or phase shifters) and transmitted over the $N_t$ transmit antennas. Similarly, the receiver has an analog combiner that converts the $N_r$ dimensional received signal into an $N_r^{\mathrm{RF}} \times 1$ vector. The resultant signal is then OFDM demodulated and combined with the digital combiner at each subcarrier.

$^1$ Note that the proposed estimator is robust to inter-beam interference as long as the interference has a distribution whose tails are exponentially bounded, as has been recently proven in [30]. On the other hand, for heavy-tailed interference novel reconstruction methods are needed instead of minimizing the Euclidean distance.

A. Hybrid Transceivers

The pilot tone $\mathbf{p}[n,k]$, with the first and second axes being the time and frequency index, is an $N_s \times 1$ vector such that $E[\mathbf{p}[n,k]\mathbf{p}[n,k]^H] = \frac{1}{N_s}\mathbf{I}_{N_s}$. This signal is processed with the $N_t^{\mathrm{RF}} \times N_s$ dimensional digital precoder matrix $\mathbf{F}_{\mathrm{BB}}[n,k]$ as

$$\mathbf{s}[n,k] = \mathbf{F}_{\mathrm{BB}}[n,k]\,\mathbf{p}[n,k] \tag{1}$$

where $\mathbf{s}[n,k] = [s_1[n,k], s_2[n,k], \cdots, s_{N_t^{\mathrm{RF}}}[n,k]]^T$ for $k = 0, 1, \cdots, N_f - 1$, i.e., there are $N_f$ subcarriers and $s_q[n,k]$ corresponds to the pilot for the $q$th RF chain on the $k$th subcarrier. In accordance with (1), the time domain samples become

$$\mathbf{u}[n] = (\mathbf{F}^H \otimes \mathbf{I}_{N_t^{\mathrm{RF}}}) \underbrace{\begin{pmatrix} \mathbf{s}[n,0] \\ \vdots \\ \mathbf{s}[n,N_f-1] \end{pmatrix}}_{\mathbf{s}[n]} \tag{2}$$
BALEVI AND ANDREWS: WIDEBAND CHANNEL ESTIMATION WITH A GENERATIVE ADVERSARIAL NETWORK 3051
Fig. 1. The communication system model utilized for channel estimation, in which the pilots are passed through digital and analog precoders and combiners.
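Since $\mathbf{C}$ is a block-diagonal circulant matrix as given in (4), it is diagonalized with its left and right multiplying terms in (7) to

$$\mathbf{H} = (\mathbf{F} \otimes \mathbf{I}_{N_r})\,\mathbf{C}\,(\mathbf{F}^H \otimes \mathbf{I}_{N_t}) = \begin{pmatrix} \mathbf{H}_0 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_1 & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{H}_{N_f-1} \end{pmatrix} \tag{8}$$

$^2$ Since it is not practical to change the analog precoder and combiner at each symbol time, we omit the index $n$.

$\theta_{i,j}, \phi_{i,j} \in \mathcal{A}$, where $\mathcal{A} = \{0, \pi\}$, and $[\mathbf{F}_{\mathrm{RF}}]_{i,j}$ and $[\mathbf{W}_{\mathrm{RF}}]_{i,j}$ are the $(i,j)$th elements of $\mathbf{F}_{\mathrm{RF}}$ and $\mathbf{W}_{\mathrm{RF}}$, respectively.

B. OFDM Channel Estimation With Multiple Antennas

The optimum channel estimator for (12) is found via the maximum a posteriori (MAP) optimization, which is equivalent to

$$\hat{\mathbf{h}}_{\mathrm{MAP}} = \arg\max_{\mathbf{h}} \; \log(P(\mathbf{y}|\mathbf{h})) + \log(P(\mathbf{h})). \tag{13}$$

The main challenges for MAP estimation are the prohibitive computational complexity, since the coefficients of $\mathbf{h}$ are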
mixed in $\mathbf{y}$ and this makes the calculation of $P(\mathbf{y}|\mathbf{h})$ quite complex, and the need for the channel distribution. As a special case, when $P(\mathbf{y}|\mathbf{h})$ is Gaussian, $\hat{\mathbf{h}}_{\mathrm{MAP}}$ becomes equivalent to the LMMSE estimator

$$\hat{\mathbf{h}}_{\mathrm{LMMSE}} = \mathbf{R}_h (\mathbf{R}_h + \rho^{-1}\boldsymbol{\Gamma})^{-1}\,\hat{\mathbf{h}}_{\mathrm{LS}} \tag{14}$$

where

$$\mathbf{R}_h = E[\mathbf{h}\mathbf{h}^H], \qquad \boldsymbol{\Gamma} = (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H E[\mathbf{w}\mathbf{w}^H]\,\mathbf{A}(\mathbf{A}^H\mathbf{A})^{-1},$$

and

$$\hat{\mathbf{h}}_{\mathrm{LS}} = \frac{1}{\sqrt{\rho}}(\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H\mathbf{y}. \tag{15}$$

Note that $\boldsymbol{\Gamma}$ becomes an identity matrix in the case of digital transceivers. However, (14) is still computationally expensive due to the matrix inversions. Also, $\mathbf{A}^H\mathbf{A}$ becomes non-invertible if there are not sufficient pilots. The AMP algorithm can near-optimally solve (13) with reasonable complexity if $P(\mathbf{h})$ is known [31]. However, it is unrealistic to assume a known $P(\mathbf{h})$. Modeling $\mathbf{h}$ with Gaussian mixtures whose parameters are found with the Expectation-Maximization algorithm can be a method if the elements of $\mathbf{h}$ are independent [32]. However, the entries of $\mathbf{h}$ are correlated in wireless channels. Finding a sparsifying basis for the channel in (12) and using OMP and Basis Pursuit Denoising (or Lasso) for channel estimation lead to a high performance loss [28].

For multiple antenna OFDM channel estimation, we use a fundamentally different approach. Our key idea is to design a GAN that learns to produce plausible channel samples instead of finding or modeling the highly complex channel distribution. This is done offline, and then in the online phase we inject these channel samples into the estimator. This yields the following optimization problem

$$\hat{\mathbf{h}}_{\mathrm{GAN}} = \arg\min_{\mathbf{h}} \; \|\mathbf{y} - \sqrt{\rho}\,\mathbf{A}\mathbf{h}\|_2^2 + r(\mathbf{h}) \tag{16}$$

where

$$r(\mathbf{h}) = \begin{cases} 0, & \text{if } \mathbf{h} \text{ is producible by the GAN} \\ \infty, & \text{otherwise.} \end{cases} \tag{17}$$

Note that (17) injects the a priori knowledge due to the trained GAN into the estimator, which means that among the many possible candidates, the estimate is the one that can be produced by the GAN$^3$.

$^3$ We note that producible means the channel is either on the range space of the generator or close to it, with the meaning of "close" quantified in (22).

C. Channel Estimation and Image Reconstruction Differences for GANs

GANs have already been used to solve inverse problems in image processing [29]. This raises the natural question of what is novel about using a GAN for channel estimation. The first answer is that the structure of the measurement matrix $\mathbf{A}$ in (12) is very different for the two applications. Furthermore, the signal structures of natural images and channels are distinct. To illustrate, in Fig. 2 we visualize a sample image from the CelebA dataset by cropping a 64 × 16 portion of it from the center and compare it with (i) a channel realization from a generic geometric channel model to show the spatial correlations and (ii) a channel realization from the TDL-E channel model to show the frequency and time domain correlations. For ease of exposition, all signals are shaped to a 64 × 16 matrix. Another difference is that a good performance metric for channel estimation is the Euclidean distance or SNR, due to Gaussian noise and the dispassionate nature of symbol demodulation in the presence of such noise. On the other hand, image quality is perceptual and qualitative, and Euclidean distance and SNR are known to be poor measures of image quality [33], [34]. Indeed, a major feature of a GAN is that it can produce an image that is far from the target image under a quantitative measure like Euclidean distance, but very close in a perceptual sense.

III. CHANNEL ESTIMATION WITH GENERATIVE ADVERSARIAL NETWORKS

In this part, we present the GAN-based estimator for frequency-selective channels. Before going into the details of how the GAN works for channel estimation, the basics of GANs and our architecture are briefly summarized. Then, we explain how to solve the optimization problem in (16) at an algorithmic level.

A. GAN Architecture

A GAN is composed of a generator $G_{\hat\theta_g}: \mathbb{R}^d \to \mathbb{R}^n$, in which $d \ll n$, and a discriminator $D_{\hat\theta_d}: \mathbb{R}^n \to \{-1, +1\}$, where $\hat\theta_g$ and $\hat\theta_d$ are the parameters of the generator and discriminator neural networks. As explained in [10], the discriminator is first trained both with the true samples in the dataset that are labeled as valid and the fake samples produced by the generator that are labeled as fake. Next, the generator is trained to enhance the quality of fake samples to fool the discriminator so that the fake samples are classified as valid. Model selection for the generator and discriminator networks is of key importance to facilitate the training. After extensive model exploration, [35] developed a class of CNN architecture called Deep Convolutional GAN (DCGAN), and empirically showed that using a DCGAN for the generator and discriminator significantly alleviates the instability problems in training.

In our DCGAN model, the first layer of the generator network processes the low dimensional vector with a fully connected layer and ReLU activation function, and then reshapes it into a 3-dimensional tensor. This layer is then followed by four hidden layers, each of which is composed of upsampling, 2-dimensional convolution with 4 × 4 filters and 1 × 1 stride, batch normalization and ReLU activation function. It is worth noting that upsampling repeats the rows and columns so that the generator output has the same dimensions as the channel matrix. The output layer only involves a 2-dimensional convolution with a linear activation function. The discriminator network of our DCGAN has a 2-dimensional convolution with 3 × 3 filters and 2 × 2 stride for the input that has leaky ReLU activation function and dropout
Fig. 2. Visualisation of (a) a 64 × 16 cropped image in CelebA dataset, (b) a channel realization for 64 transmit and 16 receive antennas from geometric
channel model, and (c) a channel realization for 64 subcarriers and 16 OFDM symbols from TDL-E model.
with rate 0.25. This is followed by three hidden layers, each of which has 2-dimensional convolution with 3 × 3 filters and 2 × 2 stride, batch normalization and leaky ReLU activation function, and dropout with rate 0.25. The slope of the leak is 0.2 for all leaky ReLU functions. The output has a fully connected layer, and its dimension is 1 to classify the channel realizations as either valid or fake.

Consecutively repeating the process of training $D_{\hat\theta_d}(\cdot)$ $k$ times and training $G_{\hat\theta_g}(\cdot)$ one time to keep the discriminator near its optimal solution theoretically achieves the global optimum point of

$$\min_{G}\max_{D} \; E_{\mathbf{h}}[\log D_{\hat\theta_d}(\mathbf{h})] + E_{\mathbf{z}}[\log(1 - D_{\hat\theta_d}(G_{\hat\theta_g}(\mathbf{z})))], \tag{18}$$

where $\mathbf{h}$ represents the training channel samples and $\mathbf{z}$ is the generator input sampled from a distribution $P(\mathbf{z})$. The loss function in (18), which corresponds to minimizing the Jensen–Shannon (JS) divergence between the generator distribution and the empirical distribution of the training channel samples, has been used since the seminal paper [10]. However, it is practically delicate to train a GAN with (18), since the optimum value of $k$ is not known, and the theoretical guarantee is valid only for infinite capacity neural networks. To avoid these issues, based on the idea of replacing JS divergence with Wasserstein distance in training, [36] proposes to replace (18) with

$$\min_{G}\max_{D} \; E_{\mathbf{h}}[D_{\hat\theta_d}(\mathbf{h})] - E_{\mathbf{z}}[D_{\hat\theta_d}(G_{\hat\theta_g}(\mathbf{z}))]. \tag{19}$$

A GAN whose generator and discriminator are trained according to (19) is known as a Wasserstein GAN. Note that for a Wasserstein GAN, the Lipschitz condition must be satisfied, and we ensured this with weight clipping. We summarize the overall process of training a Wasserstein GAN according to (19) in Algorithm 1, pointing the interested readers to [36] for more details.

B. Frequency Selective Channel Estimator

To solve the channel estimation problem in (16), we first train a Wasserstein GAN offline, then extract its generator network $G_{\hat\theta_g}$ and iteratively optimize the input of this generative network online. This entire process is presented in Fig. 3 and detailed next.

1) Offline Phase: We envision that there will be datasets in future standards for different environments, e.g., for urban macro (UMa), urban micro (UMi) and indoor-open office, just like the existing empirical channel models, and the neural network parameters are trained with these datasets. However, since
Algorithm 1 Wasserstein GAN for Generating Channels
1: while $\hat\theta_g$ and $\hat\theta_d$ have not converged do
2:   for $t = 1, 2, \cdots, k$ do
3:     Sample $\{\mathbf{h}_i\}_{i=1}^{m}$, a batch from the real channel realizations
4:     Sample $\{\mathbf{z}_i\}_{i=1}^{m} \sim P(\mathbf{z})$, a batch from the input prior

Algorithm 2 GAN-Based Channel Estimation
Input: Gaussian i.i.d. noise $\mathbf{z}$
Output: $\hat{\mathbf{h}}_{\mathrm{GAN}}$
Offline Phase:
1: Train the Wasserstein GAN as explained in Algorithm 1
2: Extract the trained generator $G_{\hat\theta_g}$
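The alternating schedule of Algorithm 1 ($k$ critic updates with weight clipping, then one generator update, following the objective in (19)) can be illustrated on a deliberately tiny problem. The sketch below is not the authors' DCGAN: it assumes a scalar "channel", a linear critic $D(h) = wh$, a linear generator $G(z) = az + b$, hand-derived gradients and plain gradient steps rather than RMSprop. The function name and all hyperparameters are hypothetical.

```python
import numpy as np

def train_toy_wgan(real_mean=3.0, iters=2000, k=5, m=256,
                   lr_d=0.05, lr_g=0.1, clip=0.1, seed=0):
    """Toy 1-D Wasserstein GAN with weight clipping (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = 0.0          # critic parameter, D(h) = w * h
    a, b = 1.0, 0.0  # generator parameters, G(z) = a * z + b
    for _ in range(iters):
        # Critic: k ascent steps on E[D(h)] - E[D(G(z))], as in (19).
        for _ in range(k):
            h = rng.normal(real_mean, 1.0, m)  # batch of "real" samples
            z = rng.normal(0.0, 1.0, m)        # batch from the input prior
            grad_w = h.mean() - (a * z + b).mean()
            # Weight clipping keeps the critic Lipschitz.
            w = float(np.clip(w + lr_d * grad_w, -clip, clip))
        # Generator: one descent step on -E[D(G(z))] = -w * (a * E[z] + b).
        z = rng.normal(0.0, 1.0, m)
        a += lr_g * w * z.mean()
        b += lr_g * w
    return a, b, w

a, b, w = train_toy_wgan()
```

Because a linear critic can only compare means, this toy generator learns the mean of the real samples and nothing else, and its offset `b` keeps oscillating slightly around that mean instead of settling; a convolutional critic, as described in the text, is what allows higher-order channel structure to be captured.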
although the noise power or $\|\mathbf{w}\|_2^2$ goes to 0 when SNR goes to infinity. This leads to residual error even if is negligible, which becomes more prominent for high SNRs. On the other hand, if there is a perfectly trained generative network, which can produce all the samples in a channel distribution, then this residual error becomes 0.

IV. GENERALIZATION CAPABILITY

The training channel samples used for the GAN and the test channels used to measure the performance of the proposed estimator must ideally have the same structure. However, the structures in wireless channels can constantly change depending on various factors. This implies that the test channels can have different structures, and the ability to gracefully handle novel data, i.e., the generalization capability of our estimator, has to be investigated. This, however, is an involved subject. In particular, determining the channel conditions that require retraining is a complicated problem, since this also depends on the GAN architecture, i.e., some architectures can be more robust to the distributional changes$^5$. Thus, we start by studying the relatively simple case, namely the number of clusters and rays, which are a linear function of the channel.

The proposed method in Fig. 3 searches for the best estimate in the channel manifold acquired with the trained generative network of the GAN, i.e.,

$$\mathbf{z}_{n+1} = \mathbf{z}_n - \mu_n \nabla_{\mathbf{z}_n} \|\mathbf{y} - \sqrt{\rho}\,\mathbf{A}G_{\hat\theta_g}(\mathbf{z}_n)\|_2^2, \tag{23}$$

where $\mu_n$ is the step size at the $n$th iteration. For ease of analysis, we assume that the parameters $\hat\theta_g$ are trained offline with the samples that come from

$$\mathbf{H}_k = \sqrt{\frac{N_r N_t}{\gamma}} \sum_{i=1}^{N_{\mathrm{cl}}^{(\mathrm{off})}} \sum_{l=1}^{N_{\mathrm{ray}}^{(\mathrm{off})}} \alpha_{i,l}\, \mathbf{a}_r(\phi_{i,l}^r, \theta_{i,l}^r)\, \mathbf{a}_t(\phi_{i,l}^t, \theta_{i,l}^t)^H \tag{24}$$

for $k = 0, 1, \cdots, N_f - 1$, where $N_{\mathrm{cl}}^{(\mathrm{off})}$ is the number of clusters, each of which has $N_{\mathrm{ray}}^{(\mathrm{off})}$ paths. Here, $\alpha_{i,l}$ is the complex path gain of the $l$th ray in the $i$th cluster, and $\phi_{i,l}^r, \theta_{i,l}^r, \phi_{i,l}^t, \theta_{i,l}^t$ are the azimuth and elevation angles of arrival and departure, respectively. For notational simplicity, we assume that the impact of the pulse shape $\sum_{b=0}^{L_c-1} p(bT_s - \tau_{i,l})\, e^{-j\frac{2\pi k b}{N_f}}$, where $T_s$ is the symbol period and $\tau_{i,l}$ is the corresponding delay, is included within $\alpha_{i,l}$. The vectors $\mathbf{a}_r(\phi_{i,l}^r, \theta_{i,l}^r)$ and $\mathbf{a}_t(\phi_{i,l}^t, \theta_{i,l}^t)$ are the normalized receive and transmit antenna array responses, which cover the relative angle of arrival and departure shift of each ray. Also, $\gamma$ is the normalization constant to ensure that $E[\|\mathbf{H}_k\|_F^2] = N_r N_t$.

To understand the generalization capability, we need to characterize the impact of the channel statistics on (23). In this regard, we assume that the online channel has a different number of clusters and rays than the offline samples for which the GAN was trained. More precisely, we assume that the GAN is trained offline with $N_{\mathrm{cl}}^{(\mathrm{off})}$ clusters and $N_{\mathrm{ray}}^{(\mathrm{off})}$ rays, and this yields the parameters $\hat\theta_g$ that define the distribution $G_{\hat\theta_g}$. We further assume that the GAN parameters would become $\theta_g$, leading to the distribution $G_{\theta_g}$, if it was trained with $N_{\mathrm{cl}}^{(\mathrm{on})}$ and $N_{\mathrm{ray}}^{(\mathrm{on})}$, the parameters of (24) for the online channel. We next show that these two distributions belong to the same distribution family.

Lemma 1: $G_{\hat\theta_g}$ and $G_{\theta_g}$ each have a sub-Gaussian distribution.

Proof: Assume that the $\alpha_{i,l}$'s in (24) are independent random variables whose support is $[-B_{i,l}, B_{i,l}]$, and that the GAN is perfectly trained so that it learns the channel distribution. Then, using Hoeffding's inequality [37] given in (40) for (24), the tail of $G_{\hat\theta_g}$ is specified as

$$P(G_{\hat\theta_g} \geq t) \leq \exp\left(-\frac{t^2}{2\sum_{i=1}^{N_{\mathrm{cl}}^{(\mathrm{off})}}\sum_{l=1}^{N_{\mathrm{ray}}^{(\mathrm{off})}} \|\mathbf{a}_{i,l}\|_F^2}\right) \tag{25}$$

where $\mathbf{a}_{i,l} = \sqrt{\frac{N_r N_t}{\gamma}}\,\mathbf{a}_r(\phi_{i,l}^r, \theta_{i,l}^r)\,\mathbf{a}_t(\phi_{i,l}^t, \theta_{i,l}^t)^H$. Similarly, the tail of $G_{\theta_g}$ becomes

$$P(G_{\theta_g} \geq t) \leq \exp\left(-\frac{t^2}{2\sum_{i=1}^{N_{\mathrm{cl}}^{(\mathrm{on})}}\sum_{l=1}^{N_{\mathrm{ray}}^{(\mathrm{on})}} \|\mathbf{a}_{i,l}\|_F^2}\right). \tag{26}$$

From (25) and (26), $G_{\hat\theta_g}$ and $G_{\theta_g}$ are sub-Gaussian.

Lemma 1 is used to show that the gradient, with respect to the generator input, of the measurement error

$$J_n(\theta_g, \hat\theta_g) = \|\mathbf{y} - \sqrt{\rho}\,\mathbf{A}G_{\hat\theta_g}(\mathbf{z}_n)\|_2^2 \tag{27}$$

is unbiased even if the statistics of the test channel samples differ from the training samples in terms of the number of clusters and rays. To make this point clear, let $\mathbf{h}$ be in the range space of the hypothetically online trained GAN for some input vector $\mathbf{z}$. This means that the received signal in (12) can be written as

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{A}G_{\theta_g}(\mathbf{z}) + \mathbf{w}. \tag{28}$$

Thus, differentiating (27) with respect to the generative input yields

$$\nabla_{\mathbf{z}_n} J_n(\theta_g, \hat\theta_g) = \nabla_{\mathbf{z}_n}\mathbf{y}^H\mathbf{y} - \nabla_{\mathbf{z}_n} 2\sqrt{\rho}\,\mathbf{y}^H\mathbf{A}G_{\hat\theta_g}(\mathbf{z}_n) + \nabla_{\mathbf{z}_n}\rho\, G_{\hat\theta_g}(\mathbf{z}_n)^H\mathbf{A}^H\mathbf{A}G_{\hat\theta_g}(\mathbf{z}_n). \tag{29}$$

Since both $G_{\hat\theta_g}$ and $G_{\theta_g}$ are sub-Gaussian as shown in Lemma 1, the right-hand sides of (25) and (26) converge to a Dirac delta function $\delta(t)$ when $\gamma \to \infty$, and hence for $\gamma \to \infty$ we can observe that

$$E\left[\nabla_{\mathbf{z}_n} J_n(\theta_g, \hat\theta_g)\right] = \nabla_{\mathbf{z}_n}\|\mathbf{y} - \sqrt{\rho}\,\mathbf{A}G_{\theta_g}(\mathbf{z}_n)\|_2^2. \tag{30}$$

Although these are limiting results for very high SNR ($\gamma \to \infty$) and i.i.d. channel taps, our empirical results in Section V-C indicate a similar behavior for practical SNRs and channel models.

The importance of (30) is related to our empirical observations in Section V-C that having mismatched numbers of clusters and rays between online and offline phases does not appear to degrade the performance. This can be explained with the policy gradient concept in reinforcement learning, since we can model our sequential decision making in the z-space

$^5$ To avoid online retraining, receivers can also store different sets of GAN parameters after identifying the scenarios that require retraining, and pick the most appropriate set of parameters.
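To make the latent-space search of (23) concrete, here is a minimal sketch under strong simplifying assumptions: the trained generator is replaced by a fixed linear map $G(\mathbf{z}) = \mathbf{W}\mathbf{z}$ so that the gradient can be written by hand (with a DCGAN it would come from backpropagation), the measurements are noiseless, and a constant step size is used. The names `estimate_latent`, `W` and all dimensions are hypothetical.

```python
import numpy as np

def estimate_latent(y, A, W, rho, num_iters=1000):
    """Gradient descent on z for f(z) = ||y - sqrt(rho) A (W z)||^2, cf. (23)."""
    B = np.sqrt(rho) * (A @ W)               # effective map from z to y
    hess = 2.0 * np.real(B.conj().T @ B)     # Hessian of f for real-valued z
    mu = 1.0 / np.linalg.eigvalsh(hess).max()  # constant step (an assumption)
    z = np.zeros(W.shape[1])
    for _ in range(num_iters):
        grad = 2.0 * np.real(B.conj().T @ (B @ z - y))
        z = z - mu * grad
    return z

rng = np.random.default_rng(1)
n, m, d, rho = 64, 32, 4, 1.0    # channel dim, measurements, latent dim, SNR
W = rng.standard_normal((n, d))  # stand-in for the trained generator
A = (rng.standard_normal((m, n))
     + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
z_true = rng.standard_normal(d)
h_true = W @ z_true              # a channel in the generator's range
y = np.sqrt(rho) * A @ h_true    # noiseless measurements, m < n

z_hat = estimate_latent(y, A, W, rho)
h_hat = W @ z_hat
rel_err = np.linalg.norm(h_hat - h_true) / np.linalg.norm(h_true)
```

The point of the example is the dimension count: with $m < n$ the system is underdetermined in $\mathbf{h}$, but it is overdetermined in the $d$-dimensional latent vector, which is why a channel in the generator's range can still be recovered from few measurements.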
Fig. 5. The comparison of the GAN-based channel estimation for hybrid architecture with the LS and LMMSE estimators for fully digital architecture, and
a supervised learning method.
(i) the standard LS and the near-optimum LMMSE6 channel is not as efficient as at low SNRs, this is not a problem,
estimators that operate on fully digital transceivers; and (ii) a since the probability of observing moderate or high SNRs is
supervised learning method, which maps y to the channel extremely low due to the high propagation losses and the lack
estimate ĥSL through a standard ResNet model [41] after of beamforming gain.
being trained with 5000 pairs of {y, h}. The only differences Another important point regarding Fig. 5 is that our
between our approach and [41] are the linear activation func- GAN-based estimator outperforms the supervised learning
tion at the output layer and the input and output dimensions: model even if there is plenty of labeled data and a powerful
our input is the stacked real and imaginary parts of $\mathbf{y}$, and the output is $\hat{\mathbf{h}}_{\mathrm{SL}}$. The performance metric is the NMSE

$$\mathrm{NMSE} = \mathbb{E}\left[\frac{\|\mathbf{h} - \hat{\mathbf{h}}\|_2^2}{\|\mathbf{h}\|_2^2}\right], \tag{32}$$

in which the expected value is over the underlying probability distribution of the channel $\mathbf{h}$, and $\hat{\mathbf{h}}$ refers to either $\hat{\mathbf{h}}_{\mathrm{LMMSE}}$, $\hat{\mathbf{h}}_{\mathrm{LS}}$, $\hat{\mathbf{h}}_{\mathrm{GAN}}$ defined in (14), (15), (21) or $\hat{\mathbf{h}}_{\mathrm{SL}}$.

Considering the worse performance of LS and the high complexity of LMMSE, which stems from matrix inversion and channel covariance matrix estimation, the GAN estimator is an attractive option for high frequency channel estimation, as can be seen in Fig. 5(a) and 5(b). Promisingly, our estimator can tackle the negative impacts of the hybrid model and achieve performance close to that of fully digital transceivers. Specifically, the proposed design results in very low error at low SNRs, which is the only feasible regime for mmWave and THz channel estimation. To illustrate, when the delay spread is 10 ns, our estimator at −5 dB SNR achieves the fully digital transceiver performance of the LS estimator at 20 dB SNR and of the LMMSE estimator at 2.5 dB SNR. Note that although the LMMSE estimator is optimum for Gaussian noise when the channel covariance matrix is known, our performance is better than LMMSE at low SNRs because of the estimation errors in the channel covariance matrix. Furthermore, the residual error term in our estimator due to the representation and optimization errors, which are explained in Section III-C and become more prominent at high SNRs, results in an almost flat NMSE beyond a certain SNR. When the delay spread is increased to 100 ns, the performance of our estimator degrades slightly, since this yields lower correlations and hence less structure. Although our estimator at high SNRs

the CNN architecture. Specifically, to make a fair comparison, we trained the GAN and ResNet with 5000 clean channel samples via an RMSprop optimizer with a learning rate of 0.00005 and a batch size of 200, and used the same number of pilots. Since the ResNet architecture has many more parameters than the DCGAN architecture (33 million vs. 2 million), we trained the GAN for 3000 epochs and measured the elapsed time for channel estimation. Then, we found the number of epochs that corresponds to this amount of time for the ResNet-based channel estimator, and plotted the results accordingly. We note that even when the ResNet is trained for 3000 epochs, we empirically observed that its performance is still considerably worse than that of the GAN for low delay spread. On the other hand, for high delay spread, 3000 epochs bring the ResNet-based estimator close to the GAN performance. This implies that CNNs trained with conventional loss functions are not as powerful as a GAN in exploiting the channel correlations, as can be observed in Fig. 5(a) and 5(b).

For mmWave and THz communication, sending orthogonal pilots for each coherence bandwidth and time at each transmit antenna leads to an excessive number of pilots due to the large number of antennas and the large bandwidth. In the next simulation, we address this problem by observing the estimation error of the GAN-based framework when the number of pilots is drastically reduced. Since the lack of pilots creates an ill-posed problem, and thus the LS and LMMSE estimators are undefined, one has little choice but to use compressed sensing algorithms. Thus, we use the generic OMP algorithm as a benchmark after sparsifying the channel with a DFT basis, as opposed to the GAN algorithm, which uses the original (unsparsified) channels. The OMP algorithm has a better performance-complexity tradeoff than LASSO and, unlike message-passing algorithms, does not require knowledge of the channel distribution [28]. However, as compared to the OMP algorithm, there is a clear performance gain of the

6 Near-optimum LMMSE refers to the case when the noise is Gaussian, but the channel covariance matrix is estimated, i.e., not known.
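As a concrete reading of the NMSE in (32), the following is a minimal NumPy sketch in which the expectation is replaced by an average over a batch of channel realizations. The function names `nmse` and `nmse_db`, the array shapes, and the synthetic Gaussian test channels are illustrative assumptions and are not taken from the simulation setup of the paper.

```python
import numpy as np

def nmse(h_true, h_est):
    """Empirical NMSE: average over realizations of ||h - h_hat||^2 / ||h||^2.

    h_true, h_est: arrays of shape (num_realizations, channel_dim),
    real or complex (assumed layout, one channel vector per row).
    """
    h_true = np.asarray(h_true)
    h_est = np.asarray(h_est)
    err = np.sum(np.abs(h_true - h_est) ** 2, axis=1)   # squared error per realization
    ref = np.sum(np.abs(h_true) ** 2, axis=1)           # channel energy per realization
    return float(np.mean(err / ref))

def nmse_db(h_true, h_est):
    """NMSE expressed in dB."""
    return 10.0 * np.log10(nmse(h_true, h_est))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic complex Gaussian "channels" and a noisy estimate (assumed data).
    h = rng.standard_normal((1000, 64)) + 1j * rng.standard_normal((1000, 64))
    noise = 0.1 * (rng.standard_normal(h.shape) + 1j * rng.standard_normal(h.shape))
    print(f"NMSE = {nmse_db(h, h + noise):.2f} dB")
```

Averaging the per-realization ratio, rather than dividing the two averaged norms, follows the form of (32), where the expectation is taken over the ratio.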
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on January 04,2025 at 09:02:39 UTC from IEEE Xplore. Restrictions apply.
3058 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 20, NO. 5, MAY 2021
Fig. 6. The performance of the proposed estimator and OMP with respect to η, where η is the ratio of the number of used pilots to the full pilot case, which corresponds to using an orthogonal pilot tone for each coherence bandwidth and time at each transmit antenna.

Fig. 7. The generalization capability of the GAN for different number of clusters when it was trained with 20 clusters.
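To make the OMP benchmark more concrete, the sketch below shows a generic textbook orthogonal matching pursuit that recovers a vector assumed to be sparse in a unitary DFT basis from fewer measurements than unknowns. It is only an illustration under stated assumptions: the random Gaussian measurement matrix, the dimensions, the sparsity level `k`, and the names `omp` and `demo` are hypothetical, and the pilot structure and hybrid measurement matrix of the paper are not reproduced here.

```python
import numpy as np

def omp(Phi, y, k):
    """Greedy OMP: return a k-sparse x (k >= 1) with y approximately Phi @ x."""
    n = Phi.shape[1]
    residual = y.copy()
    support = []
    for _ in range(k):
        # Select the column most correlated with the current residual.
        corr = np.abs(Phi.conj().T @ residual)
        if support:
            corr[support] = 0.0          # do not reselect chosen columns
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(n, dtype=complex)
    x[support] = coef
    return x

def demo():
    rng = np.random.default_rng(1)
    n, m, k = 64, 32, 3                          # unknowns, measurements, sparsity (assumed)
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)       # unitary DFT matrix
    s = np.zeros(n, dtype=complex)               # sparse DFT-domain coefficients
    s[rng.choice(n, k, replace=False)] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    h = F.conj().T @ s                           # toy channel, sparse in the DFT basis
    A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
    y = A @ h                                    # noiseless measurements with m < n
    s_hat = omp(A @ F.conj().T, y, k)            # recover the sparse coefficients
    h_hat = F.conj().T @ s_hat                   # map back to the channel domain
    return np.sum(np.abs(h - h_hat) ** 2) / np.sum(np.abs(h) ** 2)

if __name__ == "__main__":
    print("normalized squared error:", demo())
```

In this toy setting the ratio `m / n` plays a role loosely analogous to η in Fig. 6; a GAN-based estimator, by contrast, would operate on the original channel without a sparsifying basis.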
BALEVI AND ANDREWS: WIDEBAND CHANNEL ESTIMATION WITH A GENERATIVE ADVERSARIAL NETWORK 3059
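The gradient expression in (33), which drives the update of the generator input vector, can be sanity-checked numerically. The sketch below is a real-valued toy example under stated assumptions: the one-layer generator `G(z) = tanh(Wz)`, the random matrix `A`, the value of `rho`, and all dimensions are hypothetical stand-ins for the trained generative network and the measurement matrix. It compares the closed-form gradient with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_h, d_y, rho = 4, 10, 6, 2.0        # toy sizes and power scaling (assumed)
W = rng.standard_normal((d_h, d_z))       # weights of the toy generator
A = rng.standard_normal((d_y, d_h))       # toy real-valued measurement matrix
y = rng.standard_normal(d_y)              # toy received signal

def G(z):
    """Toy generator standing in for the trained network."""
    return np.tanh(W @ z)

def jacobian_G(z):
    """Jacobian dG/dz of tanh(Wz): diag(1 - tanh^2(Wz)) @ W, shape (d_h, d_z)."""
    return (1.0 - np.tanh(W @ z) ** 2)[:, None] * W

def loss(z):
    r = y - np.sqrt(rho) * (A @ G(z))
    return float(r @ r)

def grad_closed_form(z):
    """Right-hand side of (33) with the transposed Jacobian of the toy generator."""
    r = y - np.sqrt(rho) * (A @ G(z))
    return -2.0 * np.sqrt(rho) * jacobian_G(z).T @ (A.T @ r)

def grad_finite_diff(z, eps=1e-6):
    """Central finite-difference approximation of the gradient of loss(z)."""
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = eps
        g[i] = (loss(z + e) - loss(z - e)) / (2.0 * eps)
    return g

z0 = rng.standard_normal(d_z)
print("max abs difference:", np.max(np.abs(grad_closed_form(z0) - grad_finite_diff(z0))))
```

In a gradient descent search, the input vector would then be moved a small step against this gradient; the step size is a tuning choice that this sketch does not fix.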
$$\nabla_{\mathbf{z}_n} \left\|\mathbf{y} - \sqrt{\rho}\,\mathbf{A} G_{\hat{\theta}_g}(\mathbf{z}_n)\right\|_2^2 = -2\sqrt{\rho}\,\nabla_{\mathbf{z}_n} G_{\hat{\theta}_g}(\mathbf{z}_n)^T \mathbf{A}^T \left(\mathbf{y} - \sqrt{\rho}\,\mathbf{A} G_{\hat{\theta}_g}(\mathbf{z}_n)\right), \tag{33}$$

the output of the generative network is multiplied with the block diagonal measurement matrix $\mathbf{A}$ that has $N_p$ blocks, and this yields a complexity of $O(N_p N_f^2)$, which is the same as that of multiplying $\mathbf{A}^T$ with $(\mathbf{y} - \sqrt{\rho}\,\mathbf{A} G_{\hat{\theta}_g}(\mathbf{z}_n))$. The backward propagation for $\nabla_{\mathbf{z}_n} G_{\hat{\theta}_g}(\mathbf{z}_n)$ takes nearly twice the execution time of the forward propagation in CNNs [42], and the computational complexity of the second step only covers the forward propagation of the first step. Thus, the overall complexity becomes $O(N_t N_r N_p N_f^2)$. This is much less than that of the LS, LMMSE and OMP estimators. Specifically, the LS and LMMSE estimators require computing $(\mathbf{A}^H \mathbf{A})^{-1}$, as can be seen in (14) and (15), and this yields a complexity of $O(N_t^3 N_r^3 N_p N_f^3)$. The complexity of the OMP is related to the dimension of the target signal $\mathbf{h}$ and it is $O(N_t^3 N_r^3 N_f^3)$ [32].

VI. CONCLUSION

This paper demonstrates how to leverage GANs for effective frequency selective channel estimation in a mmWave or a THz channel: that is, a low SNR channel with a large number of spatial elements. Promisingly, the proposed GAN-based channel estimation works well for a hybrid architecture with one-bit quantized phase angles, and can even outperform estimators designed for fully digital receivers. Additionally, the proposed estimator enables a substantial reduction in the number of pilot tones. Regarding its generalization capability, we demonstrate that changes in the number of clusters and rays in multipath channels can be inherently handled without retraining the generative network. As future work, the impact of some other channel parameters such as power delay spread, Doppler spread and the angles of arrival and departure can be analyzed. Furthermore, using a different generative model like a VAE instead of a GAN can be another extension.

APPENDIX
PROOF OF THEOREM 1

The measurement matrix due to a single subcarrier becomes

$$\mathbf{A}^{(\mathrm{sub})} = \mathbf{I}_{N_p} \otimes \left(\mathbf{p}[n]^T \mathbf{F}_{\mathrm{BB}}[n]^T \mathbf{F}_{\mathrm{RF}}^T \otimes \mathbf{W}_{\mathrm{RF}}^H\right), \tag{34}$$

in which the frequency index $k$ is dropped for simplicity. As can be observed, the statistics of $\mathbf{A}$ in (12) are the same as those of $\mathbf{A}^{(\mathrm{sub})}$ in (34). Thus, without any loss of generality we find the distribution of $\mathbf{A}^{(\mathrm{sub})}$. Stacking the block diagonal matrices of (34) yields a more compact expression

$$\tilde{\mathbf{A}}^{(\mathrm{sub})} = \mathbf{P}^T \mathbf{F}_{\mathrm{BB}}[n]^T \mathbf{F}_{\mathrm{RF}}^T \otimes \mathbf{W}_{\mathrm{RF}}^H \tag{35}$$

where

$$\mathbf{P}^T = \begin{pmatrix} \mathbf{p}[n]^T \\ \vdots \\ \mathbf{p}[n + N_p - 1]^T \end{pmatrix}.$$

Since it is not practical to change $\mathbf{F}_{\mathrm{RF}}$ and $\mathbf{W}_{\mathrm{RF}}$ for each channel realization, we assume that they are given and fixed. This results in

$$\tilde{\mathbf{A}} = w_{\mathrm{RF}} \mathbf{P}^T \mathbf{F}_{\mathrm{BB}}[n]^T \mathbf{F}_{\mathrm{RF}}^T. \tag{36}$$

Notice that $P(\tilde{\mathbf{A}}) = P(\tilde{\mathbf{A}}^{(\mathrm{sub})})$, since $\mathbf{P}^T \mathbf{F}_{\mathrm{BB}}[n]^T \mathbf{F}_{\mathrm{RF}}^T$ is repeated for each element of $\mathbf{W}_{\mathrm{RF}}^H$. The $(m, n)$th element of $\tilde{\mathbf{A}}$ can then be written as

$$\tilde{a}_{m,n} = w_{\mathrm{RF}} \sum_{j=1}^{N_t^{\mathrm{RF}}} \sum_{i=1}^{N_s} p^T_{m,i}\, f^T_{i,j}\, \tilde{f}^T_{j,n} \tag{37}$$

where $p^T_{m,i}$ is the $(m, i)$th element of $\mathbf{P}^T$, $f^T_{i,j}$ is the $(i, j)$th element of $\mathbf{F}_{\mathrm{BB}}[n]^T$ and $\tilde{f}^T_{j,n}$ is the $(j, n)$th element of $\mathbf{F}_{\mathrm{RF}}^T$. Due to the constant modulus constraint, $|w_{\mathrm{RF}}|^2 = 1/N_r$ and

$$\sum_{j=1}^{N_t^{\mathrm{RF}}} \sum_{i=1}^{N_s} \left|f^T_{i,j}\, \tilde{f}^T_{j,n}\right|^2 = \frac{\|\mathbf{F}_{\mathrm{BB}}[n]\|_F^2}{N_t}. \tag{38}$$

Since there is a total transmission power constraint and the Frobenius norm is submultiplicative, $\|\mathbf{F}_{\mathrm{RF}} \mathbf{F}_{\mathrm{BB}}[n]\|_F^2 = N_s \leq \|\mathbf{F}_{\mathrm{RF}}\|_F^2 \|\mathbf{F}_{\mathrm{BB}}[n]\|_F^2$. Hence, using $\|\mathbf{F}_{\mathrm{RF}}\|_F^2 = N_t^{\mathrm{RF}}$, which follows from the constant modulus of its entries, (38) gives

$$\sum_{j=1}^{N_t^{\mathrm{RF}}} \sum_{i=1}^{N_s} \left|f^T_{i,j}\, \tilde{f}^T_{j,n}\right|^2 \geq \frac{N_s}{N_t N_t^{\mathrm{RF}}}. \tag{39}$$

According to Hoeffding's inequality, for zero mean bounded i.i.d. random variables $X_1, \cdots, X_N$ and $\mathbf{b} = (b_1, \cdots, b_N) \in \mathbb{R}^N$, we have

$$P\left(\sum_{i=1}^{N} X_i b_i \geq t\right) \leq \exp\left(-\frac{t^2}{2\|\mathbf{b}\|_2^2}\right) \tag{40}$$

for $t \geq 0$. Replacing $X_i$ with $p^T_{m,i}$ and $b_i$ with $w_{\mathrm{RF}} \sum_{j=1}^{N_t^{\mathrm{RF}}} f^T_{i,j}\, \tilde{f}^T_{j,n}$ in (37) yields

$$P\left(\tilde{a}_{m,n} \geq t\right) \leq \exp\left(-\frac{t^2 N_t N_t^{\mathrm{RF}} N_r}{2 N_s}\right). \tag{41}$$

Thus, $\tilde{a}_{m,n}$ has a sub-Gaussian distribution. Since (41) holds for all $m = 1, 2, \cdots, N_p$ and $n = 1, 2, \cdots, N_t$, this completes the proof.

REFERENCES

[1] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, Oct. 2014.
[2] A. Alkhateeb, G. Leus, and R. W. Heath, “Compressed sensing based multi-user millimeter wave systems: How many measurements are needed?” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2015, pp. 2909–2913.
[3] R. Mendez-Rial, C. Rusu, N. Gonzalez-Prelcic, A. Alkhateeb, and R. W. Heath, “Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?” IEEE Access, vol. 4, pp. 247–267, Jan. 2016.
[4] P. Schniter and A. Sayeed, “Channel estimation and precoder design for millimeter-wave communications: The sparse way,” in Proc. 48th Asilomar Conf. Signals, Syst. Comput., Nov. 2014, pp. 273–277.
[5] Z. Gao, C. Hu, L. Dai, and Z. Wang, “Channel estimation for millimeter-wave massive MIMO with hybrid precoding over frequency-selective fading channels,” IEEE Commun. Lett., vol. 20, no. 6, pp. 1259–1262, Jun. 2016.
[6] K. Venugopal, A. Alkhateeb, N. Gonzalez Prelcic, and R. W. Heath, “Channel estimation for hybrid architecture-based wideband millimeter wave systems,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 1996–2009, Sep. 2017.
[7] J. Mo, P. Schniter, and R. W. Heath, “Channel estimation in broadband millimeter wave MIMO systems with few-bit ADCs,” IEEE Trans. Signal Process., vol. 66, no. 5, pp. 1141–1154, Mar. 2018.
[8] Y. Li, “Simplified channel estimation for OFDM systems with multiple transmit antennas,” IEEE Trans. Wireless Commun., vol. 1, no. 1, pp. 67–75, Jan. 2002.
[9] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in Proc. Int. Conf. Learn. Represent. (ICLR), May 2013, pp. 1–14.
[10] I. Goodfellow et al., “Generative adversarial nets,” in Proc. Adv. Neural Inf. Process. Syst., Dec. 2014, pp. 2672–2680.
[11] T. S. Rappaport et al., “Millimeter wave mobile communications for 5G cellular: It will work!,” IEEE Access, vol. 1, pp. 335–349, May 2013.
[12] Z. Gao, L. Dai, W. Dai, B. Shim, and Z. Wang, “Structured compressive sensing-based spatio-temporal joint channel estimation for FDD massive MIMO,” IEEE Trans. Commun., vol. 64, no. 2, pp. 601–617, Feb. 2016.
[13] S. L. H. Nguyen and A. Ghrayeb, “Compressive sensing-based channel estimation for massive multiuser MIMO systems,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Apr. 2013, pp. 2890–2895.
[14] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169–6183, Dec. 2015.
[15] X. Lin, S. Wu, L. Kuang, Z. Ni, X. Meng, and C. Jiang, “Estimation of sparse massive MIMO-OFDM channels with approximately common support,” IEEE Commun. Lett., vol. 21, no. 5, pp. 1179–1182, May 2017.
[16] M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, “Deep learning-based channel estimation,” IEEE Commun. Lett., vol. 23, no. 4, pp. 652–655, Apr. 2019.
[17] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
[18] P. Dong, H. Zhang, G. Ye Li, I. S. Gaspar, and N. NaderiAlizadeh, “Deep CNN-based channel estimation for mmWave massive MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 5, pp. 989–1000, Sep. 2019.
[19] E. Balevi and J. G. Andrews, “Deep learning-based channel estimation for high-dimensional signals,” 2019, arXiv:1904.09346. [Online]. Available: http://arxiv.org/abs/1904.09346
[20] E. Balevi, A. Doshi, and J. G. Andrews, “Massive MIMO channel estimation with an untrained deep neural network,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2079–2090, Mar. 2020.
[21] Y. Yang, F. Gao, X. Ma, and S. Zhang, “Deep learning-based channel estimation for doubly selective fading channels,” IEEE Access, vol. 7, pp. 36579–36589, Apr. 2019.
[22] X. Ru, L. Wei, and Y. Xu, “Model-driven channel estimation for OFDM systems based on image super-resolution network,” 2019, arXiv:1911.13106. [Online]. Available: http://arxiv.org/abs/1911.13106
[23] S. Gao, P. Dong, Z. Pan, and G. Y. Li, “Deep learning based channel estimation for massive MIMO with mixed-resolution ADCs,” IEEE Commun. Lett., vol. 23, no. 11, pp. 1989–1993, Nov. 2019.
[24] E. Balevi and J. G. Andrews, “One-bit OFDM receivers via deep learning,” IEEE Trans. Commun., vol. 67, no. 6, pp. 4326–4336, Jun. 2019.
[25] T. J. O’Shea, T. Roy, and N. West, “Approximating the void: Learning stochastic channel models from observation with variational generative adversarial networks,” in Proc. Int. Conf. Comput., Netw. Commun. (ICNC), Feb. 2019, pp. 681–686.
[26] H. Ye, G. Y. Li, B.-H.-F. Juang, and K. Sivanesan, “Channel agnostic end-to-end learning based communication systems with conditional GAN,” in Proc. IEEE Globecom Workshops (GC Wkshps), Dec. 2018, pp. 1–5.
[27] A. Doshi, E. Balevi, and J. G. Andrews, “Compressed representation of high dimensional channels using deep generative networks,” in Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), May 2020, pp. 1–5.
[28] E. Balevi, A. Doshi, A. Jalal, A. Dimakis, and J. G. Andrews, “High dimensional channel estimation using deep generative networks,” IEEE J. Sel. Areas Commun., vol. 39, no. 1, pp. 18–30, Jan. 2021.
[29] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative models,” in Proc. Int. Conf. Mach. Learn. (ICML), Aug. 2017, pp. 1–24.
[30] A. Jalal, L. Liu, A. G. Dimakis, and C. Caramanis, “Robust compressed sensing of generative models,” 2020, arXiv:2006.09461. [Online]. Available: http://arxiv.org/abs/2006.09461
[31] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Nat. Acad. Sci. USA, vol. 106, no. 45, pp. 18914–18919, Nov. 2009.
[32] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013.
[33] C. Doersch, “Tutorial on variational autoencoders,” 2016, arXiv:1606.05908. [Online]. Available: http://arxiv.org/abs/1606.05908
[34] L.-H. Chen, C. G. Bampis, Z. Li, A. Norkin, and A. C. Bovik, “Perceptually optimizing deep image compression,” 2020, arXiv:2007.02711. [Online]. Available: http://arxiv.org/abs/2007.02711
[35] J. Li, J. Jia, and D. Xu, “Unsupervised representation learning of image-based plant disease with deep convolutional generative adversarial networks,” in Proc. 37th Chin. Control Conf. (CCC), Jul. 2018, pp. 9159–9163.
[36] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Proc. Int. Conf. Mach. Learn., vol. 70, Aug. 2017, pp. 214–223.
[37] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge, U.K.: Cambridge Univ. Press, 2018.
[38] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
[39] S. Rangan, T. Rappaport, E. Erkip, Z. Latinovic, M. R. Akdeniz, and Y. Liu, “Perceptually optimizing deep image compression,” in Proc. IEEE Commun. Theory Workshop, Jun. 2013, pp. 1–25.
[40] 5G; Study on Channel Model for Frequencies From 0.5 to 100 GHz (Rel 14), document 3GPP TS 38.901, 3GPP FTP Server, 2017.
[41] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[42] K. He and J. Sun, “Convolutional neural networks at constrained time cost,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 5353–5360.

Eren Balevi received the B.S., M.S., and Ph.D. degrees in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey, in 2008, 2010, and 2016, respectively. He is currently a Post-Doctoral Research Scholar with the Department of Electrical and Computer Engineering, The University of Texas at Austin. His current research interests include the intersection between machine learning and communication theory. He is also interested in the general areas of 5G and beyond wireless systems, fog/edge networking, and molecular communications.

Jeffrey G. Andrews (Fellow, IEEE) received the B.S. degree (Hons.) in engineering from the Harvey Mudd College, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University. He is currently the Cockrell Family Endowed Chair of engineering with The University of Texas at Austin. He developed CDMA systems at Qualcomm, and has served as a consultant to Samsung, Nokia, Qualcomm, Apple, Verizon, AT&T, Intel, Microsoft, Sprint, and NASA. He is the coauthor of the books Fundamentals of WiMAX (Prentice-Hall, 2007) and Fundamentals of LTE (Prentice-Hall, 2010). He is an ISI Highly Cited Researcher and has been a co-recipient of 15 paper awards, including the 2016 IEEE Communications Society & Information Theory Society Joint Paper Award, the 2014 IEEE Stephen O. Rice Prize, the 2014 and 2018 IEEE Leonard G. Abraham Prize, the 2011 and 2016 IEEE Heinrich Hertz Prize, and the 2010 IEEE ComSoc Best Tutorial Paper Award. He received the 2015 Terman Award, the NSF CAREER Award, and the 2019 IEEE Kiyo Tomiyasu technical field award. He is also the founding Chair of the Steering Committee for the IEEE JOURNAL ON SELECTED AREAS IN INFORMATION THEORY, amongst other IEEE leadership roles. He was the Editor-in-Chief of the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS from 2014 to 2016 and the Chair of the IEEE Communications Society Emerging Technologies Committee from 2018 to 2019.