
2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel

Finite Blocklength Coding for Channels with Side Information at the Receiver
Amir Ingber and Meir Feder
Department of EE-Systems, The Faculty of Engineering
Tel Aviv University, Tel Aviv 69978, ISRAEL
email: {ingber, meir}@eng.tau.ac.il
Abstract: The communication model of a memoryless channel that depends on a random state known at the receiver only is studied. The channel can be thought of as a set of underlying channels with a fixed state, where at each channel use one of these channels is chosen at random, and this selection is known to the receiver only. The capacity of such a channel is known, and is given by the expectation (w.r.t. the random state) of the capacity of the underlying channels.
In this work we examine the finite-length characteristics of this channel, and their relation to the characteristics of the underlying channels. We derive error exponent bounds (random coding and sphere packing) for the channel and determine their relation to the corresponding bounds of the underlying channels. We also determine the channel dispersion and its relation to the dispersion of the underlying channels. We show that both in the error exponent case and in the dispersion case, the expectation of these quantities is too optimistic w.r.t. the actual value. Examples of such channels are discussed.
I. INTRODUCTION
The communication model of a memoryless channel that depends on a random state is studied. We focus on the case where the random state, also known as channel state information (CSI), is known at the receiver only. The channel, denoted by W, can be thought of as a set of (memoryless) channels W_S, where S is the random state. Such a model appears often in practice: the ergodic fading channel is an example, where the fade value is assumed to be known at the receiver. Sometimes the state S is a result of the communication scheme design and is inserted intentionally (for example, in order to attain symmetry properties).
In this work we study the relationship between the finite blocklength information theoretic properties of the channel W and those of the underlying channels W_S.
The capacity of this channel is well known, and is generally given by the expectation (over S) of the capacity of the underlying channel W_S. We then analyze other information theoretic properties, such as the error exponent and the channel dispersion of the channel W, and compare them to the expected values of these properties over the channels W_S.
The main results can be summarized as follows.
The random coding and sphere packing error exponent bounds [1] are both given by the expression E_0(\rho) - \rho R (optimized w.r.t. \rho), where E_0(\rho) is a function of the channel. We show that the function E_0 for the channel W is given by

    E_0(\rho) = -\log E\left[ 2^{-E_0^{(S)}(\rho)} \right],    (1)

where E_0^{(S)} is the corresponding E_0 function for the channel W_S, E[\cdot] denotes expectation (w.r.t. S), and \log \equiv \log_2.
In [2], error exponents for channels with side information were considered. However, the focus there was on channels with CSI at the transmitter as well, compound channels, and more. While the case of CSI known at the receiver only is a special case, the contribution here lies in the simplicity of the relation (1).
We also discuss the channel dispersion (see [3], [4]), which quantifies the speed at which the rate approaches capacity with the blocklength (when the codeword error rate is fixed). We show the following relationship between the dispersions of W and W_S, denoted V and V_S respectively:

    V = E[V_S] + VAR[C_S].    (2)
Both in the error exponent case and in the dispersion case, we show that the expected exponent and the expected dispersion are too optimistic w.r.t. the actual exponent and dispersion.
Finally, we discuss several examples that involve channels with side information at the receiver, such as channel symmetrization, multilevel codes with multi-stage decoding (MLC-MSD) and bit-interleaved coded modulation (BICM).
II. THE GENERAL COMMUNICATION MODEL
A. Channel Model
Let W be a discrete memoryless channel (DMC)^1 with input x \in X and output (y, s) \in Y \times S, where s \in S is the channel state, which is independent of the channel input X:

    W(y, s|x) = P_{Y,S|X}(y, s|x) = P_S(s) P_{Y|S,X}(y|s, x).    (3)
Definition 1: Let W_s be the channel W where the state S is fixed to s:

    W_s(y|x) \triangleq P_{Y|S,X}(y|s, x).    (4)
^1 Similar results can be derived for channels with continuous output.
B. Communication Scheme
The communication scheme is defined as follows. Let n be the codeword length, and let M be a set of 2^{nR} messages. The encoder and decoder are denoted f_n and g_n respectively, where
• f_n : M \to X^n is the encoder, which maps the input message m to the channel input x \in X^n.
• g_n : Y^n \times S^n \to M is the decoder, which maps the channel output and the channel state to an estimate \hat{m} of the transmitted message.
The considered error probability is the codeword error probability p_e \triangleq P(\hat{m} \neq m), where the messages m are drawn from M with equal probability.
The communication scheme is depicted in Figure 1.
We shall be interested in the tradeoff between the rate R, the codelength n and the error probability p_e of the best possible codes.
III. INFORMATION THEORETIC ANALYSIS
Here we shall be interested in the performance of the
optimal codes for the channel W. We review known results
for the capacity, and present the results for the error exponent
and the channel dispersion.
A. Capacity
Since the channel W is simply a DMC with a scalar input and a vector output, its capacity can be derived directly (see, e.g., [5]):

    C(W) = \max_{p(x)} I(X; Y, S)
         = \max_{p(x)} [ I(X; Y|S) + I(X; S) ]
         = \max_{p(x)} I(X; Y|S),    (5)

where the last equality holds since X and S are independent. Note that the capacity can also be written as

    \max_{p(x)} E_S[ I(X; Y|S = s) ].    (6)
In this paper we limit ourselves to a fixed input distribution (e.g. equiprobable). In this case the capacity is given by

    I(X; Y|S) = E_S[ I(X; Y|S = s) ].    (7)

Recalling the definition of the channel conditioned on the state s, we get

    C(W) = I(X; Y|S) = E_S[ I(X; Y|S = s) ] = E[ C(W_S) ],    (8)

where C(W_s) is the capacity of the underlying channel W_s. We conclude that the capacity formula can be interpreted as an expectation over the capacities of the underlying channels. Note that when the CSI is available at the transmitter as well, (8) holds even without the assumption of a fixed prior on X.
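As a concrete illustration of (8), the following Python sketch (a hypothetical example, not taken from the paper) takes W to be a pair of BSCs with crossover probabilities 0.05 and 0.2, selected by a uniform state S, and checks that I(X; Y|S) under the equiprobable prior equals the expectation of the underlying capacities.

```python
# Minimal numerical sketch of (8): C(W) = E[C(W_S)] under a fixed
# equiprobable prior. Hypothetical two-state channel: S picks one of
# two BSCs with crossover probabilities 0.05 and 0.2.
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p_state = np.array([0.5, 0.5])      # P_S(s), assumed uniform
crossovers = np.array([0.05, 0.2])  # hypothetical BSC crossover per state

# Capacity of each underlying BSC (achieved by the equiprobable prior):
c_s = 1 - h2(crossovers)
print("E[C(W_S)] =", p_state @ c_s)

# Direct computation of I(X; Y|S) for the joint channel with the same
# equiprobable input, for comparison with (8):
px = np.array([0.5, 0.5])
i_joint = 0.0
for ps, eps in zip(p_state, crossovers):
    w = np.array([[1 - eps, eps], [eps, 1 - eps]])  # P_{Y|S,X}(y|s, x)
    py = px @ w                                     # P_{Y|S}(y|s)
    for x in range(2):
        for y in range(2):
            i_joint += ps * px[x] * w[x, y] * np.log2(w[x, y] / py[y])
print("I(X; Y|S) =", i_joint)   # matches E[C(W_S)]
```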
B. Error Exponent
The error exponent of a channel is given by [1]

    E(R) \triangleq \lim_{n \to \infty} -\frac{1}{n} \log p_e(n, R),    (9)

where p_e(n, R) is the average codeword error probability for the best code of length n and rate R (assuming that the limit exists).
While the exact characterization of the error exponent is still an open question, two important bounds are known [1]: the random coding and the sphere packing error exponents, which are lower and upper bounds, respectively.
The random coding exponent is given by

    E_r(R) = \max_{\rho \in [0,1]} \max_{P_X} \{ E_0(\rho, P_X) - \rho R \},    (10)

where E_0(\rho, P_X) is given by

    E_0(\rho, P_X) = -\log \sum_{y \in Y} \left[ \sum_{x \in X} P_X(x) W(y|x)^{1/(1+\rho)} \right]^{1+\rho}.    (11)

The sphere packing bound E_{sp}(R) is given by

    E_{sp}(R) = \max_{\rho > 0} \max_{P_X} \{ E_0(\rho, P_X) - \rho R \}.    (12)

It can be seen that both exponent bounds are similar. In fact, they only differ in the optimization region of the parameter \rho, and they coincide at rates beyond a certain rate called the critical rate.
We note that both bounds depend on the function E_0(\rho). For channels with CSI at the receiver, we derive E_0(\rho) explicitly. Following the relationship (8), we wish to find the connection between E_0(\rho) and the corresponding E_0 functions of the conditional channels W_s, denoted E_0^{(s)}.
Theorem 1: Let W be a channel with CSI at the receiver. Then the function E_0(\rho) for this channel is given by

    E_0(\rho, P_X) = -\log E\left[ 2^{-E_0^{(S)}(\rho, P_X)} \right].    (13)
Proof: When the channel output is (y, s), we get

    E_0(\rho, P_X)
    = -\log \sum_{y \in Y, s \in S} \left[ \sum_{x \in X} P_X(x) W(y, s|x)^{1/(1+\rho)} \right]^{1+\rho}
    = -\log \sum_{s \in S} P_S(s) \sum_{y \in Y} \left[ \sum_{x \in X} P_X(x) P_{Y|S,X}(y|s, x)^{1/(1+\rho)} \right]^{1+\rho}
    \overset{(a)}{=} -\log \sum_{s \in S} P_S(s) 2^{-E_0^{(s)}(\rho, P_X)}
    = -\log E\left[ 2^{-E_0^{(S)}(\rho, P_X)} \right],

where the second equality follows from (3), and (a) follows from the definition of E_0^{(s)}.
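The identity (13) can be checked numerically. The following sketch (the same hypothetical two-BSC setup as above, with an assumed \rho = 0.7) evaluates Gallager's E_0 of (11) once for the joint channel with output pair (y, s), and once via the right-hand side of (13).

```python
# Numerical check of Theorem 1 for a hypothetical two-BSC channel.
import numpy as np

def e0_dmc(rho, px, w):
    """Gallager's E_0 in base 2, per (11); w[x, y] is P(y|x)."""
    inner = (px[:, None] * w ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

px = np.array([0.5, 0.5])
p_state = np.array([0.5, 0.5])
ws = [np.array([[0.95, 0.05], [0.05, 0.95]]),   # state 0: BSC(0.05)
      np.array([[0.80, 0.20], [0.20, 0.80]])]   # state 1: BSC(0.2)

rho = 0.7  # assumed value of the Gallager parameter
# Joint channel: the output alphabet is the pair (y, s), so the
# transition matrix stacks the per-state columns, weighted by P_S(s).
w_joint = np.hstack([p_state[i] * ws[i] for i in range(len(ws))])

lhs = e0_dmc(rho, px, w_joint)                       # E_0 of W, directly
rhs = -np.log2(sum(p_state[i] * 2 ** (-e0_dmc(rho, px, ws[i]))
                   for i in range(len(ws))))         # right side of (13)
print(lhs, rhs)   # the two values coincide
```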
Fig. 1. Communication scheme for channels with CSI at the receiver
As a corollary, we obtain the random coding and the sphere packing exponents for the channel W according to (10) and (12).
Following (8), one might expect that the error exponent bounds (for example, E_r(R)) would be given by the expectation of the exponent function w.r.t. S. This is clearly not the case, as seen in Theorem 1. In addition, the following can be shown:
Theorem 2: Let \bar{E}_r(R) be the average of E_r^{(S)} w.r.t. S:

    \bar{E}_r(R) \triangleq E\left[ E_r^{(S)}(R) \right].    (14)

Then \bar{E}_r(R) always overestimates the true random coding exponent of W, E_r(R).
Proof: Let \bar{E}_0(\rho, P_X) = E\left[ E_0^{(S)}(\rho, P_X) \right]. Since 2^{-(\cdot)} is convex, it follows from Jensen's inequality and Theorem 1 that

    E_0(\rho, P_X) \leq \bar{E}_0(\rho, P_X).    (15)

We continue with \bar{E}_r(R):

    \bar{E}_r(R) = E\left[ E_r^{(S)}(R) \right]
    = E\left[ \sup_{P_X, \rho \in [0,1]} \left\{ E_0^{(S)}(\rho, P_X) - \rho R \right\} \right]
    \geq \sup_{P_X, \rho \in [0,1]} E\left[ E_0^{(S)}(\rho, P_X) - \rho R \right]
    = \sup_{P_X, \rho \in [0,1]} \left\{ \bar{E}_0(\rho, P_X) - \rho R \right\}
    \overset{(a)}{\geq} \sup_{P_X, \rho \in [0,1]} \left[ E_0(\rho, P_X) - \rho R \right]
    = E_r(R),    (16)

where (a) follows from (15).
Note that the proof of Theorem 2 holds regardless of the optimization region of \rho. Therefore the corresponding statement for the sphere packing exponent follows similarly. We conclude that the expectation (w.r.t. S) of the error exponent bounds overestimates the true exponent bounds of W (and also the true error exponent, above the critical rate).
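A self-contained sketch of this gap, for the same hypothetical two-BSC setup (equiprobable prior held fixed, an assumed rate R = 0.2 below both underlying capacities), follows; a grid search over \rho \in [0, 1] confirms that the averaged exponent exceeds the exponent of the joint channel W.

```python
# Illustration of Theorem 2: the expected exponent is optimistic.
import numpy as np

def e0_dmc(rho, px, w):
    """Gallager's E_0 in base 2, per (11); w[x, y] is P(y|x)."""
    inner = (px[:, None] * w ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

px = np.array([0.5, 0.5])
p_state = np.array([0.5, 0.5])
ws = [np.array([[0.95, 0.05], [0.05, 0.95]]),
      np.array([[0.80, 0.20], [0.20, 0.80]])]
w_joint = np.hstack([p_state[i] * ws[i] for i in range(len(ws))])

R = 0.2                          # assumed rate, below both capacities
rhos = np.linspace(0.0, 1.0, 1001)

def e_r(w):
    """Random coding exponent (10), with the prior held fixed."""
    return max(e0_dmc(r, px, w) - r * R for r in rhos)

e_r_true = e_r(w_joint)                                   # E_r(R) of W
e_r_bar = sum(p_state[i] * e_r(ws[i]) for i in range(len(ws)))
print(e_r_bar, e_r_true, e_r_bar >= e_r_true)   # averaged exponent is larger
```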
C. Dispersion
An alternative information theoretic measure for quantifying coding performance at finite blocklengths is the channel dispersion. Suppose that a fixed codeword error probability p_e and a codeword length n are given. We can then seek the maximal achievable rate R given p_e and n. It turns out that for fixed p_e and n, the gap to the channel capacity is approximately proportional to Q^{-1}(p_e)/\sqrt{n} (where Q(\cdot) is the complementary Gaussian cumulative distribution function). The proportionality constant (squared) is called the channel dispersion. Formally, define the (operational) channel dispersion as follows [3]:

Definition 2: The dispersion V(W) of a channel W with capacity C is defined as

    V(W) = \lim_{p_e \to 0} \limsup_{n \to \infty} n \left( \frac{C - R(n, p_e)}{Q^{-1}(p_e)} \right)^2,    (17)

where R(n, p_e) is the highest achievable rate for codeword error probability p_e and codeword length n.
In 1962, Strassen [4] used the Gaussian approximation to derive the following result for DMCs:

    R(n, p_e) = C - \sqrt{\frac{V}{n}} Q^{-1}(p_e) + O\left( \frac{\log n}{n} \right),    (18)

where C is the channel capacity, and the new quantity V is the (information theoretic) dispersion, which is given by

    V \triangleq VAR(i(X; Y)),    (19)

where i(x; y) is the information spectrum, given by

    i(x; y) \triangleq \log \frac{P_{XY}(x, y)}{P_X(x) P_Y(y)},    (20)

and the distribution of X is the capacity-achieving distribution that minimizes V. Strassen's result shows that the dispersion of DMCs is equal to VAR(i(X; Y)). This result was recently tightened (and extended to the power-constrained AWGN channel) in [3]. It is also known that the channel dispersion and the error exponent are related as follows: for a channel with capacity C and dispersion V, the error exponent can be approximated (for rates close to capacity) by E(R) \approx (C - R)^2 / (2 V \ln 2). See [3] for details on the early origins of this approximation by Shannon.
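As an illustration of (18) and of the exponent approximation above, the following sketch (with hypothetical values of C, V, n and p_e, chosen only for illustration) evaluates the Gaussian approximation to the achievable rate and the corresponding exponent estimate.

```python
# Sketch of the normal approximation (18) and of the exponent
# approximation E(R) ~ (C - R)^2 / (2 V ln 2). All numbers hypothetical.
from math import log, sqrt
from statistics import NormalDist

def q_inv(p):
    """Inverse of the complementary Gaussian CDF Q."""
    return NormalDist().inv_cdf(1 - p)

C, V = 0.5, 0.3       # assumed capacity [bits/use] and dispersion [bits^2/use]
pe, n = 1e-3, 1000    # assumed target error probability and blocklength

# (18), ignoring the O(log n / n) term:
R = C - sqrt(V / n) * q_inv(pe)
print("approx. achievable rate:", R, " backoff from capacity:", C - R)

# Exponent approximation near capacity:
E = (C - R) ** 2 / (2 * V * log(2))
print("approx. error exponent at this rate:", E)
```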
We now explore the dispersion for the case of channels with side information at the receiver.

Theorem 3: The dispersion of the channel W with CSI at the receiver is given by

    V(W) = E[ V(W_S) ] + VAR[ C(W_S) ],    (21)

where both the expectation and the variance are taken w.r.t. the random state S.
Proof: Since W is nothing but a DMC with a vector output, the proof boils down to the calculation of VAR[ i(X; (Y, S)) ]. The information spectrum in this case is given by

    i(x; y, s) = \log \frac{P_{YSX}(y, s, x)}{P_{YS}(y, s) P_X(x)} \overset{(a)}{=} \log \frac{P_{Y|S,X}(y|s, x)}{P_{Y|S}(y|s)} \triangleq i(x; y|s),    (22)

where (a) follows since X and S are independent.
Suppose that s is fixed, i.e. consider the channel W_s. The capacity is given by

    C(W_s) = E[ i(X; Y|S) | S = s ] = I(X; Y|S = s).    (23)

The dispersion of the channel W_s is given by

    V(W_s) = VAR( i(X; Y|S) | S = s )
    = E[ i^2(X; Y|S) | S = s ] - E^2[ i(X; Y|S) | S = s ]
    = E[ i^2(X; Y|S) | S = s ] - C(W_s)^2.    (24)

Finally, the dispersion of the original channel W is given as follows:

    V(W) = VAR( i(X; Y|S) )
    \overset{(a)}{=} E[ VAR[ i(X; Y|S) | S ] ] + VAR[ E[ i(X; Y|S) | S ] ]
    = E[ V(W_S) ] + VAR[ C(W_S) ],    (25)

where (a) follows from the law of total variance.
A few notes regarding this result:
• Let \bar{V}(W) \triangleq E[ V(W_S) ]. As an immediate corollary of Theorem 3, \bar{V}(W) underestimates the true dispersion of W, V(W). This fact fits the exponent case: both the expected exponent and the expected dispersion are too optimistic w.r.t. the true exponent and dispersion.
• The term VAR[ C(W_S) ] can be viewed as a penalty over the expected dispersion \bar{V}(W), which grows as the capacities of the underlying channels become more spread.
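The decomposition (21) is easy to evaluate numerically. The sketch below (again the hypothetical two-BSC channel with an equiprobable prior) computes C(W_s) and V(W_s) per state from the information spectrum i(x; y|s), and assembles V(W) via the law of total variance.

```python
# Numerical sketch of Theorem 3: V(W) = E[V(W_S)] + VAR[C(W_S)].
import numpy as np

px = np.array([0.5, 0.5])
p_state = np.array([0.5, 0.5])
ws = [np.array([[0.95, 0.05], [0.05, 0.95]]),
      np.array([[0.80, 0.20], [0.20, 0.80]])]

c_s, v_s = [], []
for w in ws:
    py = px @ w                      # P_{Y|S}(y|s)
    i_xys = np.log2(w / py)          # i(x; y|s) of (22), for each (x, y)
    pxy = px[:, None] * w            # joint P(x, y) given the state
    mean = (pxy * i_xys).sum()                    # C(W_s), per (23)
    var = (pxy * i_xys ** 2).sum() - mean ** 2    # V(W_s), per (24)
    c_s.append(mean)
    v_s.append(var)

c_s, v_s = np.array(c_s), np.array(v_s)
e_v = p_state @ v_s                                  # E[V(W_S)]
var_c = p_state @ c_s ** 2 - (p_state @ c_s) ** 2    # VAR[C(W_S)]
print("V(W) =", e_v + var_c, "  penalty VAR[C(W_S)] =", var_c)
```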
IV. CODE DESIGN
When designing channel codes, the fact that the output is two-dimensional may complicate the code design. It would therefore be of interest to apply some processing to the outputs Y and S and feed them to the decoder as a single value. We seek a processing method that does not compromise the achievable performance over the modified channel (not only in the capacity sense, but also in the sense of the error probability at finite codelengths).
For binary channels this can be done easily by calculating the log-likelihood ratio (LLR) for each channel output pair (y, s) (see Figure 2).
For channel outputs s and y, denote the LLR of x given (y, s) by z:

    z = LLR(y, s) \triangleq \log \frac{P_{Y|S,X}(y|s, x = 0)}{P_{Y|S,X}(y|s, x = 1)}.    (26)
It is well known that for channels with binary input, the optimal ML decoder can be implemented to work on the LLR values only. Therefore, by plugging the LLR calculator into the channel output and supplying the decoder with the LLRs only, the performance is not harmed, and we can regard the channel as a simple DMC with input x and output z = LLR(y, s) for code design purposes.

Fig. 2. Incorporating LLR calculation into the channel
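As a toy illustration of (26), the following sketch computes z = LLR(y, s) for the hypothetical two-BSC channel used throughout; the state tells the decoder how reliable the observed bit is.

```python
# Sketch of the LLR front end (26): each output pair (y, s) is
# collapsed to a single real value z, sufficient for ML decoding.
import numpy as np

ws = [np.array([[0.95, 0.05], [0.05, 0.95]]),   # P_{Y|S,X}, state s = 0
      np.array([[0.80, 0.20], [0.20, 0.80]])]   # state s = 1

def llr(y, s):
    """z = LLR(y, s) of (26), here in log base 2."""
    w = ws[s]
    return np.log2(w[0, y] / w[1, y])   # log P(y|s, x=0) / P(y|s, x=1)

# The same observed bit y = 1 carries more evidence when the state
# points to the cleaner channel:
print(llr(1, 0), llr(1, 1))   # approx. -4.25 and -2.0
```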
V. EXAMPLES
A. Symmetrization of binary channels with equiprobable input
In the design of state-of-the-art channel codes, it is usually convenient to have channels that are symmetric. In recent years, methods have been developed for designing very efficient binary codes, such as LDPC codes. When designing LDPC codes, a desired property of a binary channel is that its output be symmetric [6].

Definition 3 (Binary input, output symmetric channels [6]): A memoryless binary channel U with input alphabet {0, 1} and output alphabet R is called output-symmetric if for all y \in R

    U(y|0) = U(-y|1).    (27)
Consider a general binary channel W with arbitrary output (not necessarily symmetric), and suppose that, for practical reasons, we are interested in coding over this channel with an equiprobable input (which may or may not be the capacity-achieving prior for that channel). The fact that we use an equiprobable input does not make the channel symmetric according to Definition 3. However, there exists a method for transforming this channel into a symmetric one without compromising the capacity, error exponent or dispersion. First, we add the LLR calculation to the channel and regard it as a part of the channel. This way we get a real-output channel from any arbitrary channel. Second, instead of transmitting the binary codewords over the channel directly, we first perform a bit-wise XOR operation with an i.i.d. pseudo-random binary vector. It can be shown that by multiplying the LLR values by -1 wherever the input was flipped, the LLRs are corrected. It can also be shown that the channel with the corrected LLR calculation is symmetric according to Definition 3. In [7], this method (termed channel adapters) was used in order to symmetrize the sub-channels of several coded modulation schemes. It is also shown in [7] that the capacity is unchanged by the channel adapters. By using Theorems 1 and 3, it can be verified that the error exponent bounds and the dispersion remain the same as well.
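The following toy simulation sketches the channel-adapter idea under stated assumptions (a single BSC state for brevity; the blocklength, seed and scrambling sequence are arbitrary): the codeword is XORed with an i.i.d. pseudo-random vector before transmission, and the LLR signs are flipped accordingly at the receiver.

```python
# Toy sketch of a channel adapter: XOR scrambling plus LLR sign
# correction, yielding an output-symmetric channel (Definition 3).
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed
n = 8
codeword = rng.integers(0, 2, n)    # binary codeword bits
scramble = rng.integers(0, 2, n)    # i.i.d. pseudo-random binary vector
tx = codeword ^ scramble            # bit-wise XOR before the channel

eps = 0.1                                     # a single BSC state
flips = (rng.random(n) < eps).astype(int)
rx = tx ^ flips                               # channel output bits

# LLR of each received bit for a BSC(eps), then the sign correction:
llr_rx = np.log2((1 - eps) / eps) * (1 - 2 * rx)
llr_fixed = llr_rx * (1 - 2 * scramble)   # flip sign where input was flipped

# llr_fixed is now distributed as the output of an output-symmetric
# channel with input `codeword`:
hard = (llr_fixed < 0).astype(int)
print(codeword, hard)   # equal except where the channel flipped a bit
```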
B. Multilevel Coding and Multistage Decoding (MLC-MSD)
MLC-MSD is a method for using binary codes in order to achieve capacity over nonbinary channels (see, e.g., [8]). In MLC-MSD, the binary encoders work in parallel over the same block of channel uses, and the decoders work sequentially as follows: the first decoder treats the rest of the codewords as noise and decodes the message from the first encoder. Every subsequent decoder, in its turn, decodes the message from the corresponding encoder assuming that the decoded messages from the previous decoders are correct, and therefore regards these messages as side information. The effective channels between each encoder-decoder pair, called sub-channels, are in fact channels with CSI at the receiver, and can therefore be analyzed by Theorems 1 and 3. For more details on the finite-length analysis of MLC-MSD, see [9].
C. Bit-Interleaved Coded Modulation (BICM)
BICM [10] is another popular method for channel coding with binary codes over nonbinary channels (for example, a channel with an output alphabet of size 2^L). It is based on taking a single binary code, feeding it into a long interleaver, and then mapping the interleaved coded bits onto the nonbinary channel alphabet (every L-tuple of consecutive bits is mapped to a symbol in the channel input alphabet of size 2^L). At the receiver, the LLRs of all coded bits are calculated according to the mapping, de-interleaved and fed to the decoder.
By assuming that the interleaver is ideal (i.e. of infinite length), the equivalent channel of BICM is modeled as a binary channel with a random state [10]. The state is chosen uniformly from {1, ..., L}, and represents the index of the input bit in the L-tuple. Since the state is known to the receiver only, this model fits the channel models discussed in this paper.
Finite blocklength analysis of BICM should be done carefully: although the model of a binary channel with a state known at the receiver allows the derivation of an error exponent and a channel dispersion, these do not have the usual meaning of quantifying the performance of BICM at finite blocklengths. The reason is the interleaver: one cannot rely on the existence of an infinite-length interleaver in order to estimate finite-length performance.
The solution comes in the form of an explicit finite-length interleaver. Recently an alternative scheme called Parallel BICM was proposed [11], where binary codewords are used in parallel and an interleaver of finite length is used in order to validate the BICM model of a binary channel with a state known at the receiver. This allows the proper use of Theorems 1 and 3 (see [11] for the details).
D. Fading Channels
The Rayleigh fading channel, which is common in wireless communication, can be modeled as a channel with CSI at the receiver. The state in this setting is the fade value, which is usually estimated, and some version of it is available at the receiver. When the fading is fast (a.k.a. ergodic fading), the channel is memoryless and fits the model discussed in this paper, and Theorems 1 and 3 can be applied.
ACKNOWLEDGMENT
A. Ingber is supported by the Adams Fellowship Program
of the Israel Academy of Sciences and Humanities.
REFERENCES
[1] Robert G. Gallager, Information Theory and Reliable Communication, John Wiley & Sons, Inc., New York, NY, USA, 1968.
[2] Pierre Moulin and Ying Wang, "Capacity and random-coding exponents for channel coding with side information," IEEE Trans. on Information Theory, vol. 53, pp. 1326-1347, 2007.
[3] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. on Information Theory, vol. 56, no. 5, pp. 2307-2359, May 2010.
[4] V. Strassen, "Asymptotische Abschätzungen in Shannons Informationstheorie," in Trans. Third Prague Conf. Information Theory, Czechoslovak Academy of Sciences, 1962, pp. 689-723.
[5] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[6] Thomas J. Richardson, Mohammad Amin Shokrollahi, and Rüdiger L. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 619-637, 2001.
[7] Jilei Hou, Paul H. Siegel, Laurence B. Milstein, and Henry D. Pfister, "Capacity-approaching bandwidth-efficient coded modulation schemes based on low-density parity-check codes," IEEE Trans. on Information Theory, vol. 49, no. 9, pp. 2141-2155, 2003.
[8] Udo Wachsmann, Robert F. H. Fischer, and Johannes B. Huber, "Multilevel codes: Theoretical concepts and practical design rules," IEEE Trans. on Information Theory, vol. 45, no. 5, pp. 1361-1391, 1999.
[9] Amir Ingber and Meir Feder, "Capacity and error exponent analysis of multilevel coding with multistage decoding," in Proc. IEEE International Symposium on Information Theory, Seoul, South Korea, 2009, pp. 1799-1803.
[10] Giuseppe Caire, Giorgio Taricco, and Ezio Biglieri, "Bit-interleaved coded modulation," IEEE Trans. on Information Theory, vol. 44, no. 3, pp. 927-946, 1998.
[11] Amir Ingber and Meir Feder, "Parallel bit interleaved coded modulation," in Proc. 48th Annual Allerton Conference on Communication, Control and Computing, Allerton, IL, USA, Sep.-Oct. 2010.