Finite Blocklength Coding for Channels with Side Information at the Receiver
  E(R) \triangleq \lim_{n \to \infty} -\frac{1}{n} \log\left( p_e(n, R) \right),   (9)
where $p_e(n, R)$ is the average codeword error probability for the best code of length $n$ and rate $R$ (assuming that the limit exists).
While the exact characterization of the error exponent is still an open question, two important bounds are known [1]: the random coding and the sphere packing error exponents, which are lower and upper bounds, respectively.
The random coding exponent is given by
  E_r(R) = \max_{\rho \in [0,1]} \max_{P_X(\cdot)} \left\{ E_0(\rho, P_X) - \rho R \right\},   (10)
where $E_0(\rho, P_X)$ is given by
  E_0(\rho, P_X) = -\log \sum_{y \in \mathcal{Y}} \Big[ \sum_{x \in \mathcal{X}} P_X(x) W(y|x)^{\frac{1}{1+\rho}} \Big]^{1+\rho}.   (11)
The sphere packing bound $E_{sp}(R)$ is given by
  E_{sp}(R) = \max_{\rho > 0} \max_{P_X(\cdot)} \left\{ E_0(\rho, P_X) - \rho R \right\}.   (12)
It can be seen that both exponent bounds are similar. In fact, they only differ in the optimization region of the parameter $\rho$, and they coincide at rates above a certain rate, called the critical rate.
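As a concrete illustration (not part of the original paper), the following is a minimal numerical sketch of (10) and (11) in Python, assuming numpy is available; the BSC, its crossover probability, the rate values and the grid search over $\rho$ are all illustrative choices, and the uniform input is used since it is optimal for the BSC.

```python
import numpy as np

def gallager_E0(rho, p_x, W):
    """E_0(rho, P_X) from (11), in bits:
    -log2 sum_y [ sum_x P_X(x) W(y|x)^{1/(1+rho)} ]^{1+rho},
    where W[x, y] is the DMC transition matrix."""
    inner = (p_x[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

def E_r(R, p_x, W, rhos=np.linspace(0.0, 1.0, 1001)):
    """E_r(R) from (10) by grid search over rho; P_X is held fixed
    (the uniform input is optimal for the symmetric channel below)."""
    return max(gallager_E0(rho, p_x, W) - rho * R for rho in rhos)

# Illustrative channel: a BSC with crossover probability 0.05.
W = np.array([[0.95, 0.05],
              [0.05, 0.95]])
p_x = np.array([0.5, 0.5])
for R in (0.3, 0.5, 0.7):
    print(f"R = {R}: E_r(R) = {E_r(R, p_x, W):.4f}")
```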
We note that both bounds depend on the function $E_0(\cdot)$. For channels with CSI at the receiver, we derive $E_0(\cdot)$ explicitly. Following the relationship (8), we wish to find the connection between $E_0(\cdot)$ and the corresponding $E_0$ functions of the conditional channels $W_s$, denoted $E_0^{(s)}$.
Theorem 1: Let $W$ be a channel with CSI at the receiver. Then the function $E_0(\rho)$ for this channel is given by
  E_0(\rho, P_X) = -\log \mathbb{E}\Big[ 2^{-E_0^{(S)}(\rho, P_X)} \Big].   (13)
Proof: When the channel output is $(y, s)$, we get
  E_0(\rho, P_X) = -\log \sum_{y \in \mathcal{Y},\, s \in \mathcal{S}} \Big[ \sum_{x \in \mathcal{X}} P_X(x) W(y, s|x)^{\frac{1}{1+\rho}} \Big]^{1+\rho}
                 = -\log \sum_{s \in \mathcal{S}} P_S(s) \sum_{y \in \mathcal{Y}} \Big[ \sum_{x \in \mathcal{X}} P_X(x) P_{Y|S,X}(y|s, x)^{\frac{1}{1+\rho}} \Big]^{1+\rho}
                 \overset{(a)}{=} -\log \sum_{s \in \mathcal{S}} P_S(s)\, 2^{-E_0^{(s)}(\rho, P_X)}
                 = -\log \mathbb{E}\Big[ 2^{-E_0^{(S)}(\rho, P_X)} \Big],
where the second equality uses $W(y, s|x) = P_S(s) P_{Y|S,X}(y|s, x)$ (as $S$ is independent of $X$), so that $P_S(s)$ factors out of the bracket, and (a) follows from the definition of $E_0^{(s)}$.
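The identity (13) is easy to check numerically. The sketch below (illustrative, not from the paper; assumes numpy, two conditional BSCs with made-up parameters) computes $E_0$ directly on the vector-output DMC and via Theorem 1.

```python
import numpy as np

def gallager_E0(rho, p_x, W):
    """E_0 from (11) for a DMC with transition matrix W[x, y] (log base 2)."""
    inner = (p_x[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

# Two states; the conditional channels W_s are BSCs with different crossovers.
p_s = np.array([0.3, 0.7])
W_s = [np.array([[0.99, 0.01], [0.01, 0.99]]),
       np.array([[0.85, 0.15], [0.15, 0.85]])]
p_x = np.array([0.5, 0.5])
rho = 0.6

# Direct route: W is an ordinary DMC with output (y, s) and
# W(y, s | x) = P_S(s) * P_{Y|S,X}(y | s, x).
W_joint = np.hstack([p_s[k] * W_s[k] for k in range(2)])   # shape (2, 4)
direct = gallager_E0(rho, p_x, W_joint)

# Theorem 1 route: E_0 = -log2 E[ 2^{-E_0^{(S)}} ].
e0_cond = np.array([gallager_E0(rho, p_x, w) for w in W_s])
via_thm1 = -np.log2((p_s * 2.0 ** (-e0_cond)).sum())

print(direct, via_thm1)   # the two numbers coincide
```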
[Fig. 1. Communication scheme for channels with CSI at the receiver: the encoder maps $m \in \mathcal{M}$ to $x \in \mathcal{X}^n$; the channel $W$, driven by the random state $s \in \mathcal{S}^n$, produces $y \in \mathcal{Y}^n$; the decoder observes $(y, s)$ and outputs $\hat{m}$.]
As a corollary, we get the random coding and the sphere
packing exponents for the channel W according to (10) and
(12).
Following (8), one might think that the error exponent bounds (for example, $E_r(R)$) would be given by the expectation of the exponent function w.r.t. $S$. This is clearly not the case, as seen in Theorem 1. In addition, the following can be shown:
Theorem 2: Let $\bar{E}_r(R)$ be the average of $E_r^{(S)}$ w.r.t. $S$:
  \bar{E}_r(R) \triangleq \mathbb{E}\Big[ E_r^{(S)}(R) \Big].   (14)
Then $\bar{E}_r(R)$ always overestimates the true random coding exponent of $W$, $E_r(R)$.
Proof: Let $\bar{E}_0(\rho, P_X) \triangleq \mathbb{E}\big[ E_0^{(S)}(\rho, P_X) \big]$. Since $2^{-(\cdot)}$ is convex, it follows by the Jensen inequality and Theorem 1 that
  E_0(\rho, P_X) \leq \bar{E}_0(\rho, P_X).   (15)
We continue with $\bar{E}_r(R)$:
  \bar{E}_r(R) = \mathbb{E}\Big[ E_r^{(S)}(R) \Big]
              = \mathbb{E}\Big[ \sup_{P_X;\, \rho \in [0,1]} \big\{ E_0^{(S)}(\rho, P_X) - \rho R \big\} \Big]
              \geq \sup_{P_X;\, \rho \in [0,1]} \mathbb{E}\Big[ E_0^{(S)}(\rho, P_X) - \rho R \Big]
              = \sup_{P_X;\, \rho \in [0,1]} \big\{ \bar{E}_0(\rho, P_X) - \rho R \big\}
              \overset{(a)}{\geq} \sup_{P_X;\, \rho \in [0,1]} \big\{ E_0(\rho, P_X) - \rho R \big\}
              = E_r(R),   (16)
where (a) follows from (15).
Note that the proof of Theorem 2 holds no matter what the optimization region of $\rho$ is. Therefore, the same statement for the sphere packing exponent follows similarly. We conclude that the expectation (w.r.t. $S$) of the error exponent bounds overestimates the true exponent bounds of $W$ (and also the true error exponent, above the critical rate).
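A numerical sanity check of Theorem 2 (again illustrative and not from the paper; assumes numpy, a grid search over $\rho$, and two made-up BSC states, for which the uniform input is optimal so the per-state optimization over $P_X$ can be dropped):

```python
import numpy as np

def gallager_E0(rho, p_x, W):
    inner = (p_x[:, None] * W ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

def E_r(R, p_x, W, rhos=np.linspace(0.0, 1.0, 1001)):
    return max(gallager_E0(rho, p_x, W) - rho * R for rho in rhos)

p_s = np.array([0.5, 0.5])
W_s = [np.array([[0.99, 0.01], [0.01, 0.99]]),   # nearly clean state
       np.array([[0.70, 0.30], [0.30, 0.70]])]   # noisy state
p_x = np.array([0.5, 0.5])
W_joint = np.hstack([p_s[k] * W_s[k] for k in range(2)])

R = 0.4
Er_true = E_r(R, p_x, W_joint)                                # E_r of W itself
Er_bar = sum(p_s[k] * E_r(R, p_x, W_s[k]) for k in range(2))  # E[E_r^{(S)}(R)]
print(Er_bar >= Er_true, Er_bar, Er_true)  # Theorem 2: the average overestimates
```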
C. Dispersion
An alternative information-theoretic measure for quantifying coding performance at finite block lengths is the channel dispersion. Suppose that a fixed codeword error probability $p_e$ and a codeword length $n$ are given. We can then seek the maximal achievable rate $R$ given $p_e$ and $n$. It turns out that for fixed $p_e$ and $n$, the gap to the channel capacity is approximately proportional to $Q^{-1}(p_e)/\sqrt{n}$ (where $Q(\cdot)$ is the complementary Gaussian cumulative distribution function). The proportionality constant (squared) is called the channel dispersion. Formally, define the (operational) channel dispersion as follows [3]:
Definition 2: The dispersion $V(W)$ of a channel $W$ with capacity $C$ is defined as
  V(W) = \lim_{p_e \to 0} \limsup_{n \to \infty} n \left( \frac{C - R(n, p_e)}{Q^{-1}(p_e)} \right)^2,   (17)
where $R(n, p_e)$ is the highest achievable rate for codeword error probability $p_e$ and codeword length $n$.
In 1962, Strassen [4] used the Gaussian approximation to derive the following result for DMCs:
  R(n, p_e) = C - \sqrt{\frac{V}{n}}\, Q^{-1}(p_e) + O\!\left( \frac{\log n}{n} \right),   (18)
where $C$ is the channel capacity, and the new quantity $V$ is the (information-theoretic) dispersion, which is given by
  V \triangleq \mathrm{VAR}(i(X; Y)),   (19)
where $i(x; y)$ is the information spectrum, given by
  i(x; y) \triangleq \log \frac{P_{XY}(x, y)}{P_X(x) P_Y(y)},   (20)
and the distribution of $X$ is the capacity-achieving distribution that minimizes $V$. Strassen's result proves that the dispersion of DMCs is equal to $\mathrm{VAR}(i(X; Y))$. This result was recently tightened (and extended to the power-constrained AWGN channel) in [3]. It is also known that the channel dispersion and the error exponent are related as follows: for a channel with capacity $C$ and dispersion $V$, the error exponent can be approximated (for rates close to the capacity) by $E(R) \approx \frac{(C - R)^2}{2 V \ln 2}$. See [3] for details on the early origins of this approximation by Shannon.
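To make (18) concrete, here is a small worked sketch (not from the paper; assumes Python 3.8+ and its standard library only, and uses the closed-form capacity and dispersion of a BSC; the crossover probability, $n$ and $p_e$ below are arbitrary illustrative values):

```python
import math
from statistics import NormalDist

def bsc_capacity_dispersion(delta):
    """Capacity and dispersion (in bits) of a BSC(delta); the uniform input
    achieves capacity, and V = VAR(i(X;Y)) has a closed form."""
    h = -delta * math.log2(delta) - (1 - delta) * math.log2(1 - delta)
    C = 1 - h
    V = delta * (1 - delta) * math.log2((1 - delta) / delta) ** 2
    return C, V

def Q_inv(p):
    """Inverse of the Gaussian complementary CDF."""
    return NormalDist().inv_cdf(1 - p)

C, V = bsc_capacity_dispersion(0.11)
n, pe = 1000, 1e-3
R = C - math.sqrt(V / n) * Q_inv(pe)   # (18), with the O(log n / n) term dropped
print(f"C = {C:.4f} bits, V = {V:.4f}, R(n={n}, pe={pe}) ~ {R:.4f}")

# Exponent approximation near capacity: E(R) ~ (C - R)^2 / (2 V ln 2).
print(f"E(R) ~ {(C - R) ** 2 / (2 * V * math.log(2)):.6f}")
```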
We now explore the dispersion for the case of channels with
side information at the receiver.
Theorem 3: The dispersion of the channel $W$ with CSI at the receiver is given by
  V(W) = \mathbb{E}[V(W_S)] + \mathrm{VAR}[C(W_S)],   (21)
where both expectation and variance are taken w.r.t. the
random state S.
Proof: Since $W$ is nothing but a DMC with a vector output, the proof boils down to the calculation of $\mathrm{VAR}[i(X; (Y, S))]$. The information spectrum in this case is given by
  i(x; y, s) = \log \frac{P_{YSX}(y, s, x)}{P_{YS}(y, s) P_X(x)} \overset{(a)}{=} \log \frac{P_{Y|S,X}(y|s, x)}{P_{Y|S}(y|s)} \triangleq i(x; y|s),   (22)
where (a) follows since $X$ and $S$ are independent.
Suppose that $s$ is fixed, i.e. consider the channel $W_s$. The capacity is given by
  C(W_s) = \mathbb{E}[i(X; Y|S) \mid S = s] = I(X; Y | S = s).   (23)
The dispersion of the channel $W_s$ is given by
  V(W_s) = \mathrm{VAR}(i(X; Y|S) \mid S = s)
         = \mathbb{E}\big[ i^2(X; Y|S) \mid S = s \big] - \mathbb{E}^2\big[ i(X; Y|S) \mid S = s \big]
         = \mathbb{E}\big[ i^2(X; Y|S) \mid S = s \big] - C(W_s)^2.   (24)
Finally, the dispersion of the original channel $W$ is given as follows:
  V(W) = \mathrm{VAR}(i(X; Y|S))
       \overset{(a)}{=} \mathbb{E}\big[ \mathrm{VAR}[i(X; Y|S) \mid S] \big] + \mathrm{VAR}\big[ \mathbb{E}[i(X; Y|S) \mid S] \big]
       = \mathbb{E}[V(W_S)] + \mathrm{VAR}[C(W_S)],   (25)
where (a) follows from the law of total variance.
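Theorem 3 can also be checked numerically. The following sketch (illustrative only; assumes numpy and two made-up BSC states, for which the uniform input is capacity-achieving, so that $\mathbb{E}[i] = C(W_s)$) compares the direct variance of $i(x; y|s)$ with the right-hand side of (21):

```python
import numpy as np

def mean_var_i(p_x, W):
    """Mean and variance (in bits) of i(X;Y) for a DMC W[x, y] at input p_x.
    For the symmetric channels below the mean equals C(W_s)."""
    p_y = p_x @ W
    joint = p_x[:, None] * W
    i = np.log2(W / p_y[None, :])
    m = (joint * i).sum()
    return m, (joint * (i - m) ** 2).sum()

p_s = np.array([0.4, 0.6])
W_s = [np.array([[0.95, 0.05], [0.05, 0.95]]),
       np.array([[0.80, 0.20], [0.20, 0.80]])]
p_x = np.array([0.5, 0.5])

C_s, V_s = map(np.array, zip(*[mean_var_i(p_x, w) for w in W_s]))

# Right-hand side of (21): E[V(W_S)] + VAR[C(W_S)].
rhs = (p_s * V_s).sum() + (p_s * C_s ** 2).sum() - ((p_s * C_s).sum()) ** 2

# Direct route: VAR of i(x; y|s) under the joint law P_S(s) P_X(x) P(y|s, x).
mean = (p_s * C_s).sum()
direct = sum(p_s[k] * (p_x[:, None] * W_s[k]
             * (np.log2(W_s[k] / (p_x @ W_s[k])[None, :]) - mean) ** 2).sum()
             for k in range(2))
print(direct, rhs)   # the two numbers coincide, as Theorem 3 asserts
```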
A few notes regarding this result:
- Let $\bar{V}(W) \triangleq \mathbb{E}[V(W_S)]$. As an immediate corollary of Theorem 3, it can be seen that $\bar{V}(W)$ underestimates the true dispersion of $W$, $V(W)$. This fact fits the exponent case: both the expected exponent and the expected dispersion are too optimistic w.r.t. the true exponent and dispersion.
- The factor $\mathrm{VAR}[C(W_S)]$ can be viewed as a penalty term added to the expected dispersion $\bar{V}(W)$, which grows as the capacities of the underlying channels become more spread out.
IV. CODE DESIGN
When designing channel codes, the fact that the output is two-dimensional may complicate the code design. It would therefore be of interest to apply some processing to the outputs $Y$ and $S$, and feed them to the decoder as a single value. We seek a processing method that does not compromise the achievable performance over the modified channel (not only in the capacity sense, but also in the sense of the error probability at finite codelengths).
For binary channels this can be done easily by calculating the log-likelihood ratio (LLR) for each channel output pair $(y, s)$ (see Figure 2).
For channel outputs $s$ and $y$, denote the LLR of $x$ given $(y, s)$ by $z$:
  z = \mathrm{LLR}(y, s) \triangleq \log \frac{P_{Y|S,X}(y|s, x = 0)}{P_{Y|S,X}(y|s, x = 1)}.   (26)
It is well known that for channels with binary input, the optimal ML decoder can be implemented to work on the LLR values only. Therefore, by plugging the LLR calculator into the channel output and supplying the decoder with the LLRs only, the performance is not harmed, and for code design purposes we can regard the channel as a simple DMC with input $x$ and output $z = \mathrm{LLR}(y, s)$.
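A minimal sketch of such an LLR calculator (a hypothetical two-state binary channel with made-up transition probabilities; assumes numpy):

```python
import numpy as np

# Hypothetical two-state binary-input channel: P_Y_given_SX[s][x, y].
P_Y_given_SX = {
    0: np.array([[0.95, 0.05], [0.05, 0.95]]),   # state 0: a clean BSC
    1: np.array([[0.70, 0.30], [0.30, 0.70]]),   # state 1: a noisy BSC
}

def llr(y, s):
    """z = LLR(y, s) from (26): log P(y | s, x=0) / P(y | s, x=1)."""
    P = P_Y_given_SX[s]
    return np.log(P[0, y] / P[1, y])

# The decoder sees only the scalar z; for ML decoding of a binary input,
# nothing relevant is lost by this reduction.
for s in (0, 1):
    for y in (0, 1):
        print(f"(y={y}, s={s}) -> z = {llr(y, s):+.3f}")
```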
V. EXAMPLES
A. Symmetrization of binary channels with equiprobable input
In the design of state-of-the-art channel codes, it is usually convenient to have channels that are symmetric. In recent years, methods have been developed for designing very efficient binary codes, such as LDPC codes. When designing LDPC codes, a desired property of a binary channel is that its output be symmetric [6].
Definition 3 (Binary-input, output-symmetric channels [6]): A memoryless binary channel $U$ with input alphabet $\{0, 1\}$ and output alphabet $\mathbb{R}$ is called output-symmetric if, for all $y \in \mathbb{R}$,
  U(y|0) = U(-y|1).   (27)
Consider a general binary channel $W$ with arbitrary output (which is not necessarily symmetric), and suppose that, for practical reasons, we are interested in coding over this channel with equiprobable input (which may or may not be the capacity-achieving prior for that channel). The fact that we use equiprobable input does not make the channel symmetric according to Definition 3. However, there exists a method for transforming this channel into a symmetric one, without compromising the capacity, error exponent or dispersion. First, we add the LLR calculation to the channel and regard it as a part of the channel. This way we obtain a real-output channel from any arbitrary channel. Second, instead of transmitting the binary codewords over the channel directly, we perform a bit-wise XOR operation with an i.i.d. pseudo-random binary vector. It can be shown that by multiplying the LLR values by $-1$ wherever the input was flipped, the LLRs are corrected. It can also be shown that the channel with the corrected LLR calculation is symmetric according to Definition 3. In [7], this method (termed channel adapters) was used in order to symmetrize the sub-channels of several coded modulation schemes. It is also shown in [7] that the capacity is unchanged by the channel adapters. By using Theorems 1 and 3, it can be verified that the error exponent bounds and the dispersion remain the same as well.
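A sketch of the channel-adapter idea on a toy asymmetric binary channel (the channel and all probabilities are hypothetical, not taken from [7]; assumes numpy):

```python
import numpy as np

rng = np.random.default_rng(0)

def channel(bits):
    """Toy asymmetric binary channel (hypothetical): flips 0 -> 1 with
    probability 0.02 and 1 -> 0 with probability 0.2."""
    flip = rng.random(bits.shape) < np.where(bits == 0, 0.02, 0.2)
    return bits ^ flip.astype(bits.dtype)

def llr(y):
    """LLR of the toy channel: log P(y | in=0) / P(y | in=1)."""
    p0 = np.where(y == 0, 0.98, 0.02)
    p1 = np.where(y == 0, 0.20, 0.80)
    return np.log(p0 / p1)

# Channel adapter: XOR the codeword with an i.i.d. pseudo-random scrambler
# (known at the receiver), then flip the LLR sign wherever the bit was flipped.
x = rng.integers(0, 2, size=10)           # codeword bits
d = rng.integers(0, 2, size=10)           # scrambling sequence
y = channel(x ^ d)
z = llr(y) * np.where(d == 1, -1.0, 1.0)  # corrected LLRs, now referring to x
print(z)
```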
B. Multilevel Coding and Multistage Decoding (MLC-MSD)
MLC-MSD is a method for using binary codes in order
to achieve capacity on nonbinary channels (see, e.g., [8]).
[Fig. 2. Incorporating LLR calculation into the channel: the encoder output passes through $W$ (driven by the random state); an LLR calculator converts each output pair into $z \in \mathbb{R}^n$, which is fed to the decoder.]
In MLC-MSD, the binary encoders work in parallel over the same block of channel uses, and the decoders work sequentially, as follows: the first decoder treats the remaining codewords as noise and decodes the message from the first encoder. Every other decoder, in its turn, decodes the message from the corresponding encoder assuming that the messages decoded at the previous stages are correct, and therefore regards these messages as side information. The effective channels between each encoder-decoder pair, called sub-channels, are in fact channels with CSI at the receiver, and can therefore be analyzed by Theorems 1 and 3. For more details on the finite-length analysis of MLC-MSD, see [9].
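To illustrate the sub-channel structure (a toy example, not from [8] or [9]; assumes numpy and a hypothetical 4-ary DMC whose inputs are labeled by two bits), the sketch below computes the rate of each MSD stage and verifies that the stage rates add up to $I(X; Y)$ by the chain rule:

```python
import numpy as np

# Hypothetical 4-ary DMC W[x, y]; input x = 2*b1 + b2 carries two bit levels.
W = np.array([[0.90, 0.05, 0.03, 0.02],
              [0.05, 0.90, 0.02, 0.03],
              [0.03, 0.02, 0.90, 0.05],
              [0.02, 0.03, 0.05, 0.90]])

def mutual_info(p_x, W):
    """I(X;Y) in bits for a DMC W[x, y] with input distribution p_x."""
    p_y = p_x @ W
    return (p_x[:, None] * W * np.log2(W / p_y[None, :])).sum()

half = np.array([0.5, 0.5])

# Stage 1: B1 -> Y, with the undecoded bit B2 treated as noise.
W1 = np.vstack([0.5 * (W[0] + W[1]), 0.5 * (W[2] + W[3])])
I1 = mutual_info(half, W1)

# Stage 2: B2 -> Y with side information S = B1 at the receiver,
# i.e. I(B2; Y | B1) = sum_s P(B1 = s) I(B2; Y | B1 = s).
I2 = 0.5 * mutual_info(half, W[:2]) + 0.5 * mutual_info(half, W[2:])

# Chain rule: the stage rates add up to the full mutual information.
print(I1 + I2, mutual_info(np.full(4, 0.25), W))
```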
C. Bit-Interleaved Coded Modulation (BICM)
BICM [10] is another popular method for channel coding using binary codes over nonbinary channels (for example, a channel with an input alphabet of size $2^L$). It is based on taking a single binary code, feeding it into a long interleaver, and then mapping the interleaved coded bits onto the nonbinary channel alphabet (every $L$-tuple of consecutive bits is mapped to a symbol in the channel input alphabet of size $2^L$). At the receiver, the LLRs of all coded bits are calculated according to the mapping, de-interleaved, and fed to the decoder.
By assuming that the interleaver is ideal (i.e. of infinite length), the equivalent channel of BICM is modeled as a binary channel with a random state [10]. The state is chosen uniformly from $\{1, ..., L\}$, and represents the index of the input bit within the $L$-tuple. Since the state is known to the receiver only, this model fits the channel models discussed in this paper.
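The equivalent-channel model is easy to instantiate (a toy sketch with a hypothetical Gray-labeled 4-ary DMC, i.e. $L = 2$; assumes numpy; the per-state binary sub-channels below follow the parallel-channel BICM model of [10]):

```python
import numpy as np

L = 2
labels = [(0, 0), (0, 1), (1, 1), (1, 0)]    # Gray labeling of the 4 inputs
W = np.array([[0.85, 0.07, 0.03, 0.05],     # hypothetical 4-ary DMC W[x, y]
              [0.07, 0.85, 0.05, 0.03],
              [0.03, 0.05, 0.85, 0.07],
              [0.05, 0.03, 0.07, 0.85]])

def mutual_info(p_x, W):
    p_y = p_x @ W
    return (p_x[:, None] * W * np.log2(W / p_y[None, :])).sum()

def subchannel(s):
    """Binary sub-channel for state s (the bit index): W_s(y | b) is the
    average of W(y | x) over the symbols x whose s-th label bit equals b."""
    return np.vstack([np.mean([W[x] for x in range(4) if labels[x][s] == b],
                              axis=0)
                      for b in (0, 1)])

I_s = [mutual_info(np.array([0.5, 0.5]), subchannel(s)) for s in range(L)]
print("per-state I(B; Y | S = s):", I_s)
print("BICM capacity E_S[I]     :", np.mean(I_s))  # S uniform on the bit indices
```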
Finite blocklength analysis of BICM should be done carefully: although the model of a binary channel with a state known at the receiver allows the derivation of an error exponent and a channel dispersion, these quantities do not have the usual meaning of quantifying the performance of BICM at finite block lengths. The reason is the interleaver: how can one rely on the existence of an infinite-length interleaver in order to estimate finite-length performance?
The solution comes in the form of an explicit finite-length interleaver. Recently, an alternative scheme called Parallel BICM was proposed [11], where binary codewords are used in parallel and an interleaver of finite length is used in order to validate the BICM model of a binary channel with a state known at the receiver. This allows the proper use of Theorems 1 and 3 (see [11] for the details).
D. Fading Channels
The Rayleigh fading channel, which is popular in wireless communication, can be modeled as a channel with CSI at the receiver. The state in this setting is the fade value, which is usually estimated, with some version of it made available at the receiver. When the fading is fast (a.k.a. ergodic fading), the channel is memoryless and fits the model discussed in this paper, so Theorems 1 and 3 can be applied.
ACKNOWLEDGMENT
A. Ingber is supported by the Adams Fellowship Program
of the Israel Academy of Sciences and Humanities.
REFERENCES
[1] Robert G. Gallager, Information Theory and Reliable Communication, John Wiley & Sons, Inc., New York, NY, USA, 1968.
[2] Pierre Moulin and Ying Wang, "Capacity and random-coding exponents for channel coding with side information," IEEE Trans. on Information Theory, vol. 53, pp. 1326-1347, 2007.
[3] Y. Polyanskiy, H. V. Poor, and S. Verdú, "Channel coding rate in the finite blocklength regime," IEEE Trans. on Information Theory, vol. 56, no. 5, pp. 2307-2359, May 2010.
[4] V. Strassen, "Asymptotische Abschätzungen in Shannons Informationstheorie," in Trans. Third Prague Conf. Information Theory, Czechoslovak Academy of Sciences, 1962, pp. 689-723.
[5] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[6] Thomas J. Richardson, Mohammad Amin Shokrollahi, and Rüdiger L. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 619-637, 2001.
[7] Jilei Hou, Paul H. Siegel, Laurence B. Milstein, and Henry D. Pfister, "Capacity-approaching bandwidth-efficient coded modulation schemes based on low-density parity-check codes," IEEE Trans. on Information Theory, vol. 49, no. 9, pp. 2141-2155, 2003.
[8] Udo Wachsmann, Robert F. H. Fischer, and Johannes B. Huber, "Multilevel codes: Theoretical concepts and practical design rules," IEEE Trans. on Information Theory, vol. 45, no. 5, pp. 1361-1391, 1999.
[9] Amir Ingber and Meir Feder, "Capacity and error exponent analysis of multilevel coding with multistage decoding," in Proc. IEEE International Symposium on Information Theory, Seoul, South Korea, 2009, pp. 1799-1803.
[10] Giuseppe Caire, Giorgio Taricco, and Ezio Biglieri, "Bit-interleaved coded modulation," IEEE Trans. on Information Theory, vol. 44, no. 3, pp. 927-946, 1998.
[11] Amir Ingber and Meir Feder, "Parallel bit interleaved coded modulation," in Proc. 48th Annual Allerton Conference on Communication, Control and Computing, Allerton, USA, September-October 2010.