Given n independent copies (α_1, β_1), . . . , (α_n, β_n) of a pair of random variables (α, β), the problem is to extract their common information, i.e., to find a random variable γ such that H(γ|(α_1, . . . , α_n)) and H(γ|(β_1, . . . , β_n)) are small. We say that extraction of common information is impossible if the entropy of any such variable γ is small.
Let us show that this is the case if α and β are independent. In this case α^n = (α_1, . . . , α_n) and β^n = (β_1, . . . , β_n) are independent. Recall the well-known inequality

H(γ) ≤ H(γ|α^n) + H(γ|β^n) + I(α^n : β^n).

Here I(α^n : β^n) = 0 (because α^n and β^n are independent); the two other summands on the right-hand side are small by our assumption.
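For completeness, here is one standard way to derive this well-known inequality (a routine verification, stated for a single pair α, β and applied above with α^n, β^n in place of α, β):

H(γ) = H(γ|α) + I(γ : α) ≤ H(γ|α) + I(γβ : α) = H(γ|α) + I(α : β) + I(γ : α|β) ≤ H(γ|α) + I(α : β) + H(γ|β).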
It turns out that a similar statement holds for dependent random variables. However, there is one exception. If the joint probability matrix of (α, β) can be divided into blocks, there is a random variable γ (the block number) that is both a function of α and a function of β. Then γ^n = (γ_1, . . . , γ_n), where γ_i is the block number of the pair (α_i, β_i), is common information of α^n and β^n.
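For a concrete illustration (our own toy example, not taken from the original text), consider the joint probability matrix

[ 1/8  1/8  0   ]
[ 1/8  1/8  0   ]
[ 0    0    1/2 ]

which splits into the blocks {1, 2} × {1, 2} and {3} × {3}. Let γ = 1 if α ∈ {1, 2} (equivalently, β ∈ {1, 2}) and γ = 2 otherwise. Then γ is a function of α alone and also of β alone, so H(γ^n|α^n) = H(γ^n|β^n) = 0, while H(γ^n) = n: the sequence of block numbers is n bits of common information.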
It was shown by Ahlswede, Gács and Körner [1], [2], [4] that this is the only case when there exists common information.
Their original proof is quite technical. Several years ago another approach was proposed by Romashchenko [5] using conditionally independent random variables. Romashchenko introduced the notion of conditionally independent random variables and showed that extraction of common information from conditionally independent random variables is impossible. We prove that if the joint probability matrix of a pair of random variables (α, β) is
not a block matrix, then α and β are conditionally independent. We also show several new information inequalities for conditionally independent random variables.
II. Conditionally independent random variables
Consider four random variables α, β, α′, β′. Suppose that α′ and β′ are independent and that α and β are independent given α′ and also given β′, i.e., I(α′ : β′) = 0, I(α : β|α′) = 0 and I(α : β|β′) = 0. Then we say that α and β are conditionally independent of order 1. (Conditionally independent random variables of order 0 are independent random variables.)
We consider conditional independence of random variables as a property of their joint distributions. If a pair of random variables α and β has the same joint distribution as a pair of conditionally independent random variables α^0 and β^0 (on another probability space), we say that α and β are conditionally independent.
Replacing the requirement of independence of α′ and β′ by weaker requirements, we arrive at conditional independence of higher orders. The underlying relative notion is the following.
Definition 1: Random variables α and β are called conditionally independent with respect to α′ and β′ if α and β are independent given α′ and also given β′, i.e. I(α : β|α′) = I(α : β|β′) = 0.
Definition 2: (Romashchenko [5]) Two random variables α and β are called conditionally independent random variables of order k (k ≥ 0) if there exists a probability space and a sequence of pairs of random variables

(α^0, β^0), (α^1, β^1), . . . , (α^k, β^k)

on it such that
(a) The pair (α^0, β^0) has the same distribution as (α, β).
(b) α^i and β^i are conditionally independent with respect to α^{i+1} and β^{i+1} when 0 ≤ i < k.
(c) α^k and β^k are independent random variables.
The sequence

(α^0, β^0), (α^1, β^1), . . . , (α^k, β^k)

is called a derivation for (α, β).
We say that random variables α and β are conditionally independent if they are conditionally independent of some order k.
The notion of conditional independence can be applied
for analysis of common information using the following ob-
servations (see below for proofs):
Lemma 1: Consider conditionally independent random variables α and β of order k. Let α^n = (α_1, . . . , α_n) and β^n = (β_1, . . . , β_n), where the pairs (α_1, β_1), . . . , (α_n, β_n) are independent and each has the same distribution as (α, β). Then the variables α^n and β^n are conditionally independent of order k.
Theorem 1: (Romashchenko [5]) If random variables α and β are conditionally independent of order k, and γ is an arbitrary random variable (on the same probability space), then

H(γ) ≤ 2^k H(γ|α) + 2^k H(γ|β).
Definition 3: An m × n matrix is called a block matrix if (after some permutation of its rows and columns) it consists of four blocks; the blocks on the diagonal are not equal to zero; the blocks outside the diagonal are equal to zero.
Formally, A is a block matrix if the set of its first indices {1, . . . , m} can be divided into two disjoint nonempty sets I_1 and I_2 (I_1 ⊔ I_2 = {1, . . . , m}) and the set of its second indices {1, . . . , n} can be divided into two sets J_1 and J_2 (J_1 ⊔ J_2 = {1, . . . , n}) in such a way that each of the blocks {a_ij : i ∈ I_1, j ∈ J_1} and {a_ij : i ∈ I_2, j ∈ J_2} contains at least one nonzero element, and all the elements outside these two blocks are equal to 0, i.e. a_ij = 0 when (i, j) ∈ (I_1 × J_2) ∪ (I_2 × J_1).
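The block-matrix condition is easy to test mechanically. The following sketch (our own helper, not part of the paper) joins rows and columns by the nonzero entries, essentially the graph that reappears in the proof of Lemma 10 below; assuming every row and every column contains a nonzero entry, the matrix is a block matrix iff this graph is disconnected.

    def is_block_matrix(a):
        m, n = len(a), len(a[0])
        parent = list(range(m + n))           # union-find over rows 0..m-1 and columns m..m+n-1

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        for i in range(m):
            for j in range(n):
                if a[i][j] != 0:
                    parent[find(i)] = find(m + j)   # union row i with column j

        return len({find(v) for v in range(m + n)}) > 1

    print(is_block_matrix([[0.5, 0.0], [0.0, 0.5]]))   # True: block-diagonal matrix
    print(is_block_matrix([[0.4, 0.1], [0.1, 0.4]]))   # False: full support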
Theorem 2: Random variables are conditionally independent iff their joint probability matrix is not a block matrix.
Using these statements, we conclude that if the joint probability matrix of a pair of random variables (α, β) is not a block matrix, then no information can be extracted from a sequence of n independent random variables each with the same distribution as (α, β):

H(γ) ≤ 2^k H(γ|α^n) + 2^k H(γ|β^n)

for some k (that does not depend on n) and for any random variable γ.
III. Proof of Theorem 1
Theorem 1: If random variables α and β are conditionally independent of order k, and γ is an arbitrary random variable (on the same probability space), then

H(γ) ≤ 2^k H(γ|α) + 2^k H(γ|β).
Proof: The proof is by induction on k. The statement is already proved for independent random variables α and β (k = 0).
Suppose α and β are conditionally independent with respect to random variables α′ and β′ that are conditionally independent of order k − 1. The relativized form of the well-known inequality H(γ) ≤ H(γ|α) + H(γ|β) + I(α : β) (with all entropies conditioned on α′) gives

H(γ|α′) ≤ H(γ|αα′) + H(γ|βα′) + I(α : β|α′) = H(γ|αα′) + H(γ|βα′) ≤ H(γ|α) + H(γ|β),

since I(α : β|α′) = 0. Similarly, H(γ|β′) ≤ H(γ|α) + H(γ|β). By the induction hypothesis, H(γ) ≤ 2^{k−1} H(γ|α′) + 2^{k−1} H(γ|β′). Replacing H(γ|α′) and H(γ|β′) by the upper bounds just obtained, we get the statement of the theorem.
IV. Proof of Theorem 2

We call a joint probability matrix good if the corresponding pair of random variables is conditionally independent. The proof of Theorem 2 uses the following observations.
(b) If α and β are conditionally independent, α′ is a function of α and β′ is a function of β, then α′ and β′ are conditionally independent. (Indeed, if α and β are conditionally independent with respect to some α″ and β″, then α′ and β′ are conditionally independent with respect to the same α″ and β″; applying this along a derivation for (α, β), we obtain a derivation for (α′, β′).)
(c) If two random variables are k-conditionally inde-
pendent, then they are l-conditionally independent for
any l > k. (We can add some constant random variables
to the end of the derivation.)
(d) Assume that conditionally independent random variables α_1 and β_1 are defined on a probability space Ω_1 and conditionally independent random variables α_2 and β_2 are defined on a probability space Ω_2. Consider random variables (α_1, α_2) and (β_1, β_2) that are defined in a natural way on the Cartesian product Ω_1 × Ω_2. Then (α_1, α_2) and (β_1, β_2) are conditionally independent. Indeed, for each pair (α_i, β_i) consider its derivation

(α_i^0, β_i^0), (α_i^1, β_i^1), . . . , (α_i^l, β_i^l)

(using (c), we may assume that both derivations have the same length l).
Then the sequence

((α_1^0, α_2^0), (β_1^0, β_2^0)), . . . , ((α_1^l, α_2^l), (β_1^l, β_2^l))

is a derivation for the pair of random variables ((α_1, α_2), (β_1, β_2)). For example, the random variables (α_1, α_2) = (α_1^0, α_2^0) and (β_1, β_2) = (β_1^0, β_2^0) are independent given the value of (α_1^1, α_2^1), because α_1 and β_1 are independent given α_1^1, the variables α_2 and β_2 are independent given α_2^1, and the measure on Ω_1 × Ω_2 is equal to the product of the measures on Ω_1 and Ω_2.
Applying (d) several times, we get Lemma 1.
Combining Lemma 1 and (b), we get the following state-
ment:
(e) Let (α_1, β_1), . . . , (α_n, β_n) be independent and identically distributed pairs of random variables. Assume that the variables in each pair (α_i, β_i) are conditionally independent. Then any random variables α′ and β′, where α′ depends only on α_1, . . . , α_n and β′ depends only on β_1, . . . , β_n, are conditionally independent.
Definition 4: Let us introduce the following notation:

D_ε = [ 1/2 − ε    ε       ]
      [ ε          1/2 − ε ]

(where 0 ≤ ε ≤ 1/2).
The matrix D_{1/4} corresponds to a pair of independent random bits; as ε tends to 0 these bits become more dependent (though each is still uniformly distributed over {0, 1}).
Lemma 2: (i) D_{1/4} is a good matrix.
(ii) If D_ε is a good matrix, then D_{ε(1−ε)} is good.
Proof:
(i) The matrix D_{1/4} is of rank 1, hence it is good (independent random bits).
(ii) Consider a pair of random variables α and β distributed according to D_ε. Define new random variables α′ and β′ as follows:
if (α, β) = (0, 0) then (α′, β′) = (0, 0);
if (α, β) = (1, 1) then (α′, β′) = (1, 1);
if (α, β) = (0, 1) or (α, β) = (1, 0) then

(α′, β′) = (0, 0) with probability ε/2,
           (0, 1) with probability (1 − ε)/2,
           (1, 0) with probability (1 − ε)/2,
           (1, 1) with probability ε/2.

The joint probability matrix of α′ and β′ given α = 0 is equal to

[ (1 − ε)²    ε(1 − ε) ]
[ ε(1 − ε)    ε²       ]

and its rank equals 1. Therefore, α′ and β′ are independent given α = 0.
Similarly, the joint probability matrix of α′ and β′ given α = 1, β = 0 or β = 1 has rank 1. This yields that α′ and β′ are conditionally independent with respect to α and β; since D_ε is good, α and β are conditionally independent, and therefore so are α′ and β′. The joint probability matrix of α′ and β′ is

[ 1/2 − ε(1 − ε)    ε(1 − ε)       ]
[ ε(1 − ε)          1/2 − ε(1 − ε) ],

hence D_{ε(1−ε)} is a good matrix.
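These computations are easy to verify numerically. The following sketch (our own code; the values of α, β, α′, β′ are encoded as array indices 0 and 1) builds the joint distribution of (α, β, α′, β′) and checks that the conditional matrices of (α′, β′) given α and given β have rank 1 and that the marginal matrix of (α′, β′) equals D_{ε(1−ε)}.

    import numpy as np

    def d(eps):
        # the matrix D_eps from Definition 4
        return np.array([[0.5 - eps, eps], [eps, 0.5 - eps]])

    def check_construction(eps):
        # joint[a, b, a1, b1] = Pr(alpha = a, beta = b, alpha' = a1, beta' = b1)
        joint = np.zeros((2, 2, 2, 2))
        mixed = np.array([[eps / 2, (1 - eps) / 2],
                          [(1 - eps) / 2, eps / 2]])     # (alpha', beta') when alpha != beta
        for a in range(2):
            for b in range(2):
                p = d(eps)[a, b]
                if a == b:
                    joint[a, b, a, b] = p                # (alpha', beta') = (a, b)
                else:
                    joint[a, b] = p * mixed              # randomized choice of (alpha', beta')
        for a in range(2):                               # conditional matrices given alpha = a
            assert np.linalg.matrix_rank(joint[a].sum(axis=0)) == 1
        for b in range(2):                               # conditional matrices given beta = b
            assert np.linalg.matrix_rank(joint[:, b].sum(axis=0)) == 1
        assert np.allclose(joint.sum(axis=(0, 1)), d(eps * (1 - eps)))

    check_construction(0.25)
    check_construction(3 / 16)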
(iii) Consider the sequence ε_n defined by ε_0 = 1/4 and ε_{n+1} = ε_n(1 − ε_n). The sequence ε_n tends to zero (its limit is a root of the equation x = x(1 − x)). It follows from statements (i) and (ii) that all matrices D_{ε_n} are good.
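Numerically the sequence decreases to zero rather slowly (roughly like 1/n); a two-line check:

    eps = 0.25
    for n in range(5):
        print(n, eps)           # prints 0.25, 0.1875, 0.1523..., 0.1291..., 0.1124...
        eps = eps * (1 - eps)   # the recurrence eps_{n+1} = eps_n * (1 - eps_n)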
Note: The order of conditional independence of D_ε tends to infinity as ε → 0. Indeed, applying Theorem 1 to random variables α and β with joint distribution D_ε and to γ = α, we obtain

H(α) ≤ 2^k (H(α|α) + H(α|β)) = 2^k H(α|β).

Here H(α) = 1; for any fixed value of β the random variable α takes two values with probabilities 2ε and 1 − 2ε, therefore

H(α|β) = −(1 − 2ε) log₂(1 − 2ε) − 2ε log₂(2ε) = O(ε log₂(1/ε)),

so if α and β (with joint distribution D_ε) are conditionally independent of order k, then 2^k ≥ 1/O(ε log₂(1/ε)), which tends to infinity as ε → 0.

Next we claim that if M is a good joint probability matrix and A, B are stochastic matrices of appropriate sizes, then A^T M B is also a good matrix. Informally, the new pair of random variables α′ [β′] is obtained from α [β] by a random transition with transition matrix A [B]. The joint probability matrix of (α′, β′) is equal to A^T M B. But since the transitions are performed independently from α and β, the new random variables are conditionally independent.
More formally, let us randomly (independently from α and β, and independently of each other) choose vectors c and d as follows:

Pr(proj_i(c) = j) = a_ij,    Pr(proj_i(d) = j) = b_ij,

where proj_i is the projection onto the i-th component. Define α′ = proj_α(c) and β′ = proj_β(d). Then
(i) the joint probability matrix of (α′, β′) is equal to A^T M B;
(ii) the pair (α, c) is conditionally independent from the pair (β, d) (this follows from statement (d)). Hence by statement (b), α′ and β′ are conditionally independent.
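Statement (i) is a direct computation; the following small simulation (our own illustrative code with made-up matrices M, A and B) compares the empirical joint distribution of (α′, β′) with A^T M B.

    import numpy as np

    rng = np.random.default_rng(0)

    M = np.array([[0.3, 0.1],          # joint probability matrix of (alpha, beta)
                  [0.2, 0.4]])
    A = np.array([[0.9, 0.1],          # row-stochastic transition matrix for alpha
                  [0.4, 0.6]])
    B = np.array([[0.7, 0.2, 0.1],     # row-stochastic transition matrix for beta
                  [0.1, 0.3, 0.6]])

    n = 100_000
    flat = rng.choice(M.size, size=n, p=M.ravel())        # sample (alpha, beta) ~ M
    alpha, beta = np.unravel_index(flat, M.shape)
    alpha1 = np.array([rng.choice(A.shape[1], p=A[i]) for i in alpha])   # transition via A
    beta1 = np.array([rng.choice(B.shape[1], p=B[k]) for k in beta])     # transition via B

    empirical = np.zeros((A.shape[1], B.shape[1]))
    np.add.at(empirical, (alpha1, beta1), 1.0 / n)
    print(np.round(empirical, 3))          # empirical joint distribution of (alpha', beta')
    print(np.round(A.T @ M @ B, 3))        # should be close to the matrix above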
Now let us prove the following technical lemma.
Lemma 5: For any nonsingular n × n matrix M and any matrix R = (r_ij) with the sum of its elements equal to 0, there exist matrices P and Q such that
1. R = P^T M + MQ;
2. the sum of all elements in each row of P is equal to 0;
3. the sum of all elements in each row of Q is equal to 0.
Proof: First, we assume that M = I (here I is the identity matrix of the proper size), and find matrices P′ and Q′ such that

R = P′^T + Q′.

Let us define P′ = (p′_ij) and Q′ = (q′_ij) as follows:

q′_ij = (1/n) Σ_{k=1}^n r_kj,    P′ = (R − Q′)^T.

Note that all rows of Q′ are equal.
It is easy to see that condition (1) holds. Condition (3) holds because the sum of all elements in any row of Q′ is equal to the sum of all elements of R divided by n, which is 0 by the condition. Condition (2) holds because

Σ_{j=1}^n p′_ij = Σ_{j=1}^n ( r_ji − (1/n) Σ_{k=1}^n r_ki ) = 0.
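Before turning to the general case, here is a quick numerical sanity check of this construction (our own sketch with a random nonsingular M and a random zero-sum R); it also verifies the general-case matrices P = (M^{−1})^T P′ and Q = M^{−1} Q′ introduced in the next paragraph.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    M = rng.random((n, n)) + n * np.eye(n)     # diagonally dominant, hence nonsingular
    R = rng.random((n, n))
    R -= R.sum() / R.size                      # make the total sum of R equal to 0

    Qp = np.tile(R.sum(axis=0) / n, (n, 1))    # q'_ij = (1/n) * sum_k r_kj (all rows equal)
    Pp = (R - Qp).T                            # P' = (R - Q')^T, so R = P'^T + Q'

    P = np.linalg.inv(M).T @ Pp                # P = (M^{-1})^T P'
    Q = np.linalg.inv(M) @ Qp                  # Q = M^{-1} Q'

    assert np.allclose(R, P.T @ M + M @ Q)     # condition 1
    assert np.allclose(P.sum(axis=1), 0)       # condition 2: zero row sums of P
    assert np.allclose(Q.sum(axis=1), 0)       # condition 3: zero row sums of Q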
Now we consider the general case. Put P = (M^{−1})^T P′ and Q = M^{−1} Q′. Then P^T M + MQ = P′^T + Q′ = R, and the row sums are still zero: Pu = (M^{−1})^T (P′u) = 0 and Qu = M^{−1}(Q′u) = 0, where u = (1, . . . , 1)^T.
Corollary 5.1: Under the assumptions of Lemma 5 there also exist matrices P and Q with zero row sums such that R = −P^T M − MQ (apply Lemma 5 to −R).
Lemma 6: Any nonsingular matrix with positive elements whose elements sum to 1 is a good matrix.
Proof: Let V be the affine space of all n × n matrices in which the sum of all elements is equal to 1:

V = {X : Σ_{i=1}^n Σ_{j=1}^n x_ij = 1}.

(This space contains the set of all joint probability matrices.)
Let U be the affine space of all n × n matrices in which the sum of all elements in each row is equal to 1:

U = {X : Σ_{j=1}^n x_ij = 1 for all i}.
(This space contains the set of stochastic matrices.)
Let Ũ be a neighborhood of I in U such that all matrices from this neighborhood are invertible. Define a mapping Φ : Ũ × Ũ → V as follows:

Φ(A, B) = (A^T)^{−1} M B^{−1}.
Let us show that the differential of this mapping at the point A = B = I is a surjective mapping from T_{(I,I)}(Ũ × Ũ) (the tangent space of Ũ × Ũ at the point (I, I)) to T_M V (the tangent space of V at the point M). Differentiating Φ at (I, I):

dΦ|_{A=I, B=I} = d((A^T)^{−1} M B^{−1}) = −(dA)^T M − M dB.
We need to show that for any matrix R ∈ T_M V there exist matrices (P, Q) ∈ T_{(I,I)}(Ũ × Ũ) such that

R = −P^T M − MQ.

(The tangent space T_M V consists of all matrices whose elements sum to 0, and T_{(I,I)}(Ũ × Ũ) consists of all pairs of matrices with zero row sums.) But this is guaranteed by Corollary 5.1.
Since the mapping Φ has a surjective differential at (I, I), it has a surjective differential in some neighborhood N_1 of (I, I) in Ũ × Ũ. Take a pair of stochastic matrices (A_0, B_0) from this neighborhood such that these matrices are interior points of the set of stochastic matrices.
Now take a small neighborhood N_2 of (A_0, B_0) in the intersection of N_1 and the set of stochastic matrices. Since the differential of Φ at (A_0, B_0) is surjective, the image of N_2 has an interior point. Hence it contains a good matrix (recall that the set of good matrices is dense in the set of all joint probability matrices). In other words, Φ(A_1, B_1) = (A_1^T)^{−1} M B_1^{−1} is a good matrix for some pair of stochastic matrices (A_1, B_1) ∈ N_2. Therefore M = A_1^T ((A_1^T)^{−1} M B_1^{−1}) B_1 is obtained from a good matrix by multiplication by stochastic matrices and hence is good. This finishes the proof.
Lemma 7: Any joint probability matrix without zero el-
ements is a good matrix.
Proof: Suppose that X = (v_1, . . . , v_n) is an m × n (m > n) matrix of rank n. It is equal to the product of a nonsingular matrix and a stochastic matrix:
X = (v_1 − u_1 − . . . − u_{m−n}, v_2, . . . , v_n, u_1, . . . , u_{m−n}) · S,

where the second factor S is the m × n stochastic matrix whose first n rows form the identity matrix I and whose remaining m − n rows are all equal to (1, 0, . . . , 0):

S = [ I           ]
    [ 1 0 . . . 0 ]
    [ . . . . . . ]
    [ 1 0 . . . 0 ].
Here u_1, . . . , u_{m−n} are sufficiently small vectors with positive components that form a basis in R^m together with v_1, . . . , v_n (it is easy to see that such vectors do exist); the vectors u_1, . . . , u_{m−n} should be small enough to ensure that the vector v_1 − u_1 − . . . − u_{m−n} has positive elements.
The first factor is a nonsingular matrix with positive elements and hence is good. The second factor is a stochastic matrix, so the product is a good matrix.
Therefore, any matrix of full rank without zero elements is good. If an m × n matrix with positive elements does not have full rank, we can add (in a similar way) m linearly independent columns to get a matrix of full rank and then represent the given matrix as a product of a matrix of full rank and a stochastic matrix.
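As an illustration of the factorization used in this proof, here is a small numerical sketch (our own code; the matrix X, the vector u_1 and the tolerance are chosen ad hoc) for a positive 3 × 2 matrix of full rank.

    import numpy as np

    X = np.array([[0.30, 0.10],
                  [0.05, 0.25],
                  [0.10, 0.20]])               # m = 3, n = 2, positive entries, rank 2

    u1 = 0.01 * np.ones(3)                     # a small positive vector u_1 (here m - n = 1)
    Y = np.column_stack([X[:, 0] - u1, X[:, 1], u1])   # (v_1 - u_1, v_2, u_1)
    S = np.array([[1, 0],
                  [0, 1],
                  [1, 0]])                     # stochastic factor: I on top, rows (1, 0, ..., 0) below

    assert np.all(Y > 0)                       # the first factor has positive elements ...
    assert abs(np.linalg.det(Y)) > 1e-12       # ... and is nonsingular
    assert np.allclose(Y @ S, X)               # X = (nonsingular positive matrix) x (stochastic matrix)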
We denote by S(M) the sum of all elements of a matrix
M.
Lemma 8: Consider a matrix N whose elements are matrices N_ij of the same size. If
(a) all N_ij contain only nonnegative elements;
(b) the sum of the matrices in each row and in each column of the matrix N is a matrix of rank 1;
(c) the matrix P with elements p_ij = S(N_ij) is a good joint probability matrix;
then the sum of all the matrices N_ij is a good matrix.
Proof: This lemma is a reformulation of the definition of conditionally independent random variables. Consider random variables α, β, α′, β′ such that the probability of the event (α′, β′) = (i, j) is equal to p_ij, and the probability of the event

{α = k, β = l, α′ = i, β′ = j}

is equal to the (k, l)-th element of the matrix N_ij.
The sum of the matrices N_ij in a row i corresponds to the distribution of the pair (α, β) given α′ = i; the sum of the matrices N_ij in a column j corresponds to the distribution of the pair (α, β) given β′ = j. By condition (b) these sums have rank 1, so α and β are independent given α′ and given β′, i.e., α and β are conditionally independent with respect to α′ and β′. By condition (c) the pair (α′, β′) has a good joint probability matrix, so α′ and β′ are conditionally independent; hence α and β are conditionally independent, and their joint probability matrix, which is the sum of all the matrices N_ij, is a good matrix.

Definition 5: A matrix is called an r-matrix if its support (the set of positions of its nonzero elements) is a rectangle, i.e., has the form A × B for some set of rows A and some set of columns B.
Lemma 9: Any r-matrix M with nonnegative elements is the sum of matrices of rank 1 with the same support.
Proof: Let N = A × B be the support of M. For every (i, j) ∈ N consider the matrix E_ij of rank 1 with support N:

E_ij = (e_i + ε Σ_{k∈A} e_k)(e_j + ε Σ_{l∈B} e_l)^T,

where e_1, . . . , e_n is the standard basis in R^n.
The coordinates c_ij of M in the new basis E_ij depend continuously on ε, and for ε = 0 they are just the entries of M, which are positive on N. Thus they remain positive if ε is sufficiently small. So taking a sufficiently small ε we get the required representation of M as a sum of matrices of rank 1 with support N:

M = Σ_{(i,j)∈N} c_ij E_ij.
Definition 6: An r-decomposition of a matrix is its expression as a (finite) sum of r-matrices M = M_1 + M_2 + . . . of the same size such that the supports of M_i and M_{i+1} intersect (for any i). The length of the decomposition is the number of the summands; the r-complexity of a matrix is the length of its shortest decomposition (or +∞, if there is no such decomposition).
Lemma 10: Any non-block matrix M with nonnegative elements has an r-decomposition.
Proof: Consider a graph whose vertices are the nonzero entries of M. Two vertices are connected by an edge iff they are in the same row or column. By assumption, the matrix is a non-block matrix, hence the graph is connected and there exists a (possibly non-simple) path (i_1, j_1), . . . , (i_m, j_m) that visits each vertex of the graph at least once.
Express M as the sum of matrices corresponding to the
edges of the path: each edge corresponds to a matrix whose
support consists of the endpoints of the edge; each positive
element of M is distributed among matrices corresponding
to the adjacent edges. Each of these matrices is of rank 1.
So the expression of M as the sum of these matrices is an
r-decomposition.
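For a tiny worked example (ours, not from the paper), the non-block matrix

[ 1  2 ]
[ 0  3 ]

has nonzero entries (1, 1), (1, 2), (2, 2), and the path (1, 1)–(1, 2)–(2, 2) yields the r-decomposition

[ 1  1 ]   [ 0  1 ]
[ 0  0 ] + [ 0  3 ];

the shared entry 2 at position (1, 2) is split between the two edge matrices, both summands have rank 1, and their supports intersect at (1, 2).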
Corollary 10.1: The r-complexity of any non-block matrix is finite.
Lemma 11: Any non-block matrix M is good.
Proof: The proof uses induction on r-complexity of
M. For matrices of r-complexity 1, we apply Lemma 7.
Now suppose that M has r-complexity 2. In this case M
is equal to the sum of some r-matrices A and B such that
their supports are intersecting rectangles. By Lemma 9,
each of the matrices A and B is the sum of matrices of
rank 1 with the same support.
Suppose, for example, that A = A_1 + A_2 + A_3 and B = B_1 + B_2. Consider the block matrix

[ A_1  0    0    0    0   ]
[ 0    A_2  0    0    0   ]
[ 0    0    A_3  0    0   ]
[ 0    0    0    B_1  0   ]
[ 0    0    0    0    B_2 ].
The sum of the matrices in each row and in each column is a matrix of rank 1. The sum of all the entries is equal to A + B. All the conditions of Lemma 8 but one hold. The only problem is that the matrix (p_ij) is diagonal and hence is not good, where p_ij is the sum of the elements of the matrix in the (i, j)-th entry (see Lemma 8). To overcome this obstacle take a matrix e with only one nonzero element that is located in the intersection of the supports of A and B. If this nonzero element is sufficiently small, then all the elements of the matrix
N =

[ A_1 − 4e   e          e          e          e        ]
[ e          A_2 − 4e   e          e          e        ]
[ e          e          A_3 − 4e   e          e        ]
[ e          e          e          B_1 − 4e   e        ]
[ e          e          e          e          B_2 − 4e ]
are nonnegative matrices. The sum of the elements of each of the matrices that form the matrix N is now positive, so the matrix (p_ij) has no zero elements and is good by Lemma 7. And the sum of the matrices in any row and in any column of N is not changed, so it is still of rank 1. Using Lemma 8 we conclude that the matrix M is good.
The proof for matrices of r-complexity 3 is similar. For simplicity, consider the case where a matrix of complexity 3 has an r-decomposition M = A + B + C, where A, B, C are r-matrices of rank 1. Let e_1 be a matrix with one positive element that belongs to the intersection of the supports of A and B (all other matrix elements are zeros), and e_2 be a matrix with a positive element in the intersection of the supports of B and C.
Now consider the block matrix

N = [ A − e_1    e_1              0       ]
    [ e_1        B − e_1 − e_2    e_2     ]
    [ 0          e_2              C − e_2 ].

Clearly, the sums of the matrices in each row and in each column are of rank 1. The support of the matrix (p_ij) is of the form

[ ∗  ∗  0 ]
[ ∗  ∗  ∗ ]
[ 0  ∗  ∗ ],

so (p_ij) has r-complexity 2 (its support is the union of two intersecting rectangles, hence the matrix is the sum of two r-matrices). By the inductive assumption any matrix of r-complexity 2 is good. Therefore, M is a good matrix (Lemma 8).
In the general case (any matrix of r-complexity 3) the
reasoning is similar. Each of the matrices A, B, C is repre-
sented as the sum of some matrices of rank 1 (by Lemma 9).
Then we need several entries e_1 (e_2) (as it was for matrices
of r-complexity 2). In the same way, we prove the lemma
for matrices of r-complexity 4 etc.
This concludes the proof of Theorem 2: Random vari-
ables are conditionally independent if and only if their joint
probability matrix is a non-block matrix.
Note that this proof is constructive in the following
sense. Assume that the joint probability matrix for α, β is given and this matrix is not a block matrix. (For simplicity we assume that the matrix elements are rational numbers, though this is not an important restriction.) Then we can effectively find k such that α and β are k-independent, and find the joint distribution of all random variables that appear in the definition of k-conditional independence.
(Probabilities for that distribution are not necessarily ratio-
nal numbers, but we can provide algorithms that compute
approximations with arbitrary precision.)
V. Improved version of Theorem 1
The inequality
H(γ) ≤ 2^k H(γ|α) + 2^k H(γ|β)
from Theorem 1 can be improved. In this section we prove
a stronger theorem.
Theorem 3: If random variables α and β are conditionally independent of order k, and γ is an arbitrary random variable, then

H(γ) ≤ 2^k H(γ|α) + 2^k H(γ|β) − (2^{k+1} − 1) H(γ|αβ),

or, in another form,

I(γ : αβ) ≤ 2^k I(γ : β|α) + 2^k I(γ : α|β).
Proof: The proof is by induction on k.
We use the following inequality:
H(γ) = H(γ|α) + H(γ|β) + I(α : β) − I(α : β|γ) − H(γ|αβ) ≤ H(γ|α) + H(γ|β) + I(α : β) − H(γ|αβ).
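For the reader's convenience, here is a routine check of the identity (expanding into entropies of tuples):

H(γ|α) + H(γ|β) + I(α : β) − I(α : β|γ) − H(γ|αβ)
= [H(γα) − H(α)] + [H(γβ) − H(β)] + [H(α) + H(β) − H(αβ)] − [H(αγ) + H(βγ) − H(αβγ) − H(γ)] − [H(αβγ) − H(αβ)]
= H(γ),

and the inequality follows from I(α : β|γ) ≥ 0.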
If α and β are independent, then I(α : β) = 0 and we get the required inequality (the base case k = 0).
Assume that α and β are conditionally independent with respect to α′ and β′, where α′ and β′ are conditionally independent of order k − 1. Consider a pair of random variables (α″, β″) defined by the following formula:

Pr(α″ = c, β″ = d | α = a, β = b, γ = g) = Pr(α′ = c, β′ = d | α = a, β = b).

The distribution of (α, β, α″, β″) coincides with the distribution of (α, β, α′, β′), and (α″, β″) is independent from γ given (α, β).
From the relativized form of the inequality

H(γ) ≤ H(γ|α) + H(γ|β) + I(α : β) − H(γ|αβ)

(conditioning on α″) we get

H(γ|α″) ≤ H(γ|αα″) + H(γ|βα″) + I(α : β|α″) − H(γ|αβα″) ≤ H(γ|α) + H(γ|β) − H(γ|αβα″),

since I(α : β|α″) = 0. Note that according to our assumption H(γ|αβα″) = H(γ|αβ), so H(γ|α″) ≤ H(γ|α) + H(γ|β) − H(γ|αβ); similarly for H(γ|β″). Using these upper bounds for H(γ|α″) and H(γ|β″) in the induction hypothesis

H(γ) ≤ 2^{k−1} H(γ|α″) + 2^{k−1} H(γ|β″) − (2^k − 1) H(γ|α″β″),

we obtain

H(γ) ≤ 2^k H(γ|α) + 2^k H(γ|β) − 2^k H(γ|αβ) − (2^k − 1) H(γ|α″β″).

Applying the inequality

H(γ|α″β″) ≥ H(γ|αβα″β″) = H(γ|αβ),

we get the statement of the theorem.
VI. Rate Regions
Definition 7: The rate region of a pair of random variables α, β is the set of triples of real numbers (u, v, w) such that for all ε > 0, δ > 0 and sufficiently large n there exist
• coding functions t, f and g whose arguments are pairs (α^n, β^n) and whose values are binary strings of length ⌊(u + ε)n⌋, ⌊(v + ε)n⌋ and ⌊(w + ε)n⌋ (respectively);
• decoding functions r and s such that

r(t(α^n, β^n), f(α^n, β^n)) = α^n  and  s(t(α^n, β^n), g(α^n, β^n)) = β^n

with probability more than 1 − δ.
This definition (standard for multisource coding theory, see [3]) corresponds to the scheme of information transmission presented in Figure 1.
The following theorem was discovered by Vereshchagin. It gives a new constraint on the rate region when α and β are conditionally independent.
Fig. 1. Values of α^n and β^n are encoded by functions f, t and g and then transmitted via channels of limited capacity (dashed lines); decoder functions r and s have to reconstruct the values α^n and β^n with high probability, having access only to a part of the transmitted information.
Theorem 4: Let α and β be k-conditionally independent random variables. Then

H(α) + H(β) ≤ v + w + (2 − 2^{−k}) u

for any triple (u, v, w) in the rate region.
(It is easy to see that H(α) ≤ u + v, since α^n can be reconstructed with high probability from strings of length approximately nu and nv. For similar reasons we have H(β) ≤ u + w. Therefore,

H(α) + H(β) ≤ v + w + 2u

for any α and β. Theorem 4 gives a stronger bound for the case when α and β are k-independent.)
Proof: Consider the random variables

τ = t(α^n, β^n),    μ = f(α^n, β^n),    ν = g(α^n, β^n)

from the definition of the rate region (for some fixed ε > 0 and δ > 0). By Theorem 1 (applied to α^n and β^n, which are conditionally independent of order k by Lemma 1, and to γ = τ), we have

H(τ) ≤ 2^k (H(τ|α^n) + H(τ|β^n)).
We can rewrite this inequality as

2^{−k} H(τ) ≤ H((τ, α^n)) + H((τ, β^n)) − H(α^n) − H(β^n)

or

H(μ) + H(ν) + (2 − 2^{−k}) H(τ) ≥ H(μ) + H(ν) + 2H(τ) − H((τ, α^n)) − H((τ, β^n)) + H(α^n) + H(β^n).
We will prove the following inequality:

H(μ) + H(τ) − H((τ, α^n)) ≥ −cδn

for some constant c that does not depend on δ and for sufficiently large n. Using this inequality and the symmetric inequality

H(ν) + H(τ) − H((τ, β^n)) ≥ −cδn

we conclude that

H(μ) + H(ν) + (2 − 2^{−k}) H(τ) ≥ H(α^n) + H(β^n) − 2cδn.
Recall that the values of μ are ⌊(v + ε)n⌋-bit strings; therefore H(μ) ≤ (v + ε)n. Using similar arguments for ν and τ and recalling that H(α^n) = nH(α) and H(β^n) = nH(β) (independence), we conclude that

(v + ε)n + (w + ε)n + (2 − 2^{−k})(u + ε)n ≥ nH(α) + nH(β) − 2cδn.

Dividing by n and recalling that ε and δ may be chosen arbitrarily small (according to the definition of the rate region), we get the statement of Theorem 4.
It remains to prove that

H(μ) + H(τ) − H((τ, α^n)) ≥ −cδn

for some c that does not depend on δ and for sufficiently large n. For that we need the following simple bound.
Lemma 12: Let γ and γ′ be random variables such that Pr[γ ≠ γ′] ≤ ε. Then

H(γ) ≤ H(γ′) + 1 + ε log m,

where m is the number of possible values of γ.
Proof: Consider a new random variable γ″ with m + 1 values that is equal to γ if γ ≠ γ′ and to a special additional value if γ = γ′. Then γ is a function of the pair ⟨γ′, γ″⟩: if γ″ takes the special value then γ = γ′, otherwise γ = γ″. The variable γ″ takes the special value with probability at least 1 − ε, and we need at most one additional bit to distinguish between the cases γ = γ′ and γ ≠ γ′; therefore H(γ″) ≤ 1 + ε log m and

H(γ) ≤ H(γ′) + H(γ″) ≤ H(γ′) + 1 + ε log m.

The same bound holds if γ can be reconstructed from γ′ with probability at least 1 − ε (just replace γ′ with a function of γ′).
Now recall that the pair (τ, α^n) can be reconstructed from μ and τ (using the decoding function r) with probability 1 − δ. Therefore, H((τ, α^n)) does not exceed H((μ, τ)) + 1 + cδn (for some c and large enough n), because both τ and α^n have ranges of cardinality O(1)^n. It remains to note that H((μ, τ)) ≤ H(μ) + H(τ).
Acknowledgements
We thank participants of the Kolmogorov seminar, and
especially Alexander Shen and Nikolai Vereshchagin for the
formulation of the problem, helpful discussions and com-
ments.
We wish to thank Emily Cavalcanti, Daniel J. Webre and
the referees for useful comments and suggestions.
References
[1] R. Ahlswede, J. Körner, On the connection between the entropies of input and output distributions of discrete memoryless channels, Proceedings of the 5th Brasov Conference on Probability Theory, Brasov, 1974; Editura Academiei, Bucuresti, pp. 13–23, 1977.
[2] R. Ahlswede, J. Körner, On common information and related characteristics of correlated information sources. [Online]. Available: www.mathematik.uni-bielefeld.de/ahlswede/homepage.
[3] I. Csiszár, J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Second Edition, Akadémiai Kiadó, 1997.
[4] P. Gács, J. Körner, Common information is far less than mutual information, Problems of Control and Information Theory, vol. 2(2), pp. 149–162, 1973.
[5] A. E. Romashchenko, Pairs of Words with Nonmaterializable Mutual Information, Problems of Information Transmission, vol. 36, no. 1, pp. 3–20, 2000.
[6] C. E. Shannon, A mathematical theory of communication, Bell System Tech. J., vol. 27, pp. 379–423, 623–656, 1948.
[7] H. S. Witsenhausen, On sequences of pairs of dependent random variables, SIAM J. Appl. Math., vol. 28, pp. 100–113, 1975.
[8] A. D. Wyner, The Common Information of two Dependent Random Variables, IEEE Trans. on Information Theory, IT-21, pp. 163–179, 1975.