
ISITA2016, Monterey, California, USA, October 30-November 2, 2016

Characterising Probability Distributions via Entropies

Satyajit Thakor†, Terence Chan‡ and Alex Grant∗
Indian Institute of Technology Mandi†
University of South Australia‡
Myriota Pty Ltd∗

Abstract—Characterising the capacity region for a network can be extremely difficult, especially when the sources are dependent. Most existing computable outer bounds are relaxations of the Linear Programming bound. One main challenge in extending linear programming bounds to the case of correlated sources is the difficulty (or impossibility) of characterising arbitrary dependencies via entropy functions. This paper tackles the problem by addressing how to use entropy functions to characterise correlation among sources.

I. INTRODUCTION

This paper begins with a very simple and well-known result. Consider a binary random variable X such that

pX(0) = p and pX(1) = 1 − p.

While the entropy of X does not determine exactly what the probabilities of X are, it essentially determines the probability distribution (up to relabelling). To be precise, let 0 ≤ q ≤ 1/2 be such that H(X) = hb(q), where

hb(q) ≜ −q log q − (1 − q) log(1 − q).

Then either p = q or p = 1 − q. Furthermore, the two possible distributions can be obtained from each other by renaming the random variable outcomes appropriately. In other words, there is a one-to-one correspondence between entropies and distributions (when the random variable is binary).

The basic question now is: how "accurately" can entropies specify the distribution of random variables? When X is not binary, the entropy H(X) alone is not sufficient to characterise the probability distribution of X. In [1], it was proved that if X is a random scalar variable, its distribution can still be determined by using auxiliary random variables subject to an alphabet cardinality constraint. The result can also be extended to random vectors if the distribution is positive. However, the proposed approach cannot be generalised to the case when the distribution is not positive.

Main contributions: In this paper, we take a different approach and generalise the result to arbitrary random vectors. Before continuing to answer the question, we briefly describe an application (based on network coding problems) of characterising distributions (and correlations) among random variables by using entropies.
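To make the opening observation concrete, here is a minimal Python sketch (our illustration, not part of the paper; the helper names hb and hb_inverse are ours) that recovers the two candidate distributions {q, 1 − q} from the value of H(X) alone by numerically inverting the binary entropy function on [0, 1/2].

```python
import math

def hb(q):
    """Binary entropy in bits, with hb(0) = hb(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def hb_inverse(h, tol=1e-12):
    """The unique q in [0, 1/2] with hb(q) = h, found by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hb(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.8                    # true distribution: pX(0) = 0.8, pX(1) = 0.2
q = hb_inverse(hb(p))      # recovered from H(X) alone
print(q, 1 - q)            # ~0.2 and ~0.8: the distribution, up to relabelling
```

Bisection works here because hb is strictly increasing on [0, 1/2], which is exactly the monotonicity the paper appeals to.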


Let the directed acyclic graph G = (V, E) serve as a simplified model of a communication network with error-free point-to-point communication links. Edges e ∈ E have finite capacity Ce > 0. Let S be an index set for a number of multicast sessions, and {Ys : s ∈ S} be the set of source random variables. These sources are available at the nodes identified by the mapping a : S → 2^V (a source may be available at multiple nodes). Similarly, each source may be demanded by multiple sink nodes, identified by the mapping b : S → 2^V. For all s, assume that a(s) ∩ b(s) = ∅. Each edge e ∈ E carries a random variable Ue which is a function of incident edge random variables and source random variables. Sources are i.i.d. sequences {(Ys^n, s ∈ S), n = 1, 2, . . .}. Hence, each (Ys^n, s ∈ S) has the same joint distribution and is independent across different n. For notational simplicity, we will use (Ys, s ∈ S) to denote a generic copy of the sources at any particular time instance. However, within the same "time" instance n, the random variables (Ys^n, s ∈ S) may be correlated. We assume that the distribution of (Ys, s ∈ S) is known.

Roughly speaking, a link capacity tuple C = (Ce : e ∈ E) is achievable if one can design a network coding solution to transmit the sources {(Ys^n, s ∈ S), n = 1, 2, . . .} to their respective destinations such that 1) the probability of decoding error vanishes (as n goes to infinity), and 2) the number of bits transmitted on the link e ∈ E is at most nCe. The set of all achievable link capacity tuples is denoted by R.

Theorem 1 (Outer bound [2]): For a given network, consider the set of correlated sources (Ys, s ∈ S) with underlying probability distribution PYS(·). Construct any auxiliary random variables (Ki, i ∈ L) by choosing a conditional probability distribution function PKL|YS(·). Let R′ be the set of all link capacity tuples C = (Ce : e ∈ E) such that there exists a polymatroid h satisfying the following constraints

h(XW, JZ) − H(YW, KZ) = 0    (1)
h(Ue | Xs : a(s) → e, Uf : f → e) = 0    (2)
h(Xs : u ∈ b(s) | Xs′ : u ∈ a(s′), Ue : e → u) = 0    (3)
Ce − h(Ue) ≥ 0    (4)

for all W ⊆ S, Z ⊆ L, e ∈ E, u ∈ b(s) and s ∈ S. Then

R ⊆ R′    (5)

where the notation x → y means x is incident to y, and x, y can be an edge or a node.

Remark 1: The region R′ will depend on how we choose the auxiliary random variables (Ki, i ∈ L). In the following, we give an example to illustrate this fact.

Consider the network coding problem depicted in Figure 1, in which three correlated sources Y1, Y2, Y3 are available at node 1 and are demanded at nodes 3, 4, 5 respectively. Here, Y1, Y2, Y3 are defined such that Y1 = (b0, b1), Y2 = (b0, b2) and Y3 = (b1, b2) for some independent and uniformly distributed binary random variables b0, b1, b2. Furthermore, the edges from node 2 to nodes 3, 4, 5 have sufficient capacity to carry the random variable U1 available at node 2. We consider two outer bounds obtained from Theorem 1 for this network coding problem. In the first scenario, we use no auxiliary random variables, while in the second scenario, we use three auxiliary random variables such that

K0 = b0, K1 = b1, K2 = b2.

Fig. 1. A network example [2].

Let R′1 and R′2 be respectively the outer bounds for the two scenarios. Then R′2 is a proper subset of R′1. In particular, the link capacity tuple (Ce = 1, e = 1, . . . , 4) is in the region R′1 \ R′2 [2]. This example shows that by properly choosing auxiliary random variables, one can better capture the correlations among the sources, leading to a strictly tighter outer bound for network coding. Construction of auxiliary random variables from source correlation was also considered in [3] to improve cut-set bounds.

II. MAIN RESULTS

In this section, we will show that by using auxiliary random variables, the probability distribution of a set of random variables (or a random vector) can be uniquely characterised from the entropies of these variables.

A. Random Scalar Case

Consider any ternary random variable X. Clearly, the entropy of X and its probability distribution are not in one-to-one correspondence. In [1], auxiliary random variables are used in order to exactly characterise the distribution.

Suppose X is ternary, taking values from the set {1, 2, 3}. Suppose also that pX(x) > 0 for all x ∈ {1, 2, 3}. Define random variables A1, A2 and A3 such that

Ai = 1 if X = i, and Ai = 0 otherwise.    (6)

Clearly,

H(Ai | X) = 0,    (7)
H(Ai) = hb(pX(i)).    (8)

Let us further assume that pX(i) ≤ 1/2 for all i. Then, by (8) and the strict monotonicity of hb(q) in the interval [0, 1/2], it seems at first glance that the distribution of X is uniquely specified by the entropies of the auxiliary random variables.

However, there is a catch in the argument: the auxiliary random variables chosen are not arbitrary. When we "compute" the probabilities of X from the entropies of the auxiliary random variables, it is assumed that we know how the random variables are constructed. Without knowing the "construction", it is unclear how to find the probabilities of X from entropies. More precisely, suppose we only know that there exist auxiliary random variables A1, A2, A3 such that (7) and (8) hold (without knowing that the random variables are specified by (6)). Then we cannot determine precisely what the distribution of X is. Despite this complication, [1], [2] showed a construction of auxiliary random variables from which the probability distribution can be characterised from entropies (see [4] for detailed proofs). The results will also be briefly restated here as a necessary prerequisite for the vector case.

Let X be a random variable with support Nn = {1, . . . , n} and Ω be the set of all nonempty binary partitions of Nn. In other words, Ω is the collection of all sets {α, αc} such that α ⊆ Nn, and both |α| and |αc| are nonzero. We will use α to denote the set {α, αc}. To simplify notation, we may assume without loss of generality that α is a subset of {2, . . . , n}. Clearly, |Ω| = 2^(n−1) − 1. Unless explicitly stated otherwise, we may assume without loss of generality that the probability that X = i (denoted by pi) is monotonically decreasing. In other words,

p1 ≥ . . . ≥ pn > 0.

Definition 1 (Partition Random Variables): A random variable X with support Nn induces 2^(n−1) − 1 random variables Aα for α ∈ Ω such that

Aα ≜ α if X ∈ α, and αc otherwise.    (9)

We call {Aα, α ∈ Ω} the collection of binary partition random variables of X.

Remark 2: If |α| = 1 or n − 1, then there exists an element i ∈ Nn such that Aα = {i} if and only if X = i. Hence, Aα is essentially a binary variable indicating/detecting whether X = i or not. As such, we call Aα an indicator variable. Furthermore, when n ≥ 3, there are exactly n indicator variables, one for each element in Nn.
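As a small numerical companion to Definition 1 and equation (8), the following Python sketch (our own illustration; the helper names are not from the paper) enumerates the 2^(n−1) − 1 binary partition random variables of a scalar X and checks that each indicator variable has entropy hb(pX(i)).

```python
import math
from itertools import combinations

def entropy(dist):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def hb(q):
    """Binary entropy function."""
    return entropy({0: q, 1: 1 - q})

def binary_partitions(support):
    """One representative subset alpha per partition {alpha, alpha^c}; following
    the paper's convention, alpha never contains the first element."""
    rest = support[1:]
    return [frozenset(c) for r in range(1, len(support))
            for c in combinations(rest, r)]

def partition_var_dist(px, alpha):
    """Distribution of A_alpha, which reports whether X lies in alpha or not."""
    p_in = sum(p for x, p in px.items() if x in alpha)
    return {"alpha": p_in, "alpha_c": 1 - p_in}

px = {1: 0.5, 2: 0.3, 3: 0.2}                  # a ternary example
parts = binary_partitions(sorted(px))          # {2}, {3}, {2,3}: 2^(3-1) - 1 = 3
for alpha in parts:
    print(sorted(alpha), round(entropy(partition_var_dist(px, alpha)), 4))

# The partition {2,3} is the indicator for X = 1, so its entropy is hb(p_1):
assert abs(entropy(partition_var_dist(px, frozenset({2, 3}))) - hb(px[1])) < 1e-12
```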


Theorem 2 (Random Scalar Case): Suppose X is a random variable with support Nn. For any α ∈ Ω, let Aα be the corresponding binary partition random variable. Now, suppose X∗ is another random variable such that 1) the size of the support of X∗ is at most the same as that of X, and 2) there exist random variables (Bα, α ∈ Ω) satisfying the following conditions:

H(Bα, α ∈ Δ) = H(Aα, α ∈ Δ)    (10)
H(Bα | X∗) = 0    (11)

for all Δ ⊆ Ω. Then there is a mapping

σ : Nn → X∗

such that Pr(X = i) = Pr(X∗ = σ(i)). In other words, the probability distributions of X and X∗ are essentially the same (via renaming outcomes).

Proof: A sketch of the proof is shown in Appendix A.

B. Random Vector Case

Extension of Theorem 2 to the case of random vectors has also been considered briefly in our previous work [1]. However, the extension is fairly limited in that work: the random vector must have a positive probability distribution and each individual random variable must take at least three possible values. In this paper, we overcome these restrictions and fully generalise Theorem 2 to the random vector case.

Example 1: Consider two random vectors X = (X1, X2) and X∗ = (X1∗, X2∗) with probability distributions given in Table I. If we compare the joint probability distributions of X and X∗, they are different from each other. Yet, if we treat X and X∗ as scalars (by properly renaming), then they indeed have the same distribution (both uniformly distributed over a support of size 8). This example shows that we cannot directly apply Theorem 2 to the random vector case by simply mapping a vector into a scalar.

TABLE I
PROBABILITY DISTRIBUTIONS OF X AND X∗

X = (X1, X2):         X2 = 1   X2 = 2   X2 = 3   X2 = 4
       X1 = a           1/8      1/8      0        0
       X1 = b           1/8      1/8      0        0
       X1 = c            0        0      1/8      1/8
       X1 = d            0        0      1/8      1/8

X∗ = (X1∗, X2∗):      X2∗ = 1  X2∗ = 2  X2∗ = 3  X2∗ = 4
       X1∗ = a          1/8      1/8      0        0
       X1∗ = b           0       1/8     1/8       0
       X1∗ = c           0        0      1/8      1/8
       X1∗ = d          1/8       0       0       1/8

Theorem 3 (Random Vector): Suppose X = (X1, . . . , XM) is a random vector with support X of size at least 3. Again, let Ω be the set of all nonempty binary partitions of X and Aα be the binary partition random variable of X such that

Aα = α if X ∈ α, and αc otherwise    (12)

for all α ∈ Ω.

Now, suppose X∗ = (X1∗, . . . , XM∗) is another random vector for which there exist random variables (Bα, α ∈ Ω) such that for any subset Δ of Ω and τ ⊆ {1, . . . , M},

H(Bα, α ∈ Δ, Xj∗, j ∈ τ) = H(Aα, α ∈ Δ, Xj, j ∈ τ).    (13)

Then the joint probability distributions of X = (X1, . . . , XM) and X∗ = (X1∗, . . . , XM∗) are essentially the same. More precisely, there exist bijective mappings σm for m = 1, . . . , M such that

Pr(X = (x1, . . . , xM)) = Pr(X∗ = (σ1(x1), . . . , σM(xM))).    (14)

Proof: See Appendix B.
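As a quick numerical check of Example 1, the following Python sketch (ours, not from the paper) confirms from Table I that the two joint pmfs differ, that they coincide when flattened to scalars, and that even their marginal and joint entropies agree.

```python
import math
from collections import Counter

# Table I: joint pmfs of X = (X1, X2) and X* = (X1*, X2*) on {a,b,c,d} x {1,2,3,4}
pX  = {(r, c): 1/8 for r, c in [('a',1),('a',2),('b',1),('b',2),
                                ('c',3),('c',4),('d',3),('d',4)]}
pXs = {(r, c): 1/8 for r, c in [('a',1),('a',2),('b',2),('b',3),
                                ('c',3),('c',4),('d',1),('d',4)]}

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(dist, axis):
    out = Counter()
    for outcome, p in dist.items():
        out[outcome[axis]] += p
    return out

print(pX == pXs)                                      # False: the joint pmfs differ
print(Counter(pX.values()) == Counter(pXs.values()))  # True: as scalars, both uniform on 8 outcomes
print([round(H(d), 3) for d in (pX, pXs,
                                marginal(pX, 0), marginal(pXs, 0),
                                marginal(pX, 1), marginal(pXs, 1))])
# joint and marginal entropies all coincide, yet the vectors differ (cf. Example 1)
```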


C. Application: Network coding outer bound

Together with Theorem 1 and the characterisation of distributions using entropies, we obtain the following outer bound on R.

Corollary 1: For any given network, consider the set of correlated sources (Ys, s ∈ S) with underlying probability distribution PYS(·). From this distribution, construct binary partition random variables Aα as described in Theorem 3 (for the vector case). Let R′(Γ∗) be the set of all link capacity tuples C = (Ce : e ∈ E) such that there exists a polymatroid function h satisfying the constraints (2)-(4) and

h(Xs, s ∈ W, Bα, α ∈ Δ) = H(Ys, s ∈ W, Aα, α ∈ Δ)    (15)

for all W ⊆ S, Δ ⊆ Ω, e ∈ E, u ∈ b(s) and s ∈ S. Then R ⊆ R′(Γ∗).

III. CONCLUSION

In this paper, we showed that by using auxiliary random variables, entropies are sufficient to uniquely characterise the probability distribution of a random vector (up to outcome relabelling). Yet, there are still many open questions that remain to be answered. For example, the number of auxiliary random variables used is exponential in the size of the support. Can we reduce the number of auxiliary random variables? What is the tradeoff between the number of auxiliary variables used and how well entropies can characterise the distribution? At the extreme, if only one auxiliary random variable can be used, how can one pick the variable to best describe the distribution?

REFERENCES

[1] S. Thakor, T. Chan, and A. Grant, "Characterising correlation via entropy functions," in Information Theory Workshop (ITW), 2013 IEEE, pp. 1–2, Sept. 2013 (invited paper).
[2] S. Thakor, T. Chan, and A. Grant, "Bounds for network information flow with correlated sources," in Australian Communications Theory Workshop (AusCTW), pp. 43–48, Feb. 2011.
[3] A. Gohari, S. Yang, and S. Jaggi, "Beyond the cut-set bound: Uncertainty computations in network coding with correlated sources," IEEE Trans. Inform. Theory, vol. 59, pp. 5708–5722, Sept. 2013.
[4] S. Thakor, T. Chan, and A. J. Grant, "On the capacity of networks with correlated sources," CoRR, vol. abs/1309.1517, 2013.

APPENDIX A - SCALAR CASE

The main ingredients in the proofs of Theorems 2 and 3 are the properties of the partition random variables, which are reviewed as follows. By understanding these properties, we can better understand the logic behind Theorem 2.

Lemma 1 (Properties): Let X be a random variable with support Nn, and (Aα, α ∈ Ω) be its induced binary partition random variables. Then the following properties hold:
1) (Distinctness) For any α ≠ β,
H(Aα | Aβ) > 0,    (16)
H(Aβ | Aα) > 0.    (17)
2) (Completeness) Let A∗ be a binary random variable such that H(A∗ | X) = 0 and H(A∗) > 0. Then there exists α ∈ Ω such that
H(A∗ | Aα) = H(Aα | A∗) = 0.    (18)
In other words, Aα and A∗ are essentially the same.
3) (Basis) Let α ∈ Ω. Then there exist β1, . . . , βn−2 ∈ Ω such that
H(Aβk | Aα, Aβ1, . . . , Aβk−1) > 0    (19)
for all k = 1, . . . , n − 2.

Among all binary partition random variables, we are particularly interested in the indicator random variables. The following proposition can be interpreted as an "entropic characterisation" of those indicator random variables.

Proposition 1 (Characterising indicators): Let X be a random variable with support Nn where n ≥ 3. Consider the binary partition random variables induced by X. Then for all i ≥ 2,
1) H(Ai | Aj, j > i) > 0, and
2) for all α ∈ Ω such that H(Aα | Aj, j > i) > 0,
H(Ai) ≤ H(Aα).    (20)
3) Equality in (20) holds if and only if Aα is an indicator random variable detecting an element ℓ ∈ Nn such that pℓ = pi.
4) Let β ⊆ {2, . . . , n}. The indicator random variable A1 is the only binary partition variable of X such that
H(Aα | Aj, j ∈ β) > 0
for all proper subsets β of {2, . . . , n}.

Sketch of Proof for Theorem 2: Let X be a random scalar and Aα for α ∈ Ω be its induced partition random variables. Suppose X∗ is another random variable such that 1) the size of the support of X∗ is at most the same as that of X, and 2) there exist random variables (Bα, α ∈ Ω) satisfying (10) and (11).

Roughly speaking, (10) and (11) mean that the set of random variables (Bα, α ∈ Ω) satisfies most properties of ordinary partition random variables. To prove the theorem, our first immediate goal is to prove that the random variables Bα are indeed binary partition random variables. In particular, we can prove the following.
1) (Distinctness) All the random variables Bα for α ∈ Ω are distinct and have non-zero entropies.
2) (Basis) Let α ∈ Ω. Then there exist β1, . . . , βn−2 ∈ Ω such that
H(Bβk | Bα, Bβ1, . . . , Bβk−1) > 0    (21)
for all k = 1, . . . , n − 2.
3) (Binary properties) For any α ∈ Ω, Bα is a binary partition random variable of X∗. In this case, we may assume without loss of generality that there exists ωα ⊆ X∗ such that
Bα = ωα if X∗ ∈ ωα, and ωαc otherwise.    (22)
4) (Completeness) Let B∗ be a binary partition random variable of X∗ with non-zero entropy. Then there exists α ∈ Ω such that
H(B∗ | Bα) = H(Bα | B∗) = 0.    (23)

Then by (10)–(11) and Proposition 1, we show that Bα satisfies all properties which are only satisfied by the indicator random variables. Thus, we prove that Bα is an indicator variable if |α| = 1. Finally, once we have determined which variables are the indicator variables, we can immediately determine the probability distribution. As H(Aα) = H(Bα) for all α ∈ Ω, the distribution of X∗ is indeed the same as that of X (subject to relabelling).
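The distinctness property in Lemma 1 is easy to check numerically. The sketch below (our own verification code, with illustrative helper names) confirms, for a small example, that no binary partition variable is a function of another.

```python
import math
from itertools import combinations
from collections import Counter

def entropy(joint):
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)

def cond_entropy(px, f, g):
    """H(f(X) | g(X)) for functions f, g of a discrete X with pmf px."""
    joint, marg = Counter(), Counter()
    for x, p in px.items():
        joint[(f(x), g(x))] += p
        marg[g(x)] += p
    return entropy(joint) - entropy(marg)

def binary_partitions(support):
    """One representative alpha per binary partition {alpha, alpha^c}."""
    rest = support[1:]
    return [frozenset(c) for r in range(1, len(support))
            for c in combinations(rest, r)]

px = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}               # support N_4, all probabilities positive
parts = binary_partitions(sorted(px))                # 2^(4-1) - 1 = 7 partition variables
membership = lambda alpha: (lambda x: x in alpha)    # A_alpha, relabelled as a True/False variable

# Lemma 1, distinctness: distinct partition variables are never functions of one another
for a, b in combinations(parts, 2):
    assert cond_entropy(px, membership(a), membership(b)) > 1e-9
    assert cond_entropy(px, membership(b), membership(a)) > 1e-9
print("distinctness verified for", len(parts), "binary partition variables")
```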


APPENDIX B - VECTOR CASE

In this appendix, we sketch the proof of Theorem 3, which extends Theorem 2 to the random vector case.

Consider a random vector X = (Xm : m ∈ NM). We will only consider the general case¹ where the support size of X is at least 3.

¹ In the special case when the support size of X is less than 3, the theorem can be proved directly.

Let X be the support of X. Hence, elements of X are of the form x = (x1, . . . , xM) such that

Pr(Xm = xm, m ∈ NM) > 0

if and only if x ∈ X.

The collection of binary partition random variables induced by the random vector X = (Xm, m ∈ NM) is again indexed by (Aα, α ∈ Ω). As before, we may assume without loss of generality that

Aα = α if X ∈ α, and αc otherwise.    (24)

Now, suppose (Bα, α ∈ Ω) is a set of random variables satisfying the properties specified in Theorem 3. Invoking Theorem 2 (by treating the random vector X∗ as one discrete variable), we can prove the following.
1) The sizes of the supports of X∗ and X are the same.
2) Bα is a binary partition variable for all α ∈ Ω.
3) The set of variables (Bα, α ∈ Ω) contains all distinct binary partition random variables induced by X∗.
4) Bx is an indicator variable for all x ∈ X.

By definition, Ax is an indicator variable detecting x. However, while Bx is an indicator variable, the subscript x in Bx is only an index. The element detected by Bx can be any element in the support of X∗, which can be completely different from X. To highlight the difference, we define the mapping σ such that, for any x ∈ X, σ(x) is the element in the support of X∗ that is detected by Bx. In other words, A∗σ(x) = Bx. The lemma below follows from Theorem 2.

Lemma 2: For all x ∈ X,

Pr(X = x) = Pr(X∗ = σ(x)).

Let X∗ be the support of X∗. We similarly define Ω∗ as the collection of all sets of the form {γ, γc} where γ is a subset of X∗ and the sizes of γ and γc are non-zero. Again, we will use γ to denote the set {γ, γc} and define

A∗γ = γ if X∗ ∈ γ, and γc otherwise.    (25)

For any α ∈ Ω, Bα is a binary partition random variable of X∗. Hence, we may assume without loss of generality that there exists γ such that A∗γ = Bα. For notational simplicity, we may further extend² the mapping σ such that A∗σ(α) = Bα for all α ⊆ X.

² Strictly speaking, σ(α) is not precisely defined. As γ ≠ γc, σ(α) can either be γ or γc. Yet, the precise choice of σ(α) does not have any effect on the proof. We only require that when α is a singleton, σ(α) is also a singleton.

Proposition 2: Let α ∈ Ω. Suppose Aβ satisfies the following properties:
1) For any γ ⊆ α, H(Aβ | Ax, x ∈ γ) = 0 if and only if γ = α.
2) For any γ ⊆ αc, H(Aβ | Ax, x ∈ γ) = 0 if and only if γ = αc.
Then Aβ = Aα.

Proof: Direct verification.

By the definition of Bα and Proposition 2, we have the following result.

Proposition 3: Let α ∈ Ω. Then Bβ = Bα is the only binary partition variable of X∗ such that
1) For any γ ⊆ α, H(Bβ | Bx, x ∈ γ) = 0 if and only if γ = α.
2) For any γ ⊆ αc, H(Bβ | Bx, x ∈ γ) = 0 if and only if γ = αc.

Proposition 4: Let α ∈ Ω. Then σ(α) = δ(α), where δ(α) = {σ(x) : x ∈ α}.

Proof: By Proposition 3, Bα = A∗σ(α) is the only variable such that
1) For any γ ⊆ α, H(A∗σ(α) | A∗σ(x), x ∈ γ) = 0 if and only if γ = α.
2) For any γ ⊆ αc, H(A∗σ(α) | A∗σ(x), x ∈ γ) = 0 if and only if γ = αc.
The above two properties can then be rephrased as
1) For any δ(γ) ⊆ δ(α), H(A∗σ(α) | A∗σ(x), σ(x) ∈ δ(γ)) = 0 if and only if δ(γ) = δ(α).
2) For any δ(γ) ⊆ δ(αc), H(A∗σ(α) | A∗σ(x), σ(x) ∈ δ(γ)) = 0 if and only if δ(γ) = δ(αc).
Now, we can invoke Proposition 2 and prove that A∗δ(α) = A∗σ(α), or equivalently, δ(α) = σ(α). The proposition then follows.

Proposition 5: Consider two distinct elements x = (x1, . . . , xM) and x′ = (x′1, . . . , x′M) in X. Let

σ(x) = y = (y1, . . . , yM)    (26)
σ(x′) = y′ = (y′1, . . . , y′M).    (27)

Then xm ≠ x′m if and only if ym ≠ y′m.

Proof: First, we will prove the only-if statement. Suppose xm ≠ x′m. Consider the following two sets

Δ = {x′′ = (x′′1, . . . , x′′M) ∈ X : x′′m = xm},    (28)
Δc = {x′′ = (x′′1, . . . , x′′M) ∈ X : x′′m ≠ xm}.    (29)

It is obvious that H(AΔ | Xm) = 0. By (10)–(11), we have H(BΔ | Xm∗) = 0. Hence, BΔ = A∗σ(Δ). Since H(BΔ | Xm∗) = 0, this implies H(A∗σ(Δ) | Xm∗) = 0. Now, notice that x′ ∈ Δc and x ∈ Δ. By Proposition 4, σ(Δ) = {σ(x′′) : x′′ ∈ Δ}. Therefore, y′ = σ(x′) ∉ σ(Δ) and y = σ(x) ∈ σ(Δ). Together with the fact that H(A∗σ(Δ) | Xm∗) = 0, we can then prove that y′m ≠ ym.

Next, we prove the if-statement. Suppose y, y′ ∈ X∗ are such that ym ≠ y′m. There exist x and x′ such that (26) and (27) hold. Again, define

Λ ≜ {y′′ = (y′′1, . . . , y′′M) ∈ X∗ : y′′m = ym},    (30)
Λc ≜ {y′′ = (y′′1, . . . , y′′M) ∈ X∗ : y′′m ≠ ym}.    (31)

Then H(A∗Λ | Xm∗) = 0. Let Φ ≜ {x′′ ∈ X : σ(x′′) ∈ Λ}. By definition and Proposition 4, BΦ = A∗σ(Φ) = A∗Λ. Hence, we have H(BΦ | Xm∗) = 0 and consequently H(AΦ | Xm) = 0. On the other hand, it can be verified from the definition that x′ ∈ Φc and x ∈ Φ. Together with the fact that H(AΦ | Xm) = 0, we prove that xm ≠ x′m. The proposition then follows.

Proof of Theorem 3: A direct consequence of Proposition 5 is that there exist bijective mappings σ1, . . . , σM such that σ(x) = (σ1(x1), . . . , σM(xM)). On the other hand, Theorem 2 proved that Pr(X = x) = Pr(X∗ = σ(x)). Consequently,

Pr(X1 = x1, . . . , XM = xM) = Pr(X1∗ = σ1(x1), . . . , XM∗ = σM(xM)).    (32)

Therefore, the joint distributions of X = (X1, . . . , XM) and X∗ = (X1∗, . . . , XM∗) are essentially the same (by renaming xm as σm(xm)).
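To illustrate the notion of "essentially the same" in (14) and (32), here is a brute-force Python sketch (our own; the function name essentially_same is hypothetical) that searches for per-coordinate bijections σ1, σ2 between two joint pmfs. Applied to Table I it finds none, while a genuinely relabelled copy is matched immediately.

```python
from itertools import permutations

def essentially_same(p, q):
    """Search for per-coordinate bijections sigma_1, sigma_2 with
    q(sigma_1(x1), sigma_2(x2)) == p(x1, x2) for all (x1, x2), as in (14)/(32)."""
    rows, cols = sorted({a for a, _ in p}), sorted({b for _, b in p})
    rows_q, cols_q = sorted({a for a, _ in q}), sorted({b for _, b in q})
    for pr in permutations(rows_q):
        s1 = dict(zip(rows, pr))
        for pc in permutations(cols_q):
            s2 = dict(zip(cols, pc))
            if all(abs(p.get((a, b), 0.0) - q.get((s1[a], s2[b]), 0.0)) < 1e-12
                   for a in rows for b in cols):
                return s1, s2
    return None

# Table I pmfs (only nonzero entries listed)
pX  = {(r, c): 1/8 for r, c in [('a',1),('a',2),('b',1),('b',2),
                                ('c',3),('c',4),('d',3),('d',4)]}
pXs = {(r, c): 1/8 for r, c in [('a',1),('a',2),('b',2),('b',3),
                                ('c',3),('c',4),('d',1),('d',4)]}

print(essentially_same(pX, pXs))           # None: no coordinate-wise relabelling exists
relabelled = {(r.upper(), 5 - c): v for (r, c), v in pX.items()}
print(essentially_same(pX, relabelled))    # recovers a pair (sigma_1, sigma_2)
```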

