2012 IEEE Information Theory Workshop

Compact Representation of Polymatroid Axioms for
Random Variables with Conditional Independencies
Satyajit Thakor†, Alex Grant‡ and Terence Chan‡
†Institute of Network Coding, The Chinese University of Hong Kong, Shatin, Hong Kong
‡Institute for Telecommunications Research, University of South Australia
[email protected],{alex.grant, terence.chan}@unisa.edu.au

Abstract—The polymatroid axioms are dominantly used to study the capacity limits of various communication systems. In fact, for most of the communication systems for which the capacity is known, these axioms alone suffice to obtain the characterization of capacity. Moreover, the polymatroid axioms are stronger tools for tackling the implication problem for conditional independencies than the axioms used in Bayesian networks. However, their use is prohibitively complex as the number of random variables increases, since the number of inequalities to consider grows exponentially. In this paper we give a compact characterization of the minimal set of polymatroid axioms when arbitrary conditional independence and functional dependence constraints are given. In particular, we identify those elemental equalities which are implied by the given constraints. We also identify those elemental inequalities which are redundant given the constraints.

I. INTRODUCTION

In [1], we considered complexity reduction of the LP bound via a simplified characterization of elemental inequalities when network coding and source independence constraints are given for an instance of the network coding model. We also gave novel algorithms which directly generate the simplified characterization. The results developed are also applicable to computational complexity reduction for proving information inequalities using the Information Theoretic Inequality Prover (ITIP) [2] when functional dependence and independence constraints are given for a set of random variables (in general, polymatroidal pseudo-variables). The motivation was that the network coding and source independence constraints for an instance of the network coding model can be exploited to construct Functional Dependence Graphs (FDGs), which in turn can be used to find irreducible sets [3] (equalities of joint entropies).

But in a general communication scenario, the random variables may have a causal dependence relationship rather than a functional dependence relationship. Such a system of random variables can be modeled as a Bayesian network (directed acyclic graph) [4]. Moreover, if the random variables have cyclic causal dependency (e.g., feedback), then Bayesian network modeling is no longer applicable. Such a system of random variables can be modeled either as a Markov chain or as a Markov random field (undirected graph), collectively regarded as Markov structures [2]. In fact, a Markov structure is a collection of special conditional independencies called full conditional mutual independencies. But the converse is not true: for a given collection of conditional independencies there may not exist a Markov structure.

In the most general setting, there may be a set of random variables with functional dependence and conditional independence constraints such that it is not possible to represent them by any graphical model (FDG, Bayesian, or Markov). In this work, we give a compact representation of the polymatroid axioms for this most general case. This compact formulation can have many potential applications. For example, it can be used to efficiently find those conditional independence implications which may not be feasible to find by employing the set of axioms [6] used in Bayesian networks (see [2, Section 14.5] for details). The compact representation also enables proving basic information inequalities with arbitrary¹ conditional independence constraints faster. Moreover, it can be used for faster computation of the LP bound [2] for communication scenarios involving random variables with causal dependencies and conditional independencies.

The paper is organized as follows. In Section II, we formally describe the problem. In Section III we present the main results of the paper. Algorithms to generate the compact representation of the polymatroid axioms are given in Section IV. In Section V, we discuss the reduction in polymatroid inequalities. In Section VI, we show an application of the main results to obtain a compact characterization of the polymatroid axioms for random variables with Markov structures.

II. PROBLEM FORMULATION

Let V = {A, B, . . . } be a finite set and P(V) be its power set (i.e., the set of all its subsets). A rank function

h : P(V) → R

is simply a real-valued function defined on P(V) such that h(∅) = 0. If h is known from the context, we will often denote h(A) by H(A).

A rank function h is called entropic if the elements of V are random variables such that h(A) is the joint entropy of the set of random variables in A.

It is sometimes instrumental to treat h as a column vector (or a point) in a 2^|V|-dimensional Euclidean space, such that (1) the axes of the space are labelled by the elements of P(V), and (2) the "coordinate" of the point or vector with respect to the A-axis is given by h(A).

¹Here, arbitrary means any set of conditional independence constraints which may not be consistent with any graphical model.
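To make the rank-function vector concrete, the following sketch (ours, not from the paper) computes the entropic function h from a hypothetical joint distribution over V = {A, B, C} with C = A XOR B, builds the 2^|V|-dimensional vector indexed by subsets of V, and spot-checks that h(∅) = 0 and that monotonicity and submodularity hold.

```python
from itertools import chain, combinations
from math import log2

# Hypothetical joint pmf over V = {A, B, C}: A, B fair coins, C = A XOR B.
V = ("A", "B", "C")
pmf = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def H(subset):
    """Joint entropy h(A) of the variables named in `subset`, in bits."""
    idx = [V.index(x) for x in subset]
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

# The entropic vector h: one coordinate per subset of V.
h = {frozenset(s): H(s) for s in powerset(V)}

assert h[frozenset()] == 0.0                       # h(∅) = 0
assert h[frozenset("AB")] <= h[frozenset("ABC")] + 1e-12   # monotonicity
assert (h[frozenset("AC")] + h[frozenset("BC")]
        >= h[frozenset("ABC")] + h[frozenset("C")] - 1e-12)  # submodularity
print(h[frozenset("ABC")])   # 2.0 bits
```

Any joint distribution yields such a vector; characterizing which vectors arise this way is exactly the problem of characterizing Γ∗ discussed next.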

978-1-4673-0223-4/12/$31.00 ©2012 IEEE 267


Authorized licensed use limited to: Centrale Supelec. Downloaded on November 07,2024 at 09:50:44 UTC from IEEE Xplore. Restrictions apply.

Let Γ∗[V] (or simply Γ∗) be the set of all entropic functions. One of the most fundamental and important problems in information theory is to characterize Γ∗. Unfortunately, this is an extremely difficult problem: answers are only known when the size of V is less than four. However, an outer bound for Γ∗[V] is available. One of the most common outer bounds of Γ∗[V] is the set of polymatroidal functions, denoted by Γ[V] (or simply Γ).

Definition 1 (Polymatroids): A rank function h is a polymatroid (or equivalently is in Γ[V]) if

h(∅) = 0, (1)
h(A) ≥ h(B), ∀B ⊆ A, (2)
h(A) + h(B) ≥ h(A ∪ B) + h(A ∩ B), ∀A, B ⊆ V. (3)

To simplify notation, we will use the following convention: for any rank function h and subsets A, B, C of V, define

Hh(A|B) ≜ h(A ∪ B) − h(B)

and

Ih(A; B|C) ≜ h(AC) + h(BC) − h(ABC) − h(C).

Using the above notation, (2) and (3) can be rewritten as

Hh(A \ B|B) ≥ 0, ∀B ⊆ A (4)

and

Ih(A \ B; B \ A|A ∩ B) ≥ 0, ∀A, B ⊆ V. (5)

Once again, if h is understood implicitly, we will drop the subscript h in (4) and (5).

It is worth pointing out that many of the inequalities listed in (1)-(3) are in fact redundant. For example, the inequality I(A; BC) ≥ 0 is implied by the two inequalities

I(A; B) ≥ 0,
I(A; C|B) ≥ 0.

A natural question thus arises: which inequalities in (1)-(3) are redundant? This question was answered in [5], where it was proved that Γ[V] is characterized by a collection of "elemental basic inequalities"

∆ = ∆1 ∪ ∆2

where

∆1 ≜ {Ih(A; B|C) ≥ 0 : A, B ∈ V \ C, A ≠ B, C ⊆ V},
∆2 ≜ {Hh(A|V \ {A}) ≥ 0 : A ∈ V}.

Simple counting reveals that there are exactly

|V| + C(|V|, 2)·2^(|V|−2)

elemental inequalities. Notice that this number grows exponentially with the size of V, which makes characterizing Γ efficiently extremely challenging.

As all entropic functions are polymatroids, (1)-(3) can be regarded as information inequalities. In fact, (2) corresponds to the nonnegativity of conditional entropies and (3) corresponds to the nonnegativity of conditional mutual informations. These information inequalities are the "basic laws of information". They are of critical importance in proving converses of coding theorems. In fact, for most communication problems for which the capacity is known, only the basic inequalities are used to derive the capacity.

Another application of Γ[V] is to derive an outer bound on the set of achievable throughputs in a network. Without going through the details, it can be proved that the throughput of a network can be bounded by solving a linear programming problem of the following form:

max c⊤h
subject to h ∈ Γ[V],
           Φh = 0,          (6)
           h(V) = 1,

where Φh = 0 is the set of equality constraints.

Remark: In this paper, we are only interested in equalities in Φh = 0 of the form Ih(A; B|C) = 0 or H(A|B) = 0.

The linear programming problem (6) can be solved efficiently using various kinds of optimization methods, including the simplex method. However, as mentioned earlier, one of the hurdles is that the number of constraints involved grows exponentially with the size of V. Therefore, for practical purposes, it is critical to eliminate as many redundant constraints and variables as possible.

Without the equality constraints Φh = 0 and the "normalising constraint" h(V) = 1, the constraint h ∈ Γ[V] can be regarded as requiring that h satisfies all the inequalities in ∆. In this paper, we will show that, with an equality constraint Φh = 0, some of the inequalities in ∆ may be rendered redundant and can thus be eliminated. Again, the fundamental question to be answered is: how can we identify which inequalities are redundant and can be eliminated, subject to a given set of equality constraints?

Before we continue, we first define rigorously what it means for an inequality to be redundant.

Definition 2 (Implication): An inequality Ih(A; B|C) ≥ 0 is said to be implied by the set of equalities

J= ≜ {Ih(Ai; Bi|Ci) = 0 : i = 1, . . . , |J=|}

and inequalities

J≥ ≜ {Ih(Ai; Bi|Ci) ≥ 0 : i = |J=| + 1, . . . , |J=| + |J≥|}

if and only if Ih(A; B|C) ≥ 0 for all h satisfying all the equalities in J= and inequalities in J≥.

A similar definition applies to an equality: an equality Ih(A; B|C) = 0 is implied by the sets of equalities in J= and inequalities in J≥.


Definition 3 (Maximally implied elemental inequalities): Let J be a set of information inequalities. Let I be the set of all elemental inequalities that are implied by J. We call I the maximally implied elemental inequalities by J.

The following lemma gives an alternative necessary and sufficient condition under which an (in)equality is implied by the sets of equalities in J= and inequalities in J≥.

Lemma 1 (Necessary and sufficient condition): An equality I(A; B|C) = 0 is implied by the sets of equalities in J= and inequalities in J≥ if and only if there exist real numbers ci, di for i = 1, . . . , |J=| and non-negative numbers cj, dj for j = 1, . . . , |J≥| such that

I(A; B|C) = Σ_{i=1}^{|J=|} ci I(Ai; Bi|Ci) − Σ_{j=1}^{|J≥|} cj I(Aj; Bj|Cj)

and

I(A; B|C) = Σ_{i=1}^{|J=|} di I(Ai; Bi|Ci) + Σ_{j=1}^{|J≥|} dj I(Aj; Bj|Cj).

The lemma follows from Farkas' lemma [7].

III. MAIN RESULTS

As illustrated earlier, the complexity of solving the linear programming problem (6) grows with the number of variables and constraints involved in the optimization. Naturally, the complexity can be reduced if redundant (in)equality constraints and variables are eliminated. One of the main objectives of this paper is to identify (and to eliminate) the redundant (in)equalities in the linear programming problem (6). To illustrate the idea, consider the following lemma.

Lemma 2: Consider any random variables A, B, C, D. Then the set of equality and inequality constraints

I(A; B|C) = 0 (7)
I(A; B|CD) ≥ 0 (8)
I(A; D|C) ≥ 0 (9)

implies

I(A; D|BC) ≥ 0. (10)

Similarly, the constraints

I(A; B|C) = 0
I(A; B|CD) ≥ 0
I(B; D|C) ≥ 0

imply

I(B; D|AC) ≥ 0.

Proof: Using the chain rule, it is straightforward to prove that

I(A; BD|C) = I(A; B|C) + I(A; D|CB) (11)

and

I(A; BD|C) = I(A; D|C) + I(A; B|CD). (12)

As I(A; B|C) = 0,

I(A; D|BC) = I(A; D|C) + I(A; B|CD). (13)

Therefore, if I(A; D|C) and I(A; B|CD) are both nonnegative, then so is I(A; D|BC). The lemma thus follows.

Theorem 1 (Generalization): For a set of random variables V, let I(A; B|C) = 0 be a given identity constraint. Then the following inequalities

I(A; E|BC) ≥ 0 (14)
I(B; E|AC) ≥ 0 (15)

are redundant elemental inequalities, where E ∈ V \ ({A, B} ∪ C).

Proof: A direct consequence of Lemma 2.

Lemma 2 illustrates that a single equality constraint (7), together with the two inequality constraints (8) and (9), implies the redundancy of the elemental inequality (10).

Based on Lemma 2, we derive a systematic approach to eliminate the redundant inequality constraints in (6). The idea is simple. First, we derive as many equalities as possible that are implied by Φh = 0. Then, for each equality obtained in the first step, we search for any inequality that can be eliminated according to Lemma 2.

Lemma 3 (Deriving new equalities (1)): The set of elemental inequalities ∆, together with the equality

I(A; B|C) = 0 (16)

maximally implies the following set of elemental equalities

I = { I(A; B|DC) = 0 : A ∈ A, B ∈ B, D ⊆ (A ∪ B) \ {A, B} }. (17)

Proof: Using the chain rule for mutual information, it is easy to show that ∆ and (16) imply I. In the following, we will prove that no other equalities can be implied by ∆ and (16).

Suppose an elemental equality I(U; V|W) = 0 is implied by the equality (16) and the inequalities δ1, . . . , δ|∆| ∈ ∆. Then by Lemma 1,

I(U; V|W) = c·I(A; B|C) − Σ_{j∈{1,...,|∆|}} cj δj

for some real number c and nonnegative numbers cj, j ∈ {1, . . . , |∆|}. Consequently,

I(U; V|W) + Σ_{j∈{1,...,|∆|}} cj δj = c·I(A; B|C). (18)

As the LHS of (18) is a nonnegative sum of elemental terms, this implies that c > 0. Normalising both sides of (18) by the scaling factor 1/c, we have

I(A; B|C) = I(U; V|W)/c + Σ_{j∈{1,...,|∆|}} (cj/c) δj.

Using the same technique as in the proof of [8, Theorem 5], we can prove that
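The chain-rule argument of Lemma 2 can be sanity-checked numerically. The sketch below (ours, not from the paper) uses a hypothetical four-variable distribution in which A and B are independent noisy copies of C, so that I(A; B|C) = 0, and D = A XOR B; it then verifies identity (13) and the resulting nonnegativity of I(A; D|BC).

```python
from math import log2
from itertools import product

# Hypothetical pmf with A ⟂ B | C (noisy copies of a fair bit C), D = A XOR B.
pmf = {}
for c, a, b in product((0, 1), repeat=3):
    p = 0.5 * (0.9 if a == c else 0.1) * (0.8 if b == c else 0.2)
    pmf[(a, b, c, a ^ b)] = pmf.get((a, b, c, a ^ b), 0.0) + p

NAMES = "ABCD"

def H(vars_):
    """Joint entropy of the named variables, in bits."""
    idx = [NAMES.index(v) for v in vars_]
    m = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return -sum(p * log2(p) for p in m.values() if p > 0)

def I(X, Y, Z=""):
    """Conditional mutual information I(X; Y|Z) = H(XZ)+H(YZ)-H(XYZ)-H(Z)."""
    return H(X + Z) + H(Y + Z) - H(X + Y + Z) - H(Z)

assert abs(I("A", "B", "C")) < 1e-9                # constraint (7) holds
lhs = I("A", "D", "BC")
rhs = I("A", "D", "C") + I("A", "B", "CD")         # identity (13)
assert abs(lhs - rhs) < 1e-9
assert lhs >= -1e-12                               # conclusion (10)
```

The check confirms that once I(A; B|C) = 0 is imposed, the elemental inequality I(A; D|BC) ≥ 0 carries no information beyond (8) and (9).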


1) either U ∈ A and V ∈ B, or U ∈ B and V ∈ A;
2) W \ ((A ∪ B) \ {U, V}) ⊆ C; in other words, there exists D ⊆ (A ∪ B) \ {U, V} such that W = C ∪ D.

Consequently, the equality I(U; V|W) = 0 belongs to I, and the lemma is proved.

Using the same approach, we also have the following lemma.

Lemma 4 (Deriving new equalities (2)): The set of elemental inequalities ∆, together with the equality

J = {H(A|C) = 0}

maximally implies the following set of elemental equalities:

I = { H(A|V \ {A}) = 0, I(A; B|DC) = 0 : A ∈ A, B ∈ V \ ({A} ∪ C), D ⊆ V \ ({A, B} ∪ C) }.

The maximality of the implied elemental equalities in Lemmas 3 and 4 ensures the maximal reduction in elemental inequalities obtained by replacing them with elemental equalities when an equality of conditional independence or functional dependence form is given. In practice, replacing linear inequalities by linear equalities is advantageous, since solving linear inequalities is computationally much more expensive than solving linear equalities.

IV. ALGORITHMS

Algorithm 1, Decompose(J, V), returns the set of maximally implied elemental identities I for an input identity J of functional dependence or conditional independence form for a set of random variables V. The algorithm uses Lemmas 3 and 4.

Algorithm 1 Decompose(J, V)
Require: J, V
  I ← ∅
  if J = {H(A|C) = 0} then
    for all A ∈ A do
      I ← I ∪ {H(A|V \ {A}) = 0}
      for all B ∈ V \ ({A} ∪ C) do
        for all D ⊆ V \ ({A, B} ∪ C) do
          I ← I ∪ {I(A; B|DC) = 0}
        end for
      end for
    end for
    Return I
  else if J = {I(A; B|C) = 0} then
    for all A ∈ A, B ∈ B do
      for all D ⊆ (A ∪ B) \ {A, B} do
        I ← I ∪ {I(A; B|DC) = 0}
      end for
    end for
    Return I
  end if

Algorithm 2, ReducedAxioms(J, V), returns a compact set K of polymatroid axioms (elemental identities and elemental inequalities) for an input set of identities J of functional dependence and conditional independence form for the set of random variables V. Specifically, K does not contain those elemental inequalities which are proved to be redundant in Theorem 1. Algorithm 1, Decompose(J, V), is used as a subroutine (function) in the algorithm.

Algorithm 2 ReducedAxioms(J, V)
Require: J = {J1, . . . , Jn}, V
  K ← ∅
  for all J ∈ J do
    I ← Decompose(J, V)
    K ← K ∪ I
  end for
  for all A ∈ V do
    if {H(A|V \ A) = 0} ∉ K then
      K ← K ∪ {H(A|V \ A) ≥ 0}
    end if
  end for
  for all A, B ∈ V do
    for C = ∅ to V \ {A, B} : C ⊆ V \ {A, B} do
      if {I(A; B|C) = 0} ∉ K
        and ∄D ∈ C : {I(A; D|C \ {D}) = 0}, {I(A; D|C) ≥ (or =) 0}, {I(A; B|C \ {D}) ≥ (or =) 0} ∈ K
        and ∄D ∈ C : {I(B; D|C \ {D}) = 0}, {I(B; D|C) ≥ (or =) 0}, {I(A; B|C \ {D}) ≥ (or =) 0} ∈ K
      then
        K ← K ∪ {I(A; B|C) ≥ 0}
      end if
    end for
  end for
  Return K

V. ELEMENTAL INEQUALITY REDUCTION

Using the results developed in Section III, a significant reduction in the polymatroid axioms can be achieved. To give an idea, we present the following lemma.

Lemma 5 (Reduction): Let J = {I(A; B|C) = 0} be a given information identity for disjoint sets of random variables A, B, C ⊂ V, |V| = n. Then, there are

|A||B|·2^(|A|+|B|−2) (19)

maximally implied elemental equalities and at least

|A||B| ( Σ_{i=0}^{|A|−1} C(|A|−1, i)·(|A|−1−i+|V \ (A ∪ B ∪ C)|)
       + Σ_{i=0}^{|B|−1} C(|B|−1, i)·(|B|−1−i+|V \ (A ∪ B ∪ C)|) )
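As an illustration (ours, not the authors' code), the conditional-independence branch of Decompose can be written directly from (17). The helper decompose_ci below is hypothetical; for disjoint A, B, C it enumerates the implied elemental equalities and checks that their number matches the count (19).

```python
from itertools import chain, combinations

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def decompose_ci(A, B, C):
    """Lemma 3 branch of Decompose: elemental equalities implied by I(A; B|C) = 0.

    Returns triples (a, b, ctx) standing for the equality I(a; b | ctx) = 0,
    with a ∈ A, b ∈ B, ctx = D ∪ C for D ⊆ (A ∪ B) \ {a, b}, cf. (17).
    """
    eqs = set()
    for a in A:
        for b in B:
            for D in powerset((A | B) - {a, b}):
                eqs.add((a, b, frozenset(D) | C))
    return eqs

A, B, C = frozenset("PQ"), frozenset("RS"), frozenset("T")
eqs = decompose_ci(A, B, C)
# Count matches (19): |A||B|·2^(|A|+|B|−2) = 2·2·4 = 16
assert len(eqs) == len(A) * len(B) * 2 ** (len(A) + len(B) - 2)
print(len(eqs))   # 16
```

Each of these 16 equalities replaces an elemental inequality of ∆, which is the source of the reduction quantified in Lemma 5.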


many redundant elemental inequalities.

Proof: The number of maximally implied elemental equalities follows from (17). Now, by Lemma 3 and Theorem 1, given J = {I(A; B|C) = 0} the redundant inequalities are

I(A; E|BCD) ≥ 0, E ∈ V \ ({A} ∪ B ∪ C ∪ D) (20)
I(B; E|ACD) ≥ 0, E ∈ V \ ({B} ∪ A ∪ C ∪ D) (21)

where A ∈ A, B ∈ B, D ⊆ (A ∪ B) \ {A, B}. Then, from (20) and (21), the number of redundant elemental inequalities is

|A||B| ( Σ_{i=0}^{|A|−1} C(|A|−1, i)·(|A|−1−i+|V \ (A ∪ B ∪ C)|)
       + Σ_{i=0}^{|B|−1} C(|B|−1, i)·(|B|−1−i+|V \ (A ∪ B ∪ C)|) ).

Note that the number of redundant inequalities in the lemma, given I(A; B|C) = 0, increases exponentially with |A| and |B|.

VI. APPLICATION TO MARKOV RANDOM FIELDS

In this section we elaborate the application of our main results to generate a compact representation of the polymatroid axioms for random variables representable by a Markov random field (MRF). The ideas can be easily extended to Bayesian networks.

Definition 4 (Markov random field): Let G = (XV, E) be an undirected graph and V be a set of random variables such that each random variable A ∈ V corresponds to the vertex XA of G. The random variables form a Markov random field with respect to G if they satisfy the following property.

Global Markov property: Any two subsets of random variables A and B are conditionally independent given a subset C if every path from a node in XA to a node in XB passes through XC in G, denoted

XA ⊥ XB|XC.

Note that the global Markov property suggests a simple graphical procedure to test conditional independencies for random variables representable by an MRF: utilizing the global Markov property, we can test whether conditional independencies of elemental form hold for the random variables. Depending on the topology of G, there may be an exponential number of conditional independencies of elemental form. Algorithm 3, ReducedAxiomsMRF(G = (XV, E)), designed in the same spirit as Algorithm 2, directly generates the compact representation of the polymatroid axioms using Theorem 1 for random variables forming an MRF.

Algorithm 3 ReducedAxiomsMRF(G = (XV, E))
Require: MRF G = (XV, E)
  K ← ∅
  for all XA ∈ XV do
    K ← K ∪ {H(A|V \ A) ≥ 0}
  end for
  for all XA, XB ∈ XV do
    for XC = X∅ to XV\{A,B} : C ⊆ V \ {A, B} do
      if XA ⊥ XB|XC in the MRF then
        K ← K ∪ {I(A; B|C) = 0}
      else if ∄D ∈ C : {I(A; D|C \ {D}) = 0}, {I(A; D|C) ≥ (or =) 0}, {I(A; B|C \ {D}) ≥ (or =) 0} ∈ K
        and ∄D ∈ C : {I(B; D|C \ {D}) = 0}, {I(B; D|C) ≥ (or =) 0}, {I(A; B|C \ {D}) ≥ (or =) 0} ∈ K
      then
        K ← K ∪ {I(A; B|C) ≥ 0}
      end if
    end for
  end for
  Return K

VII. CONCLUSION

The exponential increase in the number of polymatroid axioms with the number of random variables prohibits their use for many practical problems involving a large number of random variables. We give a compact characterization of the minimal set of polymatroid axioms when arbitrary conditional independence and functional dependence constraints are given. We also give algorithms that directly produce the equalities and inequalities compactly representing the minimal set of polymatroid axioms given the constraints. The ideas developed are also applicable to random variables representable by graphical models.

ACKNOWLEDGEMENT

This work was supported in part by the Australian Government under ARC grant DP0880223 and by a grant from the University Grants Committee of the Hong Kong Special Administrative Region, China (Project No. AoE/E-02/08).

REFERENCES

[1] S. Thakor, A. Grant, and T. Chan, "On complexity reduction of the LP bound computation and related problems," in 2011 International Symposium on Network Coding, Beijing, China, pp. 1-6, Jul. 2011.
[2] R. W. Yeung, Information Theory and Network Coding, 1st ed. Springer, 2008.
[3] S. Thakor, A. Grant, and T. Chan, "Network coding capacity: A functional dependence bound," in IEEE International Symposium on Information Theory, Seoul, Korea, pp. 263-267, Jun. 2009.
[4] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, USA, 1988.
[5] R. W. Yeung, "A framework for linear information inequalities," IEEE Trans. Inform. Theory, vol. 43, pp. 1924-1934, 1997.
[6] A. P. Dawid, "Conditional independence in statistical theory," J. Roy. Statist. Soc. Ser. B, vol. 41, no. 1, pp. 1-31, 1979.
[7] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency. Springer-Verlag, Berlin, Germany, 2003.
[8] L. Guille, T. Chan, and A. Grant, "The minimal set of Ingleton inequalities," IEEE Trans. Inform. Theory, vol. 57, pp. 1849-1864, Apr. 2011.

