Lecture 2: Entropy
September 2, 2022

Outline
1 Self-information
2 Entropy
3 Joint entropy and conditional entropy
4 Relative entropy and mutual information
5 Chain rules for entropy, relative entropy and mutual information
Self-information
Theorem
The only function I(p) defined over p ∈ [0, 1] and satisfying
1 I(p) is monotonically decreasing in p;
2 I(p) is a continuous function of p for 0 ≤ p ≤ 1;
3 I(p1 × p2) = I(p1) + I(p2);
is I(p) = −c · log_b(p), where c is a positive constant and the base b of the logarithm is a real number larger than one.
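As a quick illustration (not part of the original slides), the sketch below evaluates I(p) = −log2 p and checks the monotonicity and additivity properties numerically; the function name self_information is my own.

    import math

    def self_information(p: float) -> float:
        """Self-information I(p) = -log2(p), in bits, for 0 < p <= 1."""
        return -math.log2(p)

    p1, p2 = 0.5, 0.25
    # Additivity: the surprise of two independent events is the sum of their surprises.
    print(self_information(p1 * p2))                      # 3.0 bits
    print(self_information(p1) + self_information(p2))    # 3.0 bits
    # Monotonicity: rarer events carry more information.
    print(self_information(0.9) < self_information(0.1))  # True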
Entropy
Definition
The entropy H(X) of a discrete random variable X with probability mass distribution or probability mass function (pmf) P_X(·) is defined by

    H(X) := − ∑_{x∈X} P_X(x) · log2 P_X(x)   (bits).
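A minimal Python sketch of this definition (mine, not from the slides); the function name entropy and the example pmf are assumptions for illustration.

    import math

    def entropy(pmf):
        """H(X) = -sum p(x) log2 p(x), in bits; terms with p(x) = 0 contribute 0."""
        return -sum(p * math.log2(p) for p in pmf if p > 0)

    # Example: the pmf (1/2, 1/4, 1/8, 1/8) has entropy 7/4 bits.
    print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75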
Lemma
H(X) ≥ 0.
Proof.
0 ≤ p(x) ≤ 1 implies that log (1/p(x)) ≥ 0.

Lemma
H_b(X) = (log_b a) H_a(X).
Proof.
log_b p = (log_b a) · log_a p.
Example
Let
    X = 1 with probability p,
    X = 0 with probability 1 − p.
Then
    H(X) = −p log2 p − (1 − p) log2(1 − p) =: H(p)   (bits),
the binary entropy function.
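A short sketch (mine) evaluating the binary entropy function at a few values; it is maximized at p = 1/2, where H(1/2) = 1 bit.

    import math

    def binary_entropy(p: float) -> float:
        """H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.1, 0.25, 0.5, 0.9):
        print(p, round(binary_entropy(p), 4))
    # p = 0.5 gives the maximum of 1 bit; values near 0 or 1 give low entropy.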
You are given 12 balls, all equal in weight except for one that is either heavier or lighter. You are also given a two-pan balance to use. In each use of the balance you may put any number of the 12 balls on the left pan and the same number on the right pan. There are three possible outcomes: either the weights are equal, or the balls on the left are heavier, or the balls on the right are heavier. Your task is to design a strategy to determine which is the odd ball and whether it is heavier or lighter than the others in as few uses of the balance as possible.
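Not part of the slides, but a natural entropy-style observation: there are 12 × 2 = 24 equally likely possibilities and each weighing has at most 3 outcomes, so at least log 24 / log 3 ≈ 2.89, hence 3, weighings are required. A tiny sketch of this bound:

    import math

    outcomes = 12 * 2          # which ball is odd, and whether it is heavier or lighter
    per_weighing = 3           # left heavier, right heavier, or balanced
    lower_bound = math.log(outcomes) / math.log(per_weighing)
    print(lower_bound)                 # about 2.89
    print(math.ceil(lower_bound))      # 3 weighings are necessary (and in fact sufficient)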
[Figure: a decision tree solving the 12-ball puzzle in three weighings. First weigh balls 1 2 3 4 against 5 6 7 8; depending on the outcome, next weigh 1 2 6 against 3 4 5 (if unbalanced) or 9 10 11 against 1 2 3 (if balanced); a third weighing then identifies the odd ball and whether it is heavier or lighter.]
Joint entropy
Definition
The joint entropy H(X, Y) of a pair of discrete random variables (X, Y) with a joint distribution p(x, y) is defined as

    H(X, Y) = − ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(x, y).
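A minimal sketch (mine) of the joint-entropy formula, storing the joint pmf as a dictionary keyed by (x, y); the example pmf is made up for illustration.

    import math

    def joint_entropy(pxy):
        """H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y), in bits."""
        return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

    # Example joint pmf over {0,1} x {0,1} (values must sum to 1).
    pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    print(joint_entropy(pxy))  # 1.75 bits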
Conditional entropy
Definition
If (X, Y) ∼ p(x, y), the conditional entropy H(Y|X) is defined as

    H(Y|X) = ∑_{x∈X} p(x) H(Y|X = x)
           = − ∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log p(y|x)
           = − ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(y|x)
           = −E log p(Y|X).
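Similarly, a sketch (mine) that computes H(Y|X) directly from a joint pmf by first forming the marginal p(x); the example pmf is the same made-up one as above.

    import math
    from collections import defaultdict

    def conditional_entropy(pxy):
        """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), in bits."""
        px = defaultdict(float)
        for (x, _), p in pxy.items():
            px[x] += p
        return -sum(p * math.log2(p / px[x]) for (x, _), p in pxy.items() if p > 0)

    pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    print(conditional_entropy(pxy))  # about 0.9387 bits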
Chain rule
Theorem
H(X, Y) = H(X) + H(Y|X).
Proof.
    H(X, Y) = − ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(x, y)
            = − ∑_{x∈X} ∑_{y∈Y} p(x, y) log (p(x) p(y|x))
            = − ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(x) − ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(y|x)
            = − ∑_{x∈X} p(x) log p(x) − ∑_{x∈X} ∑_{y∈Y} p(x, y) log p(y|x)
            = H(X) + H(Y|X).
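A quick numerical check of the chain rule (my own sketch, reusing the same small joint pmf):

    import math
    from collections import defaultdict

    pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

    px = defaultdict(float)
    for (x, _), p in pxy.items():
        px[x] += p

    h_xy = -sum(p * math.log2(p) for p in pxy.values() if p > 0)    # H(X, Y)
    h_x = -sum(p * math.log2(p) for p in px.values() if p > 0)      # H(X)
    h_y_given_x = -sum(p * math.log2(p / px[x])                     # H(Y|X)
                       for (x, _), p in pxy.items() if p > 0)

    print(round(h_xy, 6), round(h_x + h_y_given_x, 6))  # both 1.75: H(X,Y) = H(X) + H(Y|X)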
Example
Let (X, Y) have the joint distribution below.

             X = 1    X = 2    X = 3    X = 4
    Y = 1     1/8      1/16     1/32     1/32
    Y = 2     1/16     1/8      1/32     1/32
    Y = 3     1/16     1/16     1/16     1/16
    Y = 4     1/4      0        0        0

The marginal distributions of X and Y are (1/2, 1/4, 1/8, 1/8) and (1/4, 1/4, 1/4, 1/4) respectively, and hence H(X) = 7/4 bits, H(Y) = 2 bits. Also

    H(X|Y) = ∑_{i=1}^{4} p(Y = i) H(X|Y = i) = 11/8 bits.
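These values can be checked mechanically; the sketch below (mine) recomputes them from the table.

    import math
    from fractions import Fraction as F
    from collections import defaultdict

    # Joint pmf p(x, y) from the table; keys are (x, y).
    pxy = {(1, 1): F(1, 8),  (2, 1): F(1, 16), (3, 1): F(1, 32), (4, 1): F(1, 32),
           (1, 2): F(1, 16), (2, 2): F(1, 8),  (3, 2): F(1, 32), (4, 2): F(1, 32),
           (1, 3): F(1, 16), (2, 3): F(1, 16), (3, 3): F(1, 16), (4, 3): F(1, 16),
           (1, 4): F(1, 4),  (2, 4): 0,        (3, 4): 0,        (4, 4): 0}

    def entropy(pmf):
        return -sum(float(p) * math.log2(p) for p in pmf if p > 0)

    px, py = defaultdict(F), defaultdict(F)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p

    h_x = entropy(px.values())                  # 1.75  (= 7/4 bits)
    h_y = entropy(py.values())                  # 2.0 bits
    h_x_given_y = -sum(float(p) * math.log2(p / py[y])
                       for (x, y), p in pxy.items() if p > 0)
    print(h_x, h_y, h_x_given_y)                # 1.75 2.0 1.375 (= 11/8)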
Relative entropy
Definition
The relative entropy or Kullback-Leibler distance between two probability mass functions p(x) and q(x) is defined as

    D(p ∥ q) = ∑_{x∈X} p(x) log (p(x)/q(x)) = E_p log (p(X)/q(X)).
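A sketch (mine) of this definition, using the usual conventions that 0 · log(0/q) = 0 and that the divergence is infinite when p puts mass where q does not; the function name kl_divergence is my own.

    import math

    def kl_divergence(p, q):
        """D(p || q) = sum p(x) log2 (p(x)/q(x)), in bits."""
        total = 0.0
        for pi, qi in zip(p, q):
            if pi == 0:
                continue           # 0 * log(0/q) = 0 by convention
            if qi == 0:
                return math.inf    # p puts mass where q does not
            total += pi * math.log2(pi / qi)
        return total

    print(kl_divergence([0.5, 0.5], [0.75, 0.25]))  # about 0.2075 bits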
Mutual information
Definition
Consider two random variables X and Y with a joint probability mass function p(x, y) and marginal probability mass functions p(x) and p(y). The mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y):

    I(X; Y) = ∑_{x∈X} ∑_{y∈Y} p(x, y) log (p(x, y) / (p(x)p(y)))
            = D(p(x, y) ∥ p(x)p(y))
            = E_{p(x,y)} log (p(X, Y) / (p(X)p(Y))).
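A sketch (mine) computing I(X; Y) from a joint pmf as the divergence between the joint distribution and the product of its marginals; the example pmf is made up.

    import math
    from collections import defaultdict

    def mutual_information(pxy):
        """I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) ), in bits."""
        px, py = defaultdict(float), defaultdict(float)
        for (x, y), p in pxy.items():
            px[x] += p
            py[y] += p
        return sum(p * math.log2(p / (px[x] * py[y]))
                   for (x, y), p in pxy.items() if p > 0)

    pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    print(mutual_information(pxy))  # about 0.0157 bits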
Expanding the definition gives I(X; Y) = H(X) − H(X|Y); similarly,

    I(X; Y) = H(Y) − H(Y|X).
Proposition
The mutual information between a random variable X and itself is
equal to the entropy of X, i.e., I(X; X) = H(X).
Example
Let X = {0, 1}, and consider two distributions p, q on X. Let p(0) = 1 − r, p(1) = r and q(0) = 1 − s, q(1) = s. Then

    D(p∥q) = (1 − r) log ((1 − r)/(1 − s)) + r log (r/s)

and

    D(q∥p) = (1 − s) log ((1 − s)/(1 − r)) + s log (s/r).

If r = s, then D(p∥q) = D(q∥p) = 0. Note that in general D(p∥q) ≠ D(q∥p). For example, if r = 1/2, s = 1/4, then D(p∥q) = 1 − (1/2) log 3 ≈ 0.2075 bits, whereas D(q∥p) = (3/4) log 3 − 1 ≈ 0.1887 bits.
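A sketch (mine) reproducing these two numbers; a simplified KL helper is redefined so the snippet stands alone (all probabilities here are strictly positive).

    import math

    def kl_divergence(p, q):
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    r, s = 0.5, 0.25
    p = [1 - r, r]
    q = [1 - s, s]
    print(kl_divergence(p, q))  # about 0.2075 bits
    print(kl_divergence(q, p))  # about 0.1887 bits; D(p||q) != D(q||p) in general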
Chain rules for entropy, relative entropy and mutual information
Theorem
Let X1, X2, . . . , Xn be drawn according to p(x1, x2, . . . , xn). Then

    H(X1, X2, · · · , Xn) = ∑_{i=1}^{n} H(Xi | Xi−1, · · · , X1).
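A numerical sketch (mine) checking the n-variable chain rule on a randomly generated joint pmf of three binary variables; all names in it are my own.

    import math
    import random
    from collections import defaultdict
    from itertools import product

    random.seed(0)

    # A random joint pmf over three binary variables (X1, X2, X3).
    keys = list(product([0, 1], repeat=3))
    weights = [random.random() for _ in keys]
    total = sum(weights)
    pjoint = {k: w / total for k, w in zip(keys, weights)}

    def entropy_of(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    def cond_entropy(joint, i):
        """H(X_{i+1} | X_1, ..., X_i) = -sum p(x_1..x_{i+1}) log2 p(x_{i+1} | x_1..x_i)."""
        prefix = defaultdict(float)   # p(x_1, ..., x_i)
        full = defaultdict(float)     # p(x_1, ..., x_{i+1})
        for k, p in joint.items():
            prefix[k[:i]] += p
            full[k[:i + 1]] += p
        return -sum(p * math.log2(p / prefix[k[:i]]) for k, p in full.items() if p > 0)

    rhs = sum(cond_entropy(pjoint, i) for i in range(3))  # H(X1) + H(X2|X1) + H(X3|X2,X1)
    print(round(entropy_of(pjoint), 10))
    print(round(rhs, 10))   # matches H(X1, X2, X3), as the chain rule asserts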
Definition
The conditional mutual information of random variables X and Y given Z is defined by

    I(X; Y | Z) = H(X|Z) − H(X|Y, Z) = E_{p(x,y,z)} log (p(X, Y|Z) / (p(X|Z) p(Y|Z))).
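As an illustration (mine), conditional mutual information can be computed directly from a joint pmf p(x, y, z); in the example, X and Z are independent fair bits and Y = X ⊕ Z, so I(X; Y) = 0 but I(X; Y|Z) = 1 bit.

    import math
    from collections import defaultdict

    def conditional_mutual_information(pxyz):
        """I(X;Y|Z) = sum p(x,y,z) log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), in bits."""
        pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
        for (x, y, z), p in pxyz.items():
            pz[z] += p
            pxz[(x, z)] += p
            pyz[(y, z)] += p
        return sum(p * math.log2(pz[z] * p / (pxz[(x, z)] * pyz[(y, z)]))
                   for (x, y, z), p in pxyz.items() if p > 0)

    # Example: X uniform, Z an independent fair bit, Y = X XOR Z.
    pxyz = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}
    print(conditional_mutual_information(pxyz))  # 1.0 bit: given Z, Y reveals X completely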
Theorem
    I(X1, X2, · · · , Xn; Y) = ∑_{i=1}^{n} I(Xi; Y | Xi−1, Xi−2, · · · , X1).
Proof.
    I(X1, X2, · · · , Xn; Y)
        = H(X1, X2, · · · , Xn) − H(X1, X2, · · · , Xn | Y)
        = ∑_{i=1}^{n} H(Xi | Xi−1, · · · , X1) − ∑_{i=1}^{n} H(Xi | Xi−1, · · · , X1, Y)
        = ∑_{i=1}^{n} I(Xi; Y | Xi−1, Xi−2, · · · , X1).
Definition
For joint probability mass functions p(x, y) and q(x, y), the conditional relative entropy D(p(y|x) ∥ q(y|x)) is the average of the relative entropies between the conditional probability mass functions p(y|x) and q(y|x), averaged over the probability mass function p(x). More precisely,

    D(p(y|x) ∥ q(y|x)) = ∑_x p(x) ∑_y p(y|x) log (p(y|x)/q(y|x))
                       = E_{p(x,y)} log (p(Y|X)/q(Y|X)).
Theorem (Chain rule for relative entropy)
D(p(x, y) ∥ q(x, y)) = D(p(x) ∥ q(x)) + D(p(y|x) ∥ q(y|x)).

Proof.
    D(p(x, y) ∥ q(x, y)) = ∑_x ∑_y p(x, y) log (p(x, y)/q(x, y))
                         = ∑_x ∑_y p(x, y) log (p(x)p(y|x) / (q(x)q(y|x)))
                         = ∑_x ∑_y p(x, y) log (p(x)/q(x)) + ∑_x ∑_y p(x, y) log (p(y|x)/q(y|x))
                         = D(p(x) ∥ q(x)) + D(p(y|x) ∥ q(y|x)).
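To close, a numerical sketch (mine) of the conditional relative entropy and of the chain rule above, on two small made-up joint pmfs with full support:

    import math
    from collections import defaultdict

    def kl(p, q):
        """D(p || q) in bits for dicts with the same keys."""
        return sum(p[k] * math.log2(p[k] / q[k]) for k in p if p[k] > 0)

    def marginal_x(pxy):
        m = defaultdict(float)
        for (x, _), v in pxy.items():
            m[x] += v
        return m

    def conditional_kl(pxy, qxy):
        """D(p(y|x) || q(y|x)) = sum_{x,y} p(x,y) log2( p(y|x) / q(y|x) )."""
        px, qx = marginal_x(pxy), marginal_x(qxy)
        return sum(v * math.log2((v / px[x]) / (qxy[(x, y)] / qx[x]))
                   for (x, y), v in pxy.items() if v > 0)

    # Two joint pmfs over {0,1} x {0,1}, both with full support.
    pxy = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.3}
    qxy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

    lhs = kl(pxy, qxy)                                    # D(p(x,y) || q(x,y))
    rhs = kl(marginal_x(pxy), marginal_x(qxy)) + conditional_kl(pxy, qxy)
    print(lhs, rhs)   # equal up to floating-point rounding, as the chain rule asserts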