Shannon Entropy
Dipan Kumar Ghosh
Indian Institute of Technology Bombay,
Powai, Mumbai 400076
April 15, 2017
1 Introduction
In the last lecture we introduced the concept of information. We discussed a method
of quantifying information and found that unlike the colloquial usage of the term ‘information’, the word in technical terms implies a measure of the uncertainty in a given
statement or a given situation. It was pointed out that when an event actually takes place
out of various possibilities that could arise before the event, the amount of uncertainty
that gets removed is a measure of the information associated with that event. We defined
a function H(p_1, p_2, . . . , p_M) as a measure of such information, where there are M possibilities associated with that event, occurring with probabilities p_1, p_2, . . . , p_M. We also defined an auxiliary function f(M) as equal to H when the probability of each of the M events is identical, i.e. equal to 1/M. We found that f(M) must satisfy certain properties,
which are as follows:
1. f(M) = H(1/M, 1/M, . . . , 1/M) is a non-negative, monotonic and continuously increasing function of M.
3. f(MN) = f(M) + f(N)
In the following we will find the explicit form of a function which satisfies the four properties mentioned above.
2 Information Measure
We claim that the function f(M) = C log M, with C > 0, satisfies the four properties mentioned above.
3. Let M > 1 and let r be an arbitrary positive integer. For any integer M we can then find an integer k such that M^k ≤ 2^r ≤ M^{k+1}. (For example, let M = 4 and r = 3; then 2^r = 8, which lies between 4 = 4^1 and 16 = 4^2, so that k = 1.) Since f(M) is a monotonic function of M, it then follows that
f(M^k) ≤ f(2^r) ≤ f(M^{k+1})

k f(M) ≤ r f(2) ≤ (k + 1) f(M)

k/r ≤ f(2)/f(M) ≤ (k + 1)/r

Consider now the function C log M. Since M^k ≤ 2^r ≤ M^{k+1}, taking logarithms we have

k/r ≤ log 2/log M ≤ (k + 1)/r
Thus both f(2)/f(M) and log 2/log M lie between k/r and (k + 1)/r. Clearly, the distance between them on the real line must be less than 1/r. Since r is arbitrary, we can make it indefinitely large, and in this limit
log 2/log M = f(2)/f(M), i.e. f(M) = (f(2)/log 2) log M ≡ C log M with C = f(2)/log 2 > 0.
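The sandwich argument above can also be checked numerically. The following is a minimal Python sketch (the choice M = 3 and the particular values of r are arbitrary and serve only as an illustration); it finds the integer k with M^k ≤ 2^r ≤ M^{k+1} for increasing r and shows k/r and (k + 1)/r squeezing log 2/log M:

import math

M = 3                      # arbitrary choice of M > 1 for the illustration
target = math.log(2) / math.log(M)

for r in (5, 50, 500, 5000):
    # find the integer k with M**k <= 2**r <= M**(k+1) by direct search
    k = 0
    while M ** (k + 1) <= 2 ** r:
        k += 1
    print(f"r={r:5d}   k/r={k / r:.6f}   (k+1)/r={(k + 1) / r:.6f}   "
          f"log2/logM={target:.6f}")

As r grows, the two bounds printed on each line close in on log 2/log M, exactly as in the argument above.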
To obtain H for general (unequal) probabilities, we use the grouping property: divide the M events into a group A containing the first r events and a group B containing the remaining M − r events. Then

H(p_1, p_2, . . . , p_M) = H( Σ_{i=1}^{r} p_i , Σ_{i=r+1}^{M} p_i )
    + ( Σ_{i=1}^{r} p_i ) H( p_1/Σ_{i=1}^{r} p_i , . . . , p_r/Σ_{i=1}^{r} p_i )
    + ( Σ_{i=r+1}^{M} p_i ) H( p_{r+1}/Σ_{i=r+1}^{M} p_i , . . . , p_M/Σ_{i=r+1}^{M} p_i )
Consider a total of s events, each having the same probability, with r of them in group A and s − r of them in group B. We can then write, using p_i = 1/s for each of the events,
H(1/s, 1/s, . . . , 1/s) − H(r/s, (s − r)/s) = (r/s) H(1/r, . . . , 1/r) + ((s − r)/s) H(1/(s − r), . . . , 1/(s − r))
where, in the above expression, there are r arguments of H in the first term on the right and s − r arguments in the second term. Using the definition of f(M), this gives
f(s) = H(r/s, (s − r)/s) + (r/s) f(r) + ((s − r)/s) f(s − r)
Substituting f(M) = C log M,

H(r/s, (s − r)/s) = C log s − (r/s) C log r − ((s − r)/s) C log(s − r)
                  = −C [ (r/s) log(r/s) + ((s − r)/s) log((s − r)/s) ]

which gives, writing p = r/s,

H(p, 1 − p) = −C [ p log p + (1 − p) log(1 − p) ]
We generalize the above to more than two events and assert that
H({p_i}) = −C Σ_{i=1}^{M} p_i log p_i
In the above we have proved this for M = 1 (where H = 0) and for M = 2. We can use the method of induction to prove that if the theorem is valid for M − 1, it is valid for M. Dividing the M events into two groups, one containing a single event (with probability p_1) and the other containing the remaining M − 1 events, we have, using the grouping property and the induction hypothesis,

H(p_1, p_2, . . . , p_M) = H(p_1, 1 − p_1) + (1 − p_1) H( p_2/(1 − p_1), . . . , p_M/(1 − p_1) )
                        = −C Σ_{i=1}^{M} p_i log p_i
[Figure: Plot of H(p, 1 − p) as a function of p for 0 ≤ p ≤ 1; the maximum occurs at p = 1/2.]
We will take C = 1 and the base of the logarithm to be 2. The above shows that the uncertainty associated with an event does not depend on the values that X takes but only on the probabilities of occurrence of the events. Consider the tossing of a coin. According to what we have shown above, since the head and the tail occur with probability 1/2 each, the uncertainty associated with a coin toss is
H(1/2, 1/2) = −Σ_i p_i log2 p_i = −(1/2) log2(1/2) − (1 − 1/2) log2(1 − 1/2) = 1
The uncertainty has its maximum value (1 bit) at p_head = p_tail = 1/2. If the coin is biased, the uncertainty decreases, because we become more certain about which way the coin is likely to land (Figure 2).
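The dependence on the bias alone can be made concrete with a minimal Python sketch (the particular values of p below are arbitrary) which evaluates H(p, 1 − p) for a few biases:

import math

def h2(p):
    # binary entropy H(p, 1-p) in bits; the p = 0 and p = 1 cases contribute zero
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 1.0):
    print(f"p(head) = {p:.1f}   H = {h2(p):.4f} bits")

The maximum, 1 bit, occurs at p = 1/2; any bias lowers the uncertainty.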
There are several interpretations of the concept of uncertainty measure.
1. The relation H({p_i}) = −Σ_i p_i log2 p_i is the weighted average, with weights p_i, of a random variable W(X) which assumes the value −log2 p_i when the random variable X takes the value x_i, i.e. W takes the value equal to the negative logarithm of the probability of X = x_i.

[Figure: A decision tree of yes/no questions ("Is it x1?", "Is it either x1 or x2?", "Is it x3?", "Is it x4?") that identifies the value of X in the example below; x1, x2 and x3 are each determined by two questions, x4 and x5 by three.]
Example : Suppose X takes five values x1, x2, x3, x4 and x5 with probabilities 0.3, 0.2, 0.2, 0.15 and 0.15 respectively. W takes the values −log2(0.3) = 1.737, −log2(0.2) = 2.322, 2.322, −log2(0.15) = 2.737 and 2.737 respectively, with the corresponding probabilities. Adding the contributions, we get H = 2.27 bits of uncertainty.
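The same arithmetic is easy to verify directly; the following short Python sketch (using the probabilities of the example above) lists W = −log2 p_i for each value and accumulates their weighted average:

import math

probs = {"x1": 0.3, "x2": 0.2, "x3": 0.2, "x4": 0.15, "x5": 0.15}

H = 0.0
for x, p in probs.items():
    w = -math.log2(p)          # value taken by W when X = x
    H += p * w                 # weighted average builds up the entropy
    print(f"{x}: p = {p:.2f}   W = -log2(p) = {w:.3f}")
print(f"H = {H:.2f} bits")      # about 2.27 bits, as in the text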
Flipping a coin once gives 1 bit of information. Flipping a coin n times (which is the same as flipping n coins simultaneously) gives n bits of information, because there are 2^n events, each with probability 1/2^n:

H = −2^n × (1/2^n) log2(1/2^n) = n
The above can easily be generalized to the case of a continuous variable, in which case

H(P) = ∫ P(x) log(1/P(x)) dx
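As a sanity check of the continuous formula, the minimal Python sketch below (the uniform density on [0, 4] is an arbitrary example) approximates the integral by a Riemann sum and compares it with the exact value log2(4) = 2 bits:

import math

a = 4.0                      # support [0, a] of a uniform density, chosen as an example
def P(x):
    return 1.0 / a           # uniform probability density on [0, a]

# Riemann-sum approximation of  H = integral of P(x) log2(1/P(x)) dx
n = 100000
dx = a / n
H = sum(P(i * dx) * math.log2(1.0 / P(i * dx)) * dx for i in range(n))
print(f"numerical H = {H:.4f} bits,  exact log2({a:g}) = {math.log2(a):.4f} bits")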
Gibbs’ Inequality
It can be seen from Figure 4 that ln(x) ≤ x − 1, with equality only at x = 1. The slope of ln x is 1/x, so at x = 1 the slope is 1; the tangent to ln x at x = 1 is therefore the line y = x − 1, which passes through the point (1, 0), and the concave curve ln x lies below this tangent everywhere. For two probability distributions P = {p_i} and Q = {q_i}, applying this with x = q_i/p_i gives Gibbs’ inequality, Σ_i p_i ln(q_i/p_i) ≤ Σ_i p_i (q_i/p_i − 1) = Σ_i q_i − Σ_i p_i = 0; the result holds in any base of the logarithm, since changing the base only multiplies the left-hand side by a positive constant.
H(P) − log(n) = Σ_i p_i log(1/p_i) − log(n) Σ_i p_i
             = Σ_i p_i [ log(1/p_i) − log(n) ]
             = Σ_i p_i log( (1/n)/p_i ) ≤ 0
where we have used Gibbs’ inequality in the last step, with P = {p_1, p_2, . . . , p_n} and Q = {1/n, 1/n, . . . , 1/n}, i.e. Q is the distribution in which each of the n events has the same probability 1/n. Thus we have, for the function H(P),
0 ≤ H(P ) ≤ log(n)
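These bounds are easy to verify numerically; the short Python sketch below (the example distributions are arbitrary) confirms that H(P) never exceeds log2 n and reaches it only for the uniform distribution:

import math

def entropy(p):
    # Shannon entropy in bits; terms with p_i = 0 contribute nothing
    return -sum(x * math.log2(x) for x in p if x > 0)

n = 4
examples = [
    [1.0, 0.0, 0.0, 0.0],        # all probability on one outcome
    [0.4, 0.3, 0.2, 0.1],        # a generic distribution
    [0.25, 0.25, 0.25, 0.25],    # the uniform distribution
]
for p in examples:
    print(f"H = {entropy(p):.4f} bits   (bound log2 n = {math.log2(n):.4f})")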
H(P) can be zero only when one of the p_i is 1 and the rest are zero, while it attains its maximum value log(n) when the distribution is uniform.
As an illustration, consider a large number of particles distributed over L boxes, p_i being the probability of a particle being found in the i-th box.
1. Consider the case where all particles are in a single box, i.e. p_i = 1 for that particular box and all other probabilities are zero. Clearly the entropy in this case is zero. The number of such configurations is the same as the number of boxes, viz. L.
2. Consider the case where the particles are distributed equally between two specific boxes. The number of different configurations is found by choosing two boxes out of L (we take L = 10^6) and putting half of the particles in one of the boxes and the other half in the second box. This gives L(L − 1)/2 ≈ 5 × 10^11 configurations. Since the probability of a particle being in either box is 1/2, the entropy of this configuration is (1/2) ln 2 + (1/2) ln 2 = ln 2. The entropy is somewhat higher than in the case where the particles are all in one single box. The number of configurations in the single-box case is 10^6, while in the case of two boxes it is 5 × 10^11. Thus if we started with a zero-entropy situation (and if these two situations were the only ones possible), the probability that the entropy becomes ln 2 is 5 × 10^11 / (5 × 10^11 + 10^6) ≈ 1 − 2 × 10^−6. This is simply a statement of the fact that the system equilibrates to a state of maximum entropy.
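The numbers quoted above can be reproduced with the short Python sketch below (it assumes L = 10^6 boxes, as in the text):

import math

L = 10**6                                    # number of boxes
single_box = L                               # configurations with all particles in one box
two_boxes = L * (L - 1) // 2                 # choose 2 boxes out of L: about 5 x 10^11

entropy_two = 2 * (0.5 * math.log(2))        # -(1/2)ln(1/2) - (1/2)ln(1/2) = ln 2
prob_two = two_boxes / (two_boxes + single_box)

print(f"single-box configurations : {single_box:.3g}")
print(f"two-box configurations    : {two_boxes:.3g}")
print(f"entropy of two-box state  : {entropy_two:.4f} (= ln 2)")
print(f"probability of ln 2 state : {prob_two:.8f}")   # about 1 - 2e-6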
4 Communication System
A typical communication system consists of a source which emits signals; an encoder, which provides a symbolic representation of the message using the bits generated by the source; a channel for transmission, such as an optical fiber, which may pick up stray noise along the way that degrades the signal; a receiver which intercepts the message; and finally a decoder. A channel's information capacity is defined as the rate (say, in Kbps) of user information that can be carried over a noisy channel with as small an error as possible. This is less than the raw channel capacity, which is the capacity in the absence of any noise. Suppose we wish to code the letters A, C, G, T by a two-bit code.
absence of any noise. Suppose we wish to code the letters A, C, G, T by a two bit code.
Assume that the letter A appears with 40% frequency, C with 30%, G and T with 15%
each. If we code A=00, C=01, G=10 and T=11, we have on average 2 bits of code per
letter. However, consider a new scheme where we code A=0, C=10, G=110 and T=111.
[Figure: Block diagram of a typical communication system: Source → Encoder → Channel (with noise added in transit) → Receiver → Decoder.]
The number of bits per letter (on average) is 0.4 × 1 + 0.3 × 2 + 0.15 × 3 + 0.15 × 3 = 1.9, which is a small saving over the previous scheme, but a saving nevertheless.
The entropy associated with the source (which sets the optimal compression possible) is −Σ_i p_i log2 p_i = −0.4 log2(0.4) − 0.3 log2(0.3) − 0.15 log2(0.15) − 0.15 log2(0.15) = 1.871 bits per letter. This does not tell us how to construct codes but gives an idea of the optimal compression.
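The saving of the variable-length scheme and the entropy bound can be verified with the minimal Python sketch below (it uses the letter frequencies and the code given above):

import math

freq = {"A": 0.40, "C": 0.30, "G": 0.15, "T": 0.15}
code = {"A": "0", "C": "10", "G": "110", "T": "111"}   # the variable-length scheme

avg_len = sum(freq[s] * len(code[s]) for s in freq)    # average bits per letter
entropy = -sum(p * math.log2(p) for p in freq.values())

print("fixed 2-bit code : 2.000 bits/letter")
print(f"variable code    : {avg_len:.3f} bits/letter")
print(f"source entropy   : {entropy:.3f} bits/letter")  # optimal compression limit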
Shannon’s Noiseless Coding theorem, which is applicable to all uniquely decipherable codes, provides a limit on the average length of a code that can be carried with a high degree of fidelity over a noiseless channel. We will prove the theorem for the special case of a “prefix code”, in which no code word is a prefix of another code word. The following examples illustrate these notions. Consider first the code
A = 0
B = 1
C = 00
D = 11
This is not a uniquely decipherable code. The following is an example of a uniquely decipherable code which is not a prefix code.
word   code   comments
A      0
B      01     A is a prefix of B
C      011    B is a prefix of C
D      0111   C is a prefix of D
The following two are valid prefix codes.
Code 1: A = 00, B = 01, C = 10, D = 11
Code 2: A = 0, B = 10, C = 110, D = 111
A prefix code is best illustrated through a tree diagram which hangs upside down from a node. From the node we take one step to the left if the bit is 0 and one step to the right if the bit is 1. When a code word terminates at a letter, we have a ‘leaf’. Take the following illustration for coding the word “QUANTUM” with the prefix code given below.
[Figure: Binary tree of the prefix code below; left branches are 0, right branches are 1, and the leaves are the letters A, Q, T, N, U and M.]
word   code
A      0
Q      100
T      1010
N      1011
U      110
M      111
The word “QUANTUM” will then be coded as 100 110 0 1011 1010 110 111, which has 21 bits against the 56 bits required to code it using a byte for every letter; the coded message is thus only 37.5% of the original size. The corresponding tree is shown above.
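The encoding, and the decoding that the prefix property makes possible, can be demonstrated with the short Python sketch below (it uses the code table given above):

code = {"A": "0", "Q": "100", "T": "1010", "N": "1011", "U": "110", "M": "111"}

word = "QUANTUM"
bits = "".join(code[letter] for letter in word)
print(bits, len(bits), "bits")          # 21 bits versus 8*7 = 56 bits with one byte per letter

# Decoding: because no code word is a prefix of another, we can read the bit
# stream left to right and emit a letter as soon as a code word matches.
decode = {v: k for k, v in code.items()}
decoded, buf = [], ""
for b in bits:
    buf += b
    if buf in decode:
        decoded.append(decode[buf])
        buf = ""
print("".join(decoded))                 # QUANTUM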
If the i-th code word is a leaf at depth n_i, the length of that code word is n_i itself. If n_k is the depth of the tree, we have n_k ≥ n_{k−1} ≥ . . . ≥ n_1. The maximum number of leaves appears in the tree when the only terminal points of the tree are at level n_k, in which case there are 2^{n_k} of them. A leaf at level n_i removes a fraction 1/2^{n_i} of the possible leaves at level n_k, i.e. it removes 2^{n_k − n_i} of them. Thus we have

Σ_{i=1}^{k} 2^{n_k − n_i} ≤ 2^{n_k}   ⟹   Σ_{i=1}^{k} 1/2^{n_i} ≤ 1
The last relation is known as the “Kraft inequality”. For a set of integers n_1, n_2, . . . , n_k, satisfying the Kraft inequality is both a necessary and a sufficient condition for the existence of a prefix code whose code word lengths are equal to these integers.
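The Kraft sum is easy to evaluate for the codes listed earlier; a minimal Python check (the labels merely recall the examples above):

def kraft_sum(lengths):
    # sum of 2**(-n_i) over the code word lengths n_i
    return sum(2.0 ** (-n) for n in lengths)

codes = {
    "A=0, B=1, C=00, D=11 (not uniquely decipherable)": [1, 1, 2, 2],
    "prefix code A=00, B=01, C=10, D=11": [2, 2, 2, 2],
    "prefix code A=0, B=10, C=110, D=111": [1, 2, 3, 3],
}
for name, lengths in codes.items():
    print(f"{name}: Kraft sum = {kraft_sum(lengths):.3f}")

The first set of lengths gives a Kraft sum of 1.5 > 1, so no prefix code with those lengths can exist; the two prefix codes give sums equal to 1, consistent with the inequality.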
Shannon’s Theorem
Given a source with alphabet {a_1, a_2, . . . , a_k} whose letters occur with probabilities {p_1, p_2, . . . , p_k} and entropy H(X) = −Σ_{i=1}^{k} p_i log2 p_i, the average length of a uniquely decipherable code satisfies

n̄ ≥ H(X),   i.e.   Σ_i p_i n_i ≥ H
Proof:

H − n̄ = −Σ_i p_i log2 p_i − Σ_i p_i n_i
      = Σ_i p_i [ log2(1/p_i) − n_i ]
      = Σ_i p_i [ log2(1/p_i) + log2 2^{−n_i} ]
      = Σ_i p_i log2( 2^{−n_i}/p_i )
      ≤ (1/ln 2) Σ_i p_i ( 2^{−n_i}/p_i − 1 ) = (1/ln 2) ( Σ_i 2^{−n_i} − 1 ) ≤ 0

where the inequality uses ln x ≤ x − 1 (the positive factor 1/ln 2 comes from changing the base of the logarithm and does not affect the sign), and the last step uses the Kraft inequality.
Example :
There are two coins of which one is a fair coin while the other has heads on both sides.
A coin is selected at random and tossed twice. If the tosses result in two heads, what
information does one get regarding the coin that was selected to begin with? Let X be a
random variable which takes value 0 if the coin chosen is a fair coin and takes value 1 for
the biased coin. Let Y be the number of heads. H(X) is the initial uncertainty regarding
the selected coin (which is a one bit uncertainty). The uncertainty remaining when the
number of heads is revealed is H(X|Y). The information conveyed about the value of X by revealing Y is then given by I(X; Y) = H(X) − H(X|Y). Note that if the value of Y is
zero or 1, there is no uncertainty remaining because the coin must then be a fair coin. If
the coin is fair, the probability of selecting it and getting Y = 2 is (1/2) × (1/4) = 1/8. If the coin is biased, the corresponding probability is (1/2) × 1 = 1/2 (in both cases the factor 1/2 is the probability that the particular coin is selected). Thus the probability of getting Y = 2 is 1/8 + 1/2 = 5/8. We now need to weight the remaining uncertainty H(X | Y = 2) by this probability to obtain H(X|Y). Using Bayes' theorem, we have
P(X | Y = 2) = P(Y = 2 | X) P(X) / P(Y = 2)
Using the above probability, we can see that given that Y = 2, the probability of X =
0 is 1/5 while the corresponding probability for X = 1 is 4/5. We then have
H(X|Y) = (5/8) [ (4/5) log2(5/4) + (1/5) log2 5 ] = 0.45
Thus the information conveyed about X is I(X; Y) = H(X) − H(X|Y) = 1 − 0.45 = 0.55 bits.
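The arithmetic of this example can be checked with the minimal Python sketch below (here X = 0 denotes the fair coin, X = 1 the two-headed coin, and Y is the number of heads in the two tosses):

import math

# Joint probabilities P(X, Y) for choosing a coin at random and tossing it twice.
# Fair coin (X=0): Y is binomial(2, 1/2); two-headed coin (X=1): Y = 2 always.
joint = {(0, 0): 0.5 * 0.25, (0, 1): 0.5 * 0.5, (0, 2): 0.5 * 0.25,
         (1, 2): 0.5 * 1.0}

p_y2 = sum(p for (x, y), p in joint.items() if y == 2)     # 5/8
post = {x: joint[(x, 2)] / p_y2 for x in (0, 1)}           # P(X | Y = 2): 1/5 and 4/5

h_x_given_y2 = -sum(p * math.log2(p) for p in post.values())
h_x_given_y = p_y2 * h_x_given_y2      # Y = 0 or 1 leaves no uncertainty about X
# H(X) = 1 bit, since each coin is chosen with probability 1/2
print(f"P(Y=2) = {p_y2}, P(X=0|Y=2) = {post[0]:.2f}, P(X=1|Y=2) = {post[1]:.2f}")
print(f"H(X|Y) = {h_x_given_y:.2f} bits, I(X;Y) = {1 - h_x_given_y:.2f} bits")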