Entropy of The Gaussian Distribution: Appendix A

Entropy-maximizing distributions play an important role in several entropy estimation methods. Among all probability distributions with a given finite mean and a given finite variance, the Gaussian distribution is the one with maximum entropy. This can be proved using Lagrange multipliers. A detailed proof follows.
Copyright
© Attribution Non-Commercial (BY-NC)

Appendix A

Entropy of the Gaussian distribution


Among the distributions that cover the entire real line $\mathbb{R}$ and have a finite mean $\mu$ and a finite variance $\sigma^2$, the maximum entropy distribution is the Gaussian. The analytical proof presented here uses optimization under constraints: the method of Lagrange multipliers offers a straightforward way of finding the Gaussian probability distribution. The function to maximize is the Shannon entropy of a pdf $f(x)$:

$$H(f) = -\int_{-\infty}^{\infty} f(x)\,\log f(x)\,dx \tag{A.1}$$

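with respect to $f(x)$, under the following constraints:

a) $f(x)$ is a pdf, that is, the probabilities of all $x$ sum to 1:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1, \tag{A.2}$$

b) it has a finite mean $\mu$:

$$\int_{-\infty}^{\infty} x f(x)\,dx = \mu, \tag{A.3}$$

c) and a finite variance $\sigma^2$:

$$\int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \sigma^2. \tag{A.4}$$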
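As an informal illustration of what is being maximized, the sketch below evaluates Eq. A.1 numerically for three densities that share the same mean and variance. The comparison densities (Laplace and logistic), the value `SIGMA = 1.0`, the integration range and the helper name `differential_entropy` are all assumptions chosen for this example, not part of the original text. It is a sanity check, not a proof.

```python
import math

def differential_entropy(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of H(f) = -integral of f(x) log f(x) dx (Eq. A.1)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:  # skip points where f underflows to 0 (0 * log 0 is taken as 0)
            total -= fx * math.log(fx) * dx
    return total

SIGMA = 1.0  # assumed common standard deviation; every pdf below has mean 0

def gaussian_pdf(x):
    return math.exp(-x * x / (2 * SIGMA ** 2)) / math.sqrt(2 * math.pi * SIGMA ** 2)

def laplace_pdf(x):
    b = SIGMA / math.sqrt(2)  # Laplace variance is 2 * b**2
    return math.exp(-abs(x) / b) / (2 * b)

def logistic_pdf(x):
    s = SIGMA * math.sqrt(3) / math.pi  # logistic variance is (s * pi)**2 / 3
    return 1.0 / (4 * s * math.cosh(x / (2 * s)) ** 2)

# Truncating the real line to [-40, 40] is an approximation; the tails beyond it are negligible here.
entropies = {
    name: differential_entropy(pdf, -40.0, 40.0)
    for name, pdf in [("gaussian", gaussian_pdf),
                      ("laplace", laplace_pdf),
                      ("logistic", logistic_pdf)]
}
for name, h in entropies.items():
    print(f"{name:9s} H = {h:.4f} nats")
```

With natural logarithms the Gaussian value should come out close to $\tfrac{1}{2}\log(2\pi e\,\sigma^2)$, and the two alternatives should come out lower, in line with the claim being proved.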


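The Lagrangian function under these constraints is:

\begin{align}
\Lambda(f, \lambda_0, \lambda_1, \lambda_2) = {} & -\int_{-\infty}^{\infty} f(x)\,\log f(x)\,dx \tag{A.5}\\
& + \lambda_0 \left( \int_{-\infty}^{\infty} f(x)\,dx - 1 \right) \tag{A.6}\\
& + \lambda_1 \left( \int_{-\infty}^{\infty} x f(x)\,dx - \mu \right) \tag{A.7}\\
& + \lambda_2 \left( \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 - \sigma^2 \right) \tag{A.8}
\end{align}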

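The critical values of $\Lambda$ are zero-gradient points, that is, points where the partial derivatives of $\Lambda$ are zero:

\begin{align}
\frac{\partial \Lambda(f, \lambda_0, \lambda_1, \lambda_2)}{\partial f(x)} &= -\log f(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 x^2 = 0 \tag{A.9}\\
\frac{\partial \Lambda(f, \lambda_0, \lambda_1, \lambda_2)}{\partial \lambda_0} &= \int_{-\infty}^{\infty} f(x)\,dx - 1 = 0 \tag{A.10}\\
\frac{\partial \Lambda(f, \lambda_0, \lambda_1, \lambda_2)}{\partial \lambda_1} &= \int_{-\infty}^{\infty} x f(x)\,dx - \mu = 0 \tag{A.11}\\
\frac{\partial \Lambda(f, \lambda_0, \lambda_1, \lambda_2)}{\partial \lambda_2} &= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 - \sigma^2 = 0 \tag{A.12}
\end{align}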

These equations form a system from which $\lambda_0$, $\lambda_1$, $\lambda_2$ and, more importantly, $f(x)$ can be calculated. From Eq. A.9:

$$f(x) = e^{\lambda_2 x^2 + \lambda_1 x + \lambda_0 - 1} \tag{A.13}$$
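The step from Eq. A.9 to Eq. A.13 is a rearrangement followed by exponentiation. It is spelled out below as a sketch. The remark on the sign of $\lambda_2$ is an added observation, not stated in the original text: without it the exponential is not integrable over the real line and the constraint integrals would diverge.

```latex
\begin{align*}
-\log f(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 x^2 &= 0 \\
\log f(x) &= \lambda_2 x^2 + \lambda_1 x + \lambda_0 - 1 \\
f(x) &= e^{\lambda_2 x^2 + \lambda_1 x + \lambda_0 - 1},
\qquad \lambda_2 < 0 \ \text{(needed for integrability)}.
\end{align*}
```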

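Now $f(x)$ can be substituted in Eqs. A.10, A.11, A.12:

\begin{align}
\int_{-\infty}^{\infty} e^{\lambda_2 x^2 + \lambda_1 x + \lambda_0 - 1}\,dx &= 1 \tag{A.14}\\
\int_{-\infty}^{\infty} x\, e^{\lambda_2 x^2 + \lambda_1 x + \lambda_0 - 1}\,dx &= \mu \tag{A.15}\\
\int_{-\infty}^{\infty} x^2 e^{\lambda_2 x^2 + \lambda_1 x + \lambda_0 - 1}\,dx &= \mu^2 + \sigma^2 \tag{A.16}
\end{align}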

To solve this system of three equations with three unknowns ($\lambda_0$, $\lambda_1$, $\lambda_2$), the three improper integrals have to be computed. In 1778 Laplace proved that

$$I = \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}. \tag{A.17}$$


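B. Bonev

This result is obtained by switching to polar coordinates:

\begin{align}
I^2 &= \left( \int_{-\infty}^{\infty} e^{-x^2}\,dx \right)^2 = \int_{-\infty}^{\infty} e^{-x^2}\,dx \int_{-\infty}^{\infty} e^{-y^2}\,dy \tag{A.18}\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy \tag{A.19}\\
&= \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2}\, r\,dr\,d\theta \tag{A.20}\\
&= 2\pi \left[ -\tfrac{1}{2} e^{-r^2} \right]_{0}^{\infty} = \pi \tag{A.21}
\end{align}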

In Eq. A.19, the differential $dx\,dy$ represents an element of area in Cartesian coordinates in the $xy$-plane. The same element can be expressed in polar coordinates $r, \theta$, which are given by $x = r\cos\theta$, $y = r\sin\theta$ and $r^2 = x^2 + y^2$. The element of area then becomes $r\,dr\,d\theta$. The $2\pi$ factor in Eq. A.21 comes from the integration over $\theta$. The integral over $r$ can be calculated by the substitution $u = r^2$, $du = 2r\,dr$.

The more general integral of $x^n e^{-ax^2+bx+c}$ has the following closed form ([Wei98]):

$$\int_{-\infty}^{\infty} x^n e^{-ax^2+bx+c}\,dx = \sqrt{\frac{\pi}{a}}\; e^{b^2/(4a)+c} \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{n!}{k!\,(n-2k)!}\, \frac{(2b)^{n-2k}}{(4a)^{n-k}} \tag{A.22}$$

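Eq. A.22 holds for integer $n > 0$, with the variables $a, b$ belonging to the punctured plane (the complex plane with the origin 0 removed) and the real part of $a$ being positive. For $n = 0$ the same expression reduces to the Gaussian integral $\sqrt{\pi/a}\; e^{b^2/(4a)+c}$. Given this result, and identifying $a = -\lambda_2$, $b = \lambda_1$ and $c = \lambda_0 - 1$, the system of equations becomes:

\begin{align}
\sqrt{-\frac{\pi}{\lambda_2}}\; e^{-\lambda_1^2/(4\lambda_2)+\lambda_0-1} &= 1 \tag{A.23}\\
\sqrt{-\frac{\pi}{\lambda_2}}\; e^{-\lambda_1^2/(4\lambda_2)+\lambda_0-1} \left( -\frac{\lambda_1}{2\lambda_2} \right) &= \mu \tag{A.24}\\
\sqrt{-\frac{\pi}{\lambda_2}}\; e^{-\lambda_1^2/(4\lambda_2)+\lambda_0-1} \left( \frac{\lambda_1^2}{4\lambda_2^2} - \frac{1}{2\lambda_2} \right) &= \mu^2 + \sigma^2 \tag{A.25}
\end{align}

Applying the natural logarithm to these equations, for example to Eq. A.23,

$$\log e^{-\lambda_1^2/(4\lambda_2)+\lambda_0-1} + \log \sqrt{-\frac{\pi}{\lambda_2}} = \log 1, \tag{A.26}$$

the unknown $\lambda_0$ can be isolated from each equation of the system:

\begin{align}
\lambda_0 &= \frac{\lambda_1^2}{4\lambda_2} + 1 - \log \sqrt{-\frac{\pi}{\lambda_2}} \tag{A.27}\\
\lambda_0 &= \frac{\lambda_1^2}{4\lambda_2} + 1 - \log \sqrt{-\frac{\pi}{\lambda_2}} + \log \left( -\frac{2\lambda_2\,\mu}{\lambda_1} \right) \tag{A.28}\\
\lambda_0 &= \frac{\lambda_1^2}{4\lambda_2} + 1 - \log \sqrt{-\frac{\pi}{\lambda_2}} \tag{A.29}\\
&\quad + \log \frac{\mu^2 + \sigma^2}{\dfrac{\lambda_1^2}{4\lambda_2^2} - \dfrac{1}{2\lambda_2}} \tag{A.30}
\end{align}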

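From these it follows that $\lambda_1$ and $\lambda_2$ satisfy the relation

$$0 = \log \left( -\frac{2\lambda_2\,\mu}{\lambda_1} \right) = \log \frac{\mu^2 + \sigma^2}{\dfrac{\lambda_1^2}{4\lambda_2^2} - \dfrac{1}{2\lambda_2}}, \tag{A.31}$$

and therefore

\begin{align}
\lambda_1 &= -2\lambda_2\,\mu \tag{A.32}\\
\mu^2 + \sigma^2 &= \frac{\lambda_1^2}{4\lambda_2^2} - \frac{1}{2\lambda_2} \tag{A.33}
\end{align}

These two equations provide the values of $\lambda_1$ and $\lambda_2$, and by substituting them in any equation of the system, $\lambda_0$ is also obtained. The result is

\begin{align}
\lambda_0 &= -\frac{\mu^2}{2\sigma^2} - \log \sqrt{2\pi\sigma^2} + 1 \tag{A.34}\\
\lambda_1 &= \frac{\mu}{\sigma^2} \tag{A.35}\\
\lambda_2 &= -\frac{1}{2\sigma^2} \tag{A.36}
\end{align}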
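The solution can be checked numerically. The sketch below implements the right-hand side of Eq. A.22 for real $a > 0$, plugs in the multipliers of Eqs. A.34 to A.36, and compares the three moments with the targets $1$, $\mu$ and $\mu^2 + \sigma^2$ from Eqs. A.14 to A.16. The function names, the sample values `MU = 1.5` and `SIGMA = 0.8`, and the midpoint-rule cross-check are illustrative assumptions, not part of the original derivation.

```python
import math

def moment_closed_form(n, a, b, c):
    """Right-hand side of Eq. A.22, restricted here to real a > 0."""
    series = 0.0
    for k in range(n // 2 + 1):
        coeff = math.factorial(n) / (math.factorial(k) * math.factorial(n - 2 * k))
        series += coeff * (2 * b) ** (n - 2 * k) / (4 * a) ** (n - k)
    return math.sqrt(math.pi / a) * math.exp(b * b / (4 * a) + c) * series

def moment_numeric(n, a, b, c, lo, hi, steps=100_000):
    """Midpoint-rule value of the integral of x^n exp(-a x^2 + b x + c) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += x ** n * math.exp(-a * x * x + b * x + c)
    return total * dx

MU, SIGMA = 1.5, 0.8  # illustrative values, not taken from the text

# Multipliers from Eqs. A.34 to A.36
lam0 = -MU ** 2 / (2 * SIGMA ** 2) - math.log(math.sqrt(2 * math.pi * SIGMA ** 2)) + 1
lam1 = MU / SIGMA ** 2
lam2 = -1 / (2 * SIGMA ** 2)

# Match exp(lam2 x^2 + lam1 x + lam0 - 1) to the form exp(-a x^2 + b x + c)
a, b, c = -lam2, lam1, lam0 - 1

targets = [1.0, MU, MU ** 2 + SIGMA ** 2]  # right-hand sides of Eqs. A.14 to A.16
closed = [moment_closed_form(n, a, b, c) for n in range(3)]
numeric = [moment_numeric(n, a, b, c, MU - 12 * SIGMA, MU + 12 * SIGMA) for n in range(3)]

for n in range(3):
    print(f"n={n}: closed form={closed[n]:.6f}  numeric={numeric[n]:.6f}  target={targets[n]:.6f}")
```

The numeric column truncates the real line to twelve standard deviations on either side of the mean, which is an approximation. Agreement between all three columns supports both the closed form and the recovered multipliers.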

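These solutions can be substituted in the expression for $f(x)$ (Eq. A.13):

\begin{align}
f(x) &= e^{-\frac{1}{2\sigma^2} x^2 + \frac{\mu}{\sigma^2} x - \frac{\mu^2}{2\sigma^2} - \log \sqrt{2\pi\sigma^2}} \tag{A.37}\\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{1}{2\sigma^2} \left( \mu^2 - 2\mu x + x^2 \right)} \tag{A.38}\\
&= \frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \tag{A.39}
\end{align}

which, indeed, is the pdf of the Gaussian distribution. Finally, the second derivative of $\Lambda$ is negative because $f(x) > 0$, which ensures that the obtained solution is a maximum, and not a minimum or an inflection point:

$$\frac{\partial^2 \Lambda(f, \lambda_0, \lambda_1, \lambda_2)}{\partial f(x)^2} = -\frac{1}{f(x)} \tag{A.40}$$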
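For completeness, the entropy attained by this maximizer can be evaluated by inserting Eq. A.39 into Eq. A.1. The short derivation below is an addition, not part of the original appendix, and it assumes natural logarithms (entropy measured in nats).

```latex
\begin{align*}
H(f) &= -\int_{-\infty}^{\infty} f(x)\,\log f(x)\,dx \\
     &= \int_{-\infty}^{\infty} f(x) \left[ \tfrac{1}{2}\log(2\pi\sigma^2)
        + \frac{(x-\mu)^2}{2\sigma^2} \right] dx \\
     &= \tfrac{1}{2}\log(2\pi\sigma^2) + \frac{\sigma^2}{2\sigma^2} \\
     &= \tfrac{1}{2}\log(2\pi e\,\sigma^2).
\end{align*}
```

The second line uses the logarithm of Eq. A.39, and the third uses the normalization and variance constraints (Eqs. A.2 and A.4). The result depends on $\sigma^2$ only, not on $\mu$.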
