
Exponential families

Peter D. Hoff
September 26, 2013

Much of this content comes from Lehmann and Casella [1998] section 1.5.

Contents
1 The canonical exponential family

2 Basic results

1 The canonical exponential family

Construction of an exponential family of densities


Exponential families are classes of probability measures constructed from

1. a dominating measure $\nu$, and
2. a statistic $t(X)$.

Let

- $(\mathcal{X}, \mathcal{A})$ be a measurable space,
- $\nu$ be a measure on $\mathcal{A}$,
- $t : \mathcal{X} \to \mathbb{R}^s$.

For $\eta \in \mathbb{R}^s$, define the measure
$$\nu_\eta(A) = \int_A e^{\eta^T t(x)} \, \nu(dx), \quad A \in \mathcal{A},$$
and let
$$A(\eta) = \log \nu_\eta(\mathcal{X}) = \log \int e^{\eta^T t(x)} \, \nu(dx).$$

If $A(\eta) < \infty$, we can define a probability measure $P_\eta$ on $(\mathcal{X}, \mathcal{A})$ via its density w.r.t. $\nu$:
$$p(x|\eta) = e^{\eta^T t(x) - A(\eta)}, \quad x \in \mathcal{X},$$
$$P_\eta(A) = \int_A p(x|\eta) \, \nu(dx).$$

Note that

- $P_\eta(\mathcal{X}) = 1$ by construction, and so $(\mathcal{X}, \mathcal{A}, P_\eta)$ is a probability space.
- $P_\eta$ is absolutely continuous w.r.t. $\nu$, with Radon-Nikodym density $p(x|\eta)$.
- We can construct such a density for each $\eta \in \mathbb{R}^s$ for which $\int e^{\eta^T t(x)} \, \nu(dx)$ is finite.
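For example (a Poisson family): take $\mathcal{X} = \{0, 1, 2, \ldots\}$, $t(x) = x$, and $\nu$ the measure with $\nu(\{x\}) = 1/x!$. Then
$$A(\eta) = \log \sum_{x=0}^{\infty} \frac{e^{\eta x}}{x!} = e^{\eta} < \infty \quad \text{for every } \eta \in \mathbb{R},$$
and the density w.r.t. $\nu$ is $p(x|\eta) = \exp(\eta x - e^{\eta})$, i.e. the Poisson$(\lambda)$ distribution with $\lambda = e^{\eta}$ and mass function $e^{-\lambda}\lambda^x/x!$ (the $1/x!$ having been absorbed into $\nu$).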
Definition 1 (canonical exponential family). Let

- $(\mathcal{X}, \mathcal{A}, \nu)$ be a measure space,
- $t : \mathcal{X} \to \mathbb{R}^s$ be an $s$-dimensional statistic that does not satisfy any linear constraints,
- $A(\eta) = \log \int e^{\eta^T t(x)} \, \nu(dx)$.

A collection of densities given by
$$\{p(x|\eta) = \exp(\eta^T t(x) - A(\eta)) : \eta \in H\}, \quad \text{where } H = \{\eta : A(\eta) < \infty\},$$
is called an $s$-dimensional exponential family.

Notes:

- The set $H = \{\eta : A(\eta) < \infty\}$ is called the natural parameter space.
- Each density $p(x|\eta)$ defines a measure $P_\eta$ via $P_\eta(A) = \int_A p(x|\eta) \, \nu(dx)$.
- We say that the measures $\{P_\eta : \eta \in H\}$ have a common dominating measure $\nu$.

Minimal, full and curved exponential families


"Doesn't satisfy a linear constraint" means
$$\nexists \, a \in \mathbb{R}^s : a \neq 0, \ a^T t(x) = c \ \ \forall x \in \mathcal{X}.$$
Some authors do not include this "no linear constraints" requirement for the statistic $t$. If $t$ does satisfy a linear constraint, the natural parameter space includes points that correspond to the same density and probability distribution. As a result, the parameter will be non-identifiable (in the natural parameter space):

Definition 2. A model $\mathcal{P} = \{p(x|\eta) : \eta \in H\}$ for $(\mathcal{X}, \mathcal{A})$ is non-identifiable if there exist $\eta_1, \eta_2 \in H$ such that $\eta_1 \neq \eta_2$ but $P(A|\eta_1) = P(A|\eta_2)$ for all $A \in \mathcal{A}$.

Exercise: Show that if $t$ satisfies a linear constraint and $H$ is the parameter space, then the exponential family model is non-identifiable.

Most authors refer to an exponential family model (EFM) in which $t$ does not satisfy a linear constraint as a minimal parametrization. Since a non-minimal representation can always be made minimal, and the recommendation is always to do so, it seems simplest just to require minimality in the definition.

Definition 3 (full rank). If the parameter space for an exponential family contains an $s$-dimensional open set, then the family is called full rank.

An exponential family that is not full rank is generally called a curved exponential family, as typically the parameter space is a curve in $\mathbb{R}^s$ of dimension less than $s$.
Examples
Often an exponential family model is parameterized as
$$\mathcal{P} = \{p(x|\theta) = h(x) \exp\{\eta(\theta)^T t(x) - B(\theta)\} : \theta \in \Theta\}.$$
This is done

- if the parameter $\theta$ is more interpretable than $\eta$, or
- so that the dominating measure can be something simple.
Example (normal model):
The univariate normal model on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ can be represented with the class of densities $\{p(x|\theta, \sigma^2) : \theta \in \mathbb{R}, \sigma^2 \in \mathbb{R}^+\}$ w.r.t. Lebesgue measure, where
$$p(x|\theta, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp(-(x - \theta)^2/[2\sigma^2])$$
$$= (2\pi)^{-1/2} \exp\Big(-x^2 \tfrac{1}{2\sigma^2} + x \tfrac{\theta}{\sigma^2} - \tfrac{\theta^2}{2\sigma^2} - \tfrac{1}{2}\log\sigma^2\Big).$$
This is the same model as $p(x|\eta) = (2\pi)^{-1/2} \exp(\eta^T t(x) - A(\eta))$, where
$$t(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \qquad \eta(\theta, \sigma^2) = \begin{pmatrix} \theta/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \qquad A(\eta) = (\theta^2/\sigma^2 + \log\sigma^2)/2.$$
To reparameterize back, note that $\theta = -\eta_1/(2\eta_2)$ and $\sigma^2 = -1/(2\eta_2)$.
What is the natural parameter space? Does it correspond to $(\theta, \sigma^2) \in \mathbb{R} \times \mathbb{R}^+$? Recall,
$$H = \Big\{(\eta_1, \eta_2) : \int e^{\eta_1 x + \eta_2 x^2} \, dx < \infty\Big\}.$$
Convince yourself that $H = \mathbb{R} \times \mathbb{R}^-$, which gives $(\theta, \sigma^2) \in \mathbb{R} \times \mathbb{R}^+$.
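To see this, complete the square in the exponent: for $\eta_2 < 0$,
$$\int e^{\eta_1 x + \eta_2 x^2} \, dx = e^{-\eta_1^2/(4\eta_2)} \int \exp\!\Big(\eta_2 \big(x + \tfrac{\eta_1}{2\eta_2}\big)^2\Big) dx = e^{-\eta_1^2/(4\eta_2)} \sqrt{\pi/(-\eta_2)} < \infty,$$
while for $\eta_2 \geq 0$ the integral diverges. Including the $(2\pi)^{-1/2}$ factor from the dominating measure, this also gives the explicit formula $A(\eta) = -\eta_1^2/(4\eta_2) - \tfrac{1}{2}\log(-2\eta_2)$, which agrees with the expression for $A$ above after substituting $\eta_1 = \theta/\sigma^2$ and $\eta_2 = -1/(2\sigma^2)$.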


The exponential family model defined by $t(x) = (x, x^2)$ and $\eta \in H$ is the normal model. The normal model with $(\theta, \sigma^2) \in \mathbb{R} \times \mathbb{R}^+$ is a two-dimensional full rank exponential family.
Example (a curved normal model):
Consider the normal model having the following mean-variance relationship:
$$X \sim \text{normal}(\theta, \theta^2), \quad \theta \in \mathbb{R}.$$
Let $\mathcal{P} = \{p(x|\theta, \sigma^2) : \theta \in \mathbb{R}, \sigma^2 = \theta^2\}$, where $p(x|\theta, \sigma^2)$ are the normal densities given above. The densities in this model can be written
$$p(x|\theta) = (2\pi\theta^2)^{-1/2} \exp(-(x - \theta)^2/[2\theta^2])$$
$$\propto_x \exp(-(x^2 - 2x\theta + \theta^2)/[2\theta^2])$$
$$= \exp(x/\theta - x^2/[2\theta^2] - 1/2)$$
$$\propto_x \exp(x/\theta - x^2/[2\theta^2])$$
$$\equiv \exp(\eta_1 t_1(x) + \eta_2 t_2(x)).$$
Since $t(x) = (x, x^2)$ doesn't satisfy a linear constraint, this is a two-dimensional exponential family. The natural parameter space corresponding to $t(x)$ is $\mathbb{R} \times \mathbb{R}^-$. Our reduced parameter space is $\{\eta(\theta) = (1/\theta, -1/[2\theta^2])\}$, a one-dimensional curve in two-dimensional space. Draw a picture.

This family is a two-dimensional exponential family (in minimal form). It is not a full rank exponential family.
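Concretely, for $\theta \neq 0$ the natural parameters satisfy
$$\eta_1 = 1/\theta, \qquad \eta_2 = -1/(2\theta^2) = -\eta_1^2/2,$$
so the reduced parameter space is the parabola $\{(\eta_1, -\eta_1^2/2) : \eta_1 \neq 0\}$ inside $\mathbb{R} \times \mathbb{R}^-$, which contains no two-dimensional open set.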
Example (multinomial model):
Let $X \sim \text{multinomial}(n, \theta)$, for which
$$\Theta = \{\theta \in \mathbb{R}^p_+ : \textstyle\sum_j \theta_j = 1\} \quad \text{and} \quad \mathcal{X} = \{x \in \{0, 1, \ldots, n\}^p : \textstyle\sum_j x_j = n\}.$$
The density of $P_\theta$ w.r.t. counting measure on $\mathcal{X}$ is
$$p(x|\theta) = \binom{n}{x} \theta_1^{x_1} \cdots \theta_p^{x_p}.$$
We can rewrite this in canonical exponential form as
$$p(x|\eta) = \exp(x_1 \eta_1 + \cdots + x_p \eta_p),$$
where $\eta_j = \log \theta_j$ and the dominating measure is
$$\nu(\{x\}) = \binom{n}{x}, \quad x \in \mathcal{X},$$
i.e. the multinomial coefficient has been absorbed into the dominating measure.
The parameter space for this model is $\tilde{H} = \{\eta \in \mathbb{R}^p : \sum_j e^{\eta_j} = 1\}$, which is a $(p-1)$-dimensional curve in $\mathbb{R}^p$.

Is the multinomial model a $p$-dimensional curved exponential family? Note that $1^T t(x) = n$ for all $x \in \mathcal{X}$, so this family

- doesn't satisfy our definition, or, if you prefer,
- is not in minimal form.
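To see the resulting non-identifiability concretely, note that with $t(x) = x$ the natural parameter space is all of $\mathbb{R}^p$ (the sum below is over the finite set $\mathcal{X}$), and by the multinomial theorem
$$A(\eta) = \log \sum_{x \in \mathcal{X}} \binom{n}{x} e^{\eta^T x} = n \log \sum_{j=1}^{p} e^{\eta_j}.$$
Hence for any $c \in \mathbb{R}$, $A(\eta + c1) = A(\eta) + cn$ and
$$p(x|\eta + c1) = \exp\big(\eta^T x + cn - A(\eta) - cn\big) = p(x|\eta),$$
so distinct natural parameters give the same distribution.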

Consider the usual parameterization again, but now express the model in terms of $t(x) = (x_1, \ldots, x_{p-1})$:
$$p(x|\theta) = \binom{n}{x} \theta_1^{x_1} \cdots \theta_{p-1}^{x_{p-1}} \theta_p^{\,n - \sum_{j=1}^{p-1} x_j} = \binom{n}{x} \theta_p^{\,n} \prod_{j=1}^{p-1} \Big(\frac{\theta_j}{\theta_p}\Big)^{x_j} = \binom{n}{x} \exp(\eta_1 x_1 + \cdots + \eta_{p-1} x_{p-1} - A(\eta)),$$
where $\eta_j = \log(\theta_j/\theta_p)$ and $A(\eta)$ can be computed as follows:
$$\theta_j = \theta_p e^{\eta_j}$$
$$1 - \theta_p = \theta_p \sum_{j=1}^{p-1} e^{\eta_j}$$
$$\theta_p = \frac{1}{1 + \sum_{j=1}^{p-1} e^{\eta_j}}$$
$$A(\eta) = -n \log \theta_p = n \log\Big(1 + \sum_{j=1}^{p-1} e^{\eta_j}\Big).$$

Thus the multinomial model is a $(p-1)$-dimensional exponential family generated by the statistic $t(x) = (x_1, \ldots, x_{p-1})$.

Does the parameter space correspond to $H$? Here
$$H = \Big\{\eta \in \mathbb{R}^{p-1} : \sum_{x \in \mathcal{X}} \exp\{\eta_1 x_1 + \cdots + \eta_{p-1} x_{p-1}\} < \infty\Big\} = \mathbb{R}^{p-1},$$
since the sum is over the finite set $\mathcal{X}$. This of course contains a $(p-1)$-dimensional open rectangle, and so the multinomial model is a full rank $(p-1)$-dimensional exponential family.

2 Basic results

Convexity of H:
The largest EFM based on a statistic $t(x)$ is the one based on the natural parameter space: for any other parameter space $\tilde{H} \subseteq H$, we have $\{p(x|\eta) : \eta \in \tilde{H}\} \subseteq \{p(x|\eta) : \eta \in H\}$.

The natural parameter space is usually (but not always) open, making this fullest family also full rank. It is always the case that $H$ is convex, and that $A(\eta)$ is convex on $H$.

Theorem 1. The natural parameter space $H$ for densities of the form $p(x|\eta) = \exp(\eta^T t(x) - A(\eta))$ is convex, and $A(\eta)$ is convex on $H$.
Proof. Recall Hölder's inequality: for $a \in [0, 1]$, $b = 1 - a$, and nonnegative functions $f$ and $g$,
$$\int fg \leq \Big(\int f^{1/a}\Big)^a \Big(\int g^{1/b}\Big)^b.$$
Now let $\eta_1, \eta_2 \in H$ and apply the inequality:
$$e^{A(a\eta_1 + b\eta_2)} = \int \exp((a\eta_1 + b\eta_2)^T t(x)) \, \nu(dx) = \int e^{a\eta_1^T t} e^{b\eta_2^T t} \, \nu(dx)$$
$$\leq \Big(\int e^{\eta_1^T t} \, \nu(dx)\Big)^a \Big(\int e^{\eta_2^T t} \, \nu(dx)\Big)^b = e^{aA(\eta_1) + bA(\eta_2)} < \infty,$$
and so $a\eta_1 + b\eta_2 \in H$. Taking logs gives $A(a\eta_1 + b\eta_2) \leq aA(\eta_1) + bA(\eta_2)$, so $A(\eta)$ is convex.

Continuity, integration and differentiation


The following theorem is useful in a variety of contexts:

Theorem 2 (LC 5.8). For any integrable function $f$, the expected value function $E[f|\eta]$,
$$E[f|\eta] = \int f(x) \exp(\eta^T t(x) - A(\eta)) \, \nu(dx),$$
is, at any $\eta$ in the interior of $H$,

1. continuous as a function of $\eta$,
2. differentiable w.r.t. $\eta$ to all orders,
3. with derivatives that can be obtained by differentiating the integrand.

The first item is used in two key results in estimation and testing:

- In estimation, the theorem implies that risk functions for exponential family models are continuous. This will help us characterize all admissible estimators for such models.
- In testing, the theorem implies that the power function of any test is continuous. This will help us characterize unbiased testing procedures.

An important application of the theorem is the calculation of moments of $t$.
By definition, $e^{A(\eta)} = \int e^{\eta^T t} \, \nu(dx)$. Taking derivatives w.r.t. $\eta$ gives
$$\frac{d}{d\eta} e^{A(\eta)} = \frac{d}{d\eta} \int e^{\eta^T t} \, \nu(dx)$$
$$A'(\eta) \, e^{A(\eta)} = \int t \, e^{\eta^T t} \, \nu(dx)$$
$$A'(\eta) = \int t \, e^{\eta^T t - A(\eta)} \, \nu(dx) = E[t(X)|\eta].$$
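For example, writing the normal family's normalizing function in terms of $\eta$ as $A(\eta) = -\eta_1^2/(4\eta_2) - \tfrac{1}{2}\log(-2\eta_2)$,
$$\frac{\partial A}{\partial \eta_1} = -\frac{\eta_1}{2\eta_2} = \theta = E[X|\eta],$$
and in the $(p-1)$-dimensional multinomial family, $A(\eta) = n \log(1 + \sum_k e^{\eta_k})$ gives
$$\frac{\partial A}{\partial \eta_j} = \frac{n e^{\eta_j}}{1 + \sum_k e^{\eta_k}} = n\theta_j = E[x_j|\eta].$$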

More generally,

Theorem 3 (Barndorff-Nielsen (1978), thm 8.1). Let $\mathcal{P} = \{p(x|\eta) = \exp(\eta^T t - A(\eta)) : \eta \in H\}$ be an exponential family and $\eta \in \text{int}\, H$. Then
$$\frac{\partial^{k_1 + \cdots + k_s}}{\partial \eta_1^{k_1} \cdots \partial \eta_s^{k_s}} \, e^{A(\eta)} = \int t_1^{k_1}(x) \cdots t_s^{k_s}(x) \, e^{\eta^T t(x)} \, \nu(dx)$$
for all $k_1, \ldots, k_s \geq 0$.
This result helps us with the moment generating function.

Moment generating function:
$$M_t(u_1, \ldots, u_s) = E[e^{u^T t}|\eta]$$
$$= \int e^{(\eta + u)^T t - A(\eta)} \, \nu(dx)$$
$$= e^{A(\eta + u) - A(\eta)} \int e^{(\eta + u)^T t - A(\eta + u)} \, \nu(dx)$$
$$= e^{A(\eta + u) - A(\eta)}.$$
This works as long as $\eta$ is in the interior of $H$ and $u$ is small enough so that $\eta + u \in H$. From this, we can use the above theorem to show
$$\frac{\partial^{k_1 + \cdots + k_s}}{\partial u_1^{k_1} \cdots \partial u_s^{k_s}} M_t(u) \Big|_{u=0} = E[t_1^{k_1} \cdots t_s^{k_s}|\eta].$$
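In particular, $A(\eta + u) - A(\eta)$ is the cumulant generating function of $t(X)$, so second derivatives of $A$ give covariances:
$$\frac{\partial^2 A(\eta)}{\partial \eta_j \, \partial \eta_k} = \text{Cov}[t_j(X), t_k(X) \,|\, \eta].$$
For the normal family above, for instance, $\partial^2 A/\partial \eta_1^2 = -1/(2\eta_2) = \sigma^2 = \text{Var}[X|\eta]$.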

References
E. L. Lehmann and George Casella. Theory of point estimation. Springer Texts in Statistics.
Springer-Verlag, New York, second edition, 1998. ISBN 0-387-98502-6.
