Exponential Families

Peter D. Hoff

September 26, 2013
Much of this content comes from Lehmann and Casella [1998] section 1.5.
Contents

1 The canonical exponential family
2 Basic results

1 The canonical exponential family

Let

• (X, A) be a measurable space,
• μ be a measure on A,
• t(X) be a statistic, t : X → R^s.
For θ ∈ R^s, define the measure μ_θ on A by

    μ_θ(A) = ∫_A e^{θ^T t(x)} μ(dx),   A ∈ A,

and let

    A(θ) = log μ_θ(X) = log ∫ e^{θ^T t(x)} μ(dx).
If A(θ) < ∞, we can define a probability measure P_θ on (X, A) via its density w.r.t. μ:

    p(x|θ) = e^{θ^T t(x) − A(θ)},   x ∈ X,

    P_θ(A) = ∫_A p(x|θ) μ(dx).
Note that

• P_θ(X) = 1 by construction, and so (X, A, P_θ) is a probability space.
• P_θ is absolutely continuous w.r.t. μ, with RN density p(x|θ).
• We can construct such a density for each θ ∈ R^s for which ∫ e^{θ^T t(x)} μ(dx) is finite.
Definition 1 (canonical exponential family). Let

• (X, A, μ) be a measure space,
• t : X → R^s be an s-dimensional statistic that does not satisfy any linear constraints,
• A(θ) = log ∫ e^{θ^T t(x)} μ(dx).

A collection of densities given by

    {p(x|θ) = exp(θ^T t(x) − A(θ)) : θ ∈ H},   where H = {θ : A(θ) < ∞},
is called an s-dimensional exponential family.
Notes:

• The set H = {θ : A(θ) < ∞} is called the natural parameter space.
• Each density p(x|θ) defines a measure P_θ via P_θ(A) = ∫_A p(x|θ) μ(dx).
• We say that the measures {P_θ : θ ∈ H} have a common dominating measure μ.
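As a quick supplementary illustration, the Poisson model fits this definition (the choice of dominating measure below is made for convenience): take X = {0, 1, 2, . . .}, μ({x}) = 1/x!, and t(x) = x. With θ = log λ,

    p(x|θ) = e^{−λ} λ^x = exp( θ t(x) − e^θ ),

so A(θ) = e^θ = λ, which is finite for every θ. The natural parameter space is H = R, and the Poisson(λ) distributions form a one-dimensional exponential family.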
Example: (normal model) Let X ~ N(μ, σ²), for which

    p(x|μ, σ²) = (2πσ²)^{−1/2} exp( −(x − μ)²/(2σ²) )
               = (2π)^{−1/2} exp( xμ/σ² − x²/(2σ²) − μ²/(2σ²) − ½ log σ² ).

This is the same model as p(x|θ) = (2π)^{−1/2} exp(θ^T t(x) − A(θ)), where

    t(x) = (x, x²)^T,   θ(μ, σ²) = ( μ/σ², −1/(2σ²) )^T,   A(θ) = (μ²/σ² + log σ²)/2.
To reparameterize back, note that μ = −θ_1/(2θ_2) and σ² = −1/(2θ_2).
What is the natural parameter space? Does it correspond to (μ, σ²) ∈ R × R⁺? Recall,

    H = { (θ_1, θ_2) : ∫ e^{θ_1 x + θ_2 x²} dx < ∞ }.
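Completing the square gives a quick check of which θ qualify:

    ∫ e^{θ_1 x + θ_2 x²} dx = √( π/(−θ_2) ) · exp( −θ_1²/(4θ_2) )   if θ_2 < 0,

while the integral diverges whenever θ_2 ≥ 0.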
Since t(x) = (x, x²) doesn't satisfy a linear constraint, this is a two-dimensional exponential family. The natural parameter space corresponding to t(x) is R × R⁻ = {(θ_1, θ_2) : θ_2 < 0}, which does correspond to (μ, σ²) ∈ R × R⁺.
Now restrict the model by setting σ² = μ², i.e. X ~ N(μ, μ²). Our reduced parameter space is θ(μ) = (1/μ, −1/(2μ²)). This is a one-dimensional curve in two-dimensional space. Draw a picture (there is a sketch below).
This family is a two-dimensional exponential family (in minimal form).
It is not a full rank exponential family.
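A minimal matplotlib sketch of this curve, assuming μ ranges over the arbitrary interval (0.3, 3]:

```python
# Sketch of the curved-normal parameter set {theta(mu) = (1/mu, -1/(2 mu^2))}
# inside the natural parameter space H = R x (-infinity, 0).
import numpy as np
import matplotlib.pyplot as plt

mu = np.linspace(0.3, 3, 200)            # mu > 0 (arbitrary plotting range)
theta1, theta2 = 1 / mu, -1 / (2 * mu**2)

plt.plot(theta1, theta2, label=r"$\theta(\mu) = (1/\mu,\ -1/(2\mu^2))$")
plt.axhline(0, color="gray", lw=0.5)     # boundary theta2 = 0 of H
plt.xlabel(r"$\theta_1$")
plt.ylabel(r"$\theta_2$")
plt.legend()
plt.show()
```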
Example: (multinomial model) Let X ~ multinomial(n, π), for which

    Θ = { π ∈ R^p : π_j > 0, Σ_j π_j = 1 }   and   X = { x ∈ {0, 1, . . . , n}^p : Σ_j x_j = n }.
The density of P_π w.r.t. counting measure μ on X is

    p(x|π) = ( n!/(x_1! ⋯ x_p!) ) π_1^{x_1} ⋯ π_p^{x_p}.

Equivalently, we can take the dominating measure to be ν({x}) = ( n!/(x_1! ⋯ x_p!) ) μ({x}) and write

    p(x|π) = π_1^{x_1} ⋯ π_p^{x_p},

i.e. the multinomial coefficient has been absorbed into the dominating measure.
Writing θ_j = log π_j and t(x) = x, this density is p(x|θ) = exp(θ^T t(x)) w.r.t. ν. The parameter space for this model is H̃ = { θ ∈ R^p : Σ_j e^{θ_j} = 1 }, which is a (p−1)-dimensional curve in R^p.
Is the multinomial model a p-dimensional curved exponential family? Note that 1^T t(x) = n for all x ∈ X, so this family doesn't satisfy our definition, or if you prefer, is not in minimal form.
Consider the usual parameterization again, but now express the model in terms of t(x) = (x_1, . . . , x_{p−1}):

    p(x|π) = ( n!/(x_1! ⋯ x_p!) ) π_1^{x_1} ⋯ π_{p−1}^{x_{p−1}} π_p^{ n − Σ_{j=1}^{p−1} x_j }
           = ( n!/(x_1! ⋯ x_p!) ) π_p^n ∏_{j=1}^{p−1} (π_j/π_p)^{x_j}
           = ( n!/(x_1! ⋯ x_p!) ) exp( Σ_{j=1}^{p−1} x_j θ_j − A(θ) ),   x ∈ X,

where θ_j = log(π_j/π_p), so that π_p = 1/( 1 + Σ_{j=1}^{p−1} e^{θ_j} ) and

    A(θ) = −n log π_p = n log( 1 + Σ_{j=1}^{p−1} e^{θ_j} ),

which is finite for every θ ∈ R^{p−1}. The natural parameter space of this family is therefore R^{p−1}.
This of course contains a (p−1)-dimensional rectangle, and so the multinomial model is a full rank (p−1)-dimensional exponential family.
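A quick numerical check of this form, assuming SciPy's multinomial distribution is available (the particular n, π and x below are arbitrary choices):

```python
# Check that the multinomial log-pmf matches the exponential-family form with
# t(x) = (x_1, ..., x_{p-1}), theta_j = log(pi_j/pi_p), A(theta) = n log(1 + sum_j e^{theta_j}).
import numpy as np
from math import lgamma
from scipy.stats import multinomial

n, pi = 10, np.array([0.2, 0.3, 0.5])        # p = 3 categories (arbitrary)
x = np.array([2, 3, 5])                      # a point of X, so sum(x) = n

theta = np.log(pi[:-1] / pi[-1])             # natural parameters, length p - 1
A = n * np.log1p(np.exp(theta).sum())        # log-normalizer A(theta)
log_coef = lgamma(n + 1) - sum(lgamma(xi + 1) for xi in x)   # log multinomial coefficient

ef_logpmf = log_coef + theta @ x[:-1] - A    # exponential-family form of log p(x|pi)
print(np.isclose(ef_logpmf, multinomial.logpmf(x, n, pi)))   # True
```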
2 Basic results
Convexity of H:

The largest exponential family model (EFM) based on a statistic t(x) is the one based on the natural parameter space, {p(x|θ) : θ ∈ H}, since H̃ ⊆ H for any exponential family {p(x|θ) : θ ∈ H̃} based on t(x).

The set H is convex. This follows from Hölder's inequality: for f, g ≥ 0 and a, b > 1 with 1/a + 1/b = 1,

    ∫ f g dμ ≤ ( ∫ f^a dμ )^{1/a} ( ∫ g^b dμ )^{1/b}.

Applying this with f = e^{α θ_1^T t(x)}, g = e^{(1−α) θ_2^T t(x)}, a = 1/α and b = 1/(1 − α) for α ∈ (0, 1) gives A(αθ_1 + (1 − α)θ_2) ≤ αA(θ_1) + (1 − α)A(θ_2), so A is convex and H = {θ : A(θ) < ∞} is convex.
In testing, the theorem implies that the power function for any test is continuous. This
will help us characterize unbiased testing procedures.
An important application of the theorem is the calculation of moments of t.
By definition, e^{A(θ)} = ∫ e^{θ^T t(x)} μ(dx). Taking derivatives w.r.t. θ gives

    (d/dθ) e^{A(θ)} = (d/dθ) ∫ e^{θ^T t(x)} μ(dx)

    A'(θ) e^{A(θ)} = ∫ t(x) e^{θ^T t(x)} μ(dx)

    A'(θ) = ∫ t(x) e^{θ^T t(x) − A(θ)} μ(dx) = E[t(X)|θ].
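As a supplementary check against the normal example above, write A in the natural coordinates: A(θ) = −θ_1²/(4θ_2) − ½ log(−2θ_2). Then

    ∂A/∂θ_1 = −θ_1/(2θ_2) = μ   and   ∂A/∂θ_2 = θ_1²/(4θ_2²) − 1/(2θ_2) = μ² + σ²,

which are exactly E[X|θ] and E[X²|θ], i.e. the components of E[t(X)|θ].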
More generally,
Theorem 3 (Barndorff-Nielsen (1978), thm 8.1). Let P = {p(x|θ) = exp(θ^T t(x) − A(θ)) : θ ∈ H} be an exponential family and θ ∈ int H. Then

    ∂^{k_1 + ⋯ + k_s} / ( ∂θ_1^{k_1} ⋯ ∂θ_s^{k_s} ) e^{A(θ)} = ∫ t_1^{k_1}(x) ⋯ t_s^{k_s}(x) e^{θ^T t(x)} μ(dx)

for all k_1, . . . , k_s ≥ 0.
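For example, with s = 1 and k_1 = 2 the theorem gives

    (d²/dθ²) e^{A(θ)} = ∫ t(x)² e^{θ t(x)} μ(dx) = e^{A(θ)} E[t(X)²|θ],

while differentiating e^{A(θ)} twice directly gives (A''(θ) + A'(θ)²) e^{A(θ)}. Combining these, E[t(X)²|θ] = A''(θ) + A'(θ)², and together with A'(θ) = E[t(X)|θ] this says Var[t(X)|θ] = A''(θ).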
This result helps us with the moment generating function.
Moment generating function:

    M_t(u_1, . . . , u_s) = E[ e^{u^T t(X)} | θ ]
                          = ∫ e^{(θ+u)^T t(x) − A(θ)} μ(dx)
                          = e^{A(θ+u) − A(θ)} ∫ e^{(θ+u)^T t(x) − A(θ+u)} μ(dx)
                          = e^{A(θ+u) − A(θ)}.

This works as long as θ is in the interior of H and u is small enough so that θ + u ∈ H.
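A small Monte Carlo sketch of this identity for the normal example, with A(θ) written in the natural coordinates and arbitrary choices of μ, σ² and u:

```python
# For X ~ N(mu, sigma^2) with t(x) = (x, x^2), check that
# E[exp(u^T t(X))] = exp(A(theta + u) - A(theta)).
import numpy as np

def A(th1, th2):
    # log-normalizer of the normal family in natural coordinates
    return -th1**2 / (4 * th2) - 0.5 * np.log(-2 * th2)

mu, s2 = 1.0, 2.0                           # arbitrary (mu, sigma^2)
th1, th2 = mu / s2, -1 / (2 * s2)           # theta = (mu/sigma^2, -1/(2 sigma^2))
u1, u2 = 0.1, 0.05                          # small enough that th2 + u2 < 0

x = np.random.default_rng(0).normal(mu, np.sqrt(s2), size=2_000_000)
mc = np.mean(np.exp(u1 * x + u2 * x**2))    # Monte Carlo estimate of the MGF at u
exact = np.exp(A(th1 + u1, th2 + u2) - A(th1, th2))
print(mc, exact)                            # should agree to about three digits
```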
From this, we can use the above theorem to show

    ∂^{k_1 + ⋯ + k_s} / ( ∂u_1^{k_1} ⋯ ∂u_s^{k_s} ) M_t(u) |_{u=0} = E[ t_1^{k_1} ⋯ t_s^{k_s} | θ ].
References
E. L. Lehmann and George Casella. Theory of point estimation. Springer Texts in Statistics.
Springer-Verlag, New York, second edition, 1998. ISBN 0-387-98502-6.