Chapters 36-40
This lecture introduces the notions of moment of a random variable and cross-moment of a random vector.
36.1 Moments
36.1.1 Definition of moment
The n-th moment of a random variable is the expected value of its n-th power.

Definition Let X be a random variable and let n ∈ N. If the expected value

μ_X(n) = E[X^n]

exists and is finite, then X is said to possess a finite n-th moment and μ_X(n) is called the n-th moment of X. If E[X^n] is not well-defined, then we say that X does not possess the n-th moment.

The n-th central moment is defined analogously, as the expected value of the n-th power of the deviation of X from its expected value.

Definition Let X be a random variable and let n ∈ N. If the expected value

μ̄_X(n) = E[(X − E[X])^n]

exists and is finite, then X is said to possess a finite n-th central moment and μ̄_X(n) is called the n-th central moment of X.
36.2 Cross-moments
36.2.1 Definition of cross-moment
Let X be a K × 1 random vector. A cross-moment of X is the expected value of the product of integer powers of the entries of X:

E[X_1^{n_1} X_2^{n_2} ⋯ X_K^{n_K}]

where n_1, n_2, …, n_K ∈ Z_+.
If the above expected value exists and is finite, the cross-moment is denoted by

μ_X(n_1, n_2, …, n_K) = E[X_1^{n_1} X_2^{n_2} ⋯ X_K^{n_K}]     (36.2)
Example 205 Let X be a 3 × 1 discrete random vector and denote its components by X_1, X_2 and X_3. Let the support of X be

R_X = { [1 2 1]^⊤, [2 1 3]^⊤, [3 3 2]^⊤ }

and let its joint probability mass function assign probability 1/3 to each of these three points:

p_X(x) = 1/3 if x ∈ R_X, 0 otherwise

The cross-moment μ_X(1, 2, 1) is derived as follows:

μ_X(1, 2, 1) = E[X_1 X_2² X_3]
= Σ_{(x_1, x_2, x_3) ∈ R_X} x_1 x_2² x_3 p_X(x_1, x_2, x_3)
= 1 · 2² · 1 · p_X(1, 2, 1) + 2 · 1² · 3 · p_X(2, 1, 3) + 3 · 3² · 2 · p_X(3, 3, 2)
= 4 · (1/3) + 6 · (1/3) + 54 · (1/3) = 64/3
1 See p. 117.
2 See p. 134.
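The computation in Example 205 can also be checked numerically. The following Python sketch (the helper function and variable names are illustrative, not taken from the text) evaluates a cross-moment of a finite discrete random vector directly from its support and probability mass function:

```python
# Numerical check of Example 205: the cross-moment mu_X(1, 2, 1) of a discrete
# random vector whose three support points each have probability 1/3.
from fractions import Fraction

support = [(1, 2, 1), (2, 1, 3), (3, 3, 2)]          # points of R_X
pmf = {point: Fraction(1, 3) for point in support}    # joint pmf, uniform over R_X

def cross_moment(orders, support, pmf):
    """E[X1^n1 * X2^n2 * X3^n3] for a finite discrete random vector."""
    total = Fraction(0)
    for point in support:
        term = Fraction(1)
        for x, n in zip(point, orders):
            term *= Fraction(x) ** n
        total += term * pmf[point]
    return total

print(cross_moment((1, 2, 1), support, pmf))   # prints 64/3, as in the example
```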
Central cross-moments are defined in a similar manner. If the expected value exists and is finite, the central cross-moment is denoted by

μ̄_X(n_1, n_2, …, n_K) = E[ ∏_{k=1}^{K} (X_k − E[X_k])^{n_k} ]     (36.4)
Example 207 Let X be a 3 × 1 discrete random vector and denote its components by X_1, X_2 and X_3. Let the support of X be

R_X = { [4 2 4]^⊤, [2 1 1]^⊤, [1 3 2]^⊤ }
Chapter 37
Moment generating function of a random variable

37.1 Definition
We start this lecture by giving a definition of mgf.

Definition Let X be a random variable. If the expected value

E[exp(tX)]

exists and is finite for all real numbers t belonging to a closed interval [−h, h] ⊂ R, with h > 0, then we say that X possesses a moment generating function and the function M_X : [−h, h] → R defined by

M_X(t) = E[exp(tX)]

is called the moment generating function of X.
The following example shows how the mgf of an exponential random variable
is derived.
1 See p. 285.
2 See p. 307.
Example 209 Let X be a continuous random variable with support

R_X = [0, ∞)

and probability density function

f_X(x) = λ exp(−λx) if x ∈ R_X, 0 if x ∉ R_X

where λ > 0. Its mgf, provided it exists, is

M_X(t) = E[exp(tX)] = ∫_0^∞ exp(tx) λ exp(−λx) dx = λ ∫_0^∞ exp(−(λ − t)x) dx =_A λ / (λ − t)

where: in step A we have assumed that t < λ, which is necessary for the integral to be finite. Therefore, the expected value exists and is finite for t ∈ [−h, h] if h is such that 0 < h < λ, and X possesses the mgf

M_X(t) = λ / (λ − t)
Proposition 210 If a random variable X possesses an mgf M_X(t), then, for any n ∈ N, the n-th moment of X, denoted by μ_X(n), exists and is finite. Furthermore,

μ_X(n) = E[X^n] = d^n M_X(t) / dt^n |_{t=0}

where d^n M_X(t) / dt^n |_{t=0} is the n-th derivative of M_X(t) with respect to t, evaluated at the point t = 0.
Proof. Proving the above proposition is quite complicated, because a lot of analytical details must be taken care of (see, e.g., Pfeiffer⁴, 1978).
3 See p. 365.
4 Pfeiffer, P. E. (1978) Concepts of probability theory, Courier Dover Publications.
The intuition, however, is straightforward: since the expected value is a linear operator and differentiation is a linear operation, under appropriate conditions we can differentiate through the expected value, as follows:

d^n M_X(t) / dt^n = d^n/dt^n E[exp(tX)] = E[ d^n/dt^n exp(tX) ] = E[X^n exp(tX)]

Evaluating the last expression at t = 0, we obtain

d^n M_X(t) / dt^n |_{t=0} = E[X^n exp(0 · X)] = E[X^n] = μ_X(n)
Example 211 In Example 209 we have demonstrated that the mgf of an exponential random variable is

M_X(t) = λ / (λ − t)

The expected value of X can be computed by taking the first derivative of the mgf:

dM_X(t)/dt = λ / (λ − t)²

and evaluating it at t = 0:

E[X] = dM_X(t)/dt |_{t=0} = λ / (λ − 0)² = 1/λ

The second moment of X can be computed by taking the second derivative of the mgf:

d²M_X(t)/dt² = 2λ / (λ − t)³

and evaluating it at t = 0:

E[X²] = d²M_X(t)/dt² |_{t=0} = 2λ / (λ − 0)³ = 2/λ²
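The differentiation carried out in Example 211 can be reproduced symbolically. The following Python sketch (using sympy; the symbol names are illustrative) differentiates the exponential mgf and evaluates the derivatives at zero:

```python
# Moments of an exponential random variable from its mgf M_X(t) = lambda/(lambda - t).
import sympy as sp

t = sp.symbols('t')
lam = sp.symbols('lambda', positive=True)
M = lam / (lam - t)                                        # mgf of Example 209

first_moment = sp.simplify(sp.diff(M, t).subs(t, 0))       # E[X]   = 1/lambda
second_moment = sp.simplify(sp.diff(M, t, 2).subs(t, 0))   # E[X^2] = 2/lambda^2
print(first_moment, second_moment)
```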
Proposition 212 Let X and Y be two random variables possessing mgfs M_X(t) and M_Y(t). Then X and Y have the same distribution function if and only if M_X(t) = M_Y(t) for all t belonging to a closed neighborhood of zero.

Proof. For a fully general proof of this proposition see, e.g., Feller⁶ (2008). We just give an informal proof for the special case in which X and Y are discrete random variables taking only finitely many values. The "only if" part is trivial: if X and Y have the same distribution, then E[exp(tX)] = E[exp(tY)], that is, M_X(t) = M_Y(t). The "if" part is proved as follows. Denote by R_X and R_Y the supports of X and Y, and by p_X(x) and p_Y(y) their probability mass functions⁷. Denote by A the union of the two supports:

A = R_X ∪ R_Y

and by a_1, …, a_n the elements of A. The mgf of X can be written as
M_X(t) = E[exp(tX)]
      =_A Σ_{x ∈ R_X} exp(tx) p_X(x)
      =_B Σ_{i=1}^{n} exp(t a_i) p_X(a_i)

where: in step A we have used the definition of mgf; in step B we have extended the sum from R_X to A by setting p_X(a_i) = 0 whenever a_i ∉ R_X.
If X and Y have the same mgf, then, for any t belonging to a closed neighborhood of zero,

M_X(t) = M_Y(t)

and

Σ_{i=1}^{n} exp(t a_i) p_X(a_i) = Σ_{i=1}^{n} exp(t a_i) p_Y(a_i)

This can be true for any t belonging to a closed neighborhood of zero only if

p_X(a_i) − p_Y(a_i) = 0

for every i, because the functions exp(t a_1), …, exp(t a_n) are linearly independent when the points a_i are distinct. It follows that the probability mass functions of X and Y are equal. As a consequence, their distribution functions are also equal.
It must be stressed that this proposition is extremely important and relevant from a practical viewpoint: in many cases where we need to prove that two distributions are equal, it is much easier to prove equality of the mgfs than to prove equality of the distribution functions.
6 Feller, W. (2008) An introduction to probability theory and its applications, Volume 2, Wiley.
7 See p. 106.
Also note that equality of the distribution functions can be replaced in the
proposition above by equality of the probability mass functions8 if X and Y are
discrete random variables, or by equality of the probability density functions9 if X
and Y are absolutely continuous random variables.
Proposition Let X be a random variable possessing an mgf M_X(t). Define

Y = a + bX

where a, b ∈ R are two constants and b ≠ 0. Then, the random variable Y possesses an mgf M_Y(t) and

M_Y(t) = exp(at) M_X(bt)
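The following sympy sketch illustrates this property on the exponential mgf of Example 209 (the parameters a, b and lambda are placeholders chosen only for illustration); differentiating M_Y(t) at zero returns a + b E[X], as expected for Y = a + bX:

```python
# Illustration of M_Y(t) = exp(a t) * M_X(b t) for Y = a + b X, with X exponential.
import sympy as sp

t = sp.symbols('t')
lam, a, b = sp.symbols('lambda a b', positive=True)

M_X = lam / (lam - t)                      # mgf of X (Example 209)
M_Y = sp.exp(a * t) * M_X.subs(t, b * t)   # mgf of Y = a + b X

# First moment of Y from its mgf: should simplify to a + b/lambda = a + b*E[X].
print(sp.simplify(sp.diff(M_Y, t).subs(t, 0)))
```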
Exercise 1
Let X be a discrete random variable having a Bernoulli distribution¹². Its support is

R_X = {0, 1}

and its probability mass function¹³ is

p_X(x) = p if x = 1, 1 − p if x = 0, 0 if x ∉ R_X

where p ∈ (0, 1). Derive the moment generating function of X, if it exists.
Solution
Using the definition of mgf, we get

M_X(t) = E[exp(tX)] = Σ_{x ∈ R_X} exp(tx) p_X(x)
      = exp(t · 1) p_X(1) + exp(t · 0) p_X(0)
      = exp(t) · p + 1 · (1 − p) = 1 − p + p exp(t)

The mgf exists and is well-defined because the above expected value exists for any t ∈ R.
11 See p. 234.
12 See p. 335.
13 See p. 106.
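As a quick numerical sanity check (not part of the original exercise), the mgf just obtained can be compared with the defining expectation on a grid of points:

```python
# Check that 1 - p + p*exp(t) agrees with E[exp(tX)] for a Bernoulli(p) variable.
import math

def mgf_from_definition(t, p):
    # E[exp(tX)] computed directly over the support {0, 1}
    return (1 - p) * math.exp(t * 0) + p * math.exp(t * 1)

def mgf_closed_form(t, p):
    return 1 - p + p * math.exp(t)

p = 0.3   # illustrative value of the Bernoulli parameter
for t in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(mgf_from_definition(t, p) - mgf_closed_form(t, p)) < 1e-12
print("The closed-form mgf matches the defining expectation on the test grid.")
```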
Exercise 2
Let X be a random variable with mgf

M_X(t) = (1/2) (1 + exp(t))
Derive the variance of X.
Solution
We can use the following formula for computing the variance¹⁴:

Var[X] = E[X²] − (E[X])²

The expected value of X is computed by taking the first derivative of the mgf:

dM_X(t)/dt = (1/2) exp(t)

and evaluating it at t = 0:

E[X] = dM_X(t)/dt |_{t=0} = (1/2) exp(0) = 1/2

The second moment of X is computed by taking the second derivative of the mgf:

d²M_X(t)/dt² = (1/2) exp(t)

and evaluating it at t = 0:

E[X²] = d²M_X(t)/dt² |_{t=0} = (1/2) exp(0) = 1/2

Therefore,

Var[X] = E[X²] − (E[X])² = 1/2 − (1/2)² = 1/2 − 1/4 = 1/4
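The same computation can be carried out symbolically. A minimal sympy sketch (symbol names are illustrative):

```python
# Variance from the mgf M_X(t) = (1 + exp(t))/2 via Var[X] = M''(0) - M'(0)^2.
import sympy as sp

t = sp.symbols('t')
M = (1 + sp.exp(t)) / 2

mean = sp.diff(M, t).subs(t, 0)          # E[X]   = 1/2
second = sp.diff(M, t, 2).subs(t, 0)     # E[X^2] = 1/2
print(second - mean**2)                  # 1/4
```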
Exercise 3
A random variable X is said to have a Chi-square distribution¹⁵ with n degrees of freedom if its mgf is defined for any t < 1/2 and is equal to

M_X(t) = (1 − 2t)^{−n/2}

Define

Y = X_1 + X_2

where X_1 and X_2 are two independent random variables having Chi-square distributions with n_1 and n_2 degrees of freedom respectively. Prove that Y has a Chi-square distribution with n_1 + n_2 degrees of freedom.
14 See p. 156.
15 See p. 387.
Solution
The mgfs of X_1 and X_2 are

M_{X_1}(t) = (1 − 2t)^{−n_1/2}
M_{X_2}(t) = (1 − 2t)^{−n_2/2}

The mgf of a sum of independent random variables is the product of the mgfs of the summands:

M_Y(t) = (1 − 2t)^{−n_1/2} (1 − 2t)^{−n_2/2} = (1 − 2t)^{−(n_1+n_2)/2}

which is the mgf of a Chi-square distribution with n_1 + n_2 degrees of freedom. Since the mgf uniquely determines the distribution (Proposition 212), Y has a Chi-square distribution with n_1 + n_2 degrees of freedom.
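A Monte Carlo simulation offers an informal check of this result. The sketch below (degrees of freedom and sample size are arbitrary illustrative choices) compares the empirical mean and variance of the sum with those of a Chi-square distribution with n_1 + n_2 degrees of freedom:

```python
# Informal check: a sum of independent chi-square variables with n1 and n2 degrees
# of freedom behaves like a chi-square with n1 + n2 degrees of freedom.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, draws = 3, 5, 200_000
y = rng.chisquare(n1, draws) + rng.chisquare(n2, draws)

# A chi-square with k degrees of freedom has mean k and variance 2k.
print(y.mean(), y.var())   # should be close to 8 and 16 respectively
```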
Chapter 38
Joint moment generating function of a random vector

38.1 Definition
Let us start with a formal definition.

Definition Let X be a K × 1 random vector. If the expected value

E[exp(t^⊤ X)]

exists and is finite for all K × 1 real vectors t belonging to a closed rectangle H such that

H = [−h_1, h_1] × [−h_2, h_2] × ⋯ × [−h_K, h_K] ⊆ R^K

with h_i > 0 for all i = 1, …, K, then we say that X possesses a joint moment generating function and the function M_X : H → R defined by

M_X(t) = E[exp(t^⊤ X)]

is called the joint moment generating function of X.
Example Let X be a K × 1 standard normal random vector, that is, a random vector whose joint probability density function is

f_X(x) = (2π)^{−K/2} exp( −(1/2) x^⊤ x )

As explained in the lecture entitled Multivariate normal distribution (p. 439), the K components of X are K mutually independent⁴ standard normal random variables, because the joint probability density function of X can be written as

f_X(x) = f(x_1) f(x_2) ⋯ f(x_K)

where x_i is the i-th entry of x, and f(x_i) is the probability density function of a standard normal random variable:

f(x_i) = (2π)^{−1/2} exp( −(1/2) x_i² )
The joint mgf of X can be derived as follows:

M_X(t) = E[ exp(t^⊤ X) ] = E[ ∏_{i=1}^{K} exp(t_i X_i) ] =_A ∏_{i=1}^{K} E[ exp(t_i X_i) ] =_B ∏_{i=1}^{K} M_{X_i}(t_i)

where: in step A we have used the fact that the entries of X are mutually independent⁵; in step B we have used the definition of mgf of a random variable⁶. Since the mgf of a standard normal random variable is⁷

M_{X_i}(t_i) = exp( (1/2) t_i² )
the joint mgf of X is

M_X(t) = ∏_{i=1}^{K} M_{X_i}(t_i) = ∏_{i=1}^{K} exp( (1/2) t_i² ) = exp( (1/2) Σ_{i=1}^{K} t_i² ) = exp( (1/2) t^⊤ t )

Note that the mgf M_{X_i}(t_i) of a standard normal random variable is defined for any t_i ∈ R. As a consequence, the joint mgf of X is defined for any t ∈ R^K.
3 See p. 117.
4 See p. 233.
5 See p. 234.
6 See p. 289.
7 See p. 378.
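The formula M_X(t) = exp((1/2) t^⊤ t) can be checked by simulation. The following sketch (the vector t and sample size are chosen only for illustration) compares the empirical value of E[exp(t^⊤ X)] with the theoretical joint mgf:

```python
# Empirical vs. theoretical joint mgf of a K x 1 standard normal random vector.
import numpy as np

rng = np.random.default_rng(1)
K, draws = 3, 500_000
X = rng.standard_normal((draws, K))        # rows are independent draws of X
t = np.array([0.2, -0.5, 0.3])             # an arbitrary point at which to evaluate the mgf

empirical = np.exp(X @ t).mean()           # sample analogue of E[exp(t'X)]
theoretical = np.exp(t @ t / 2)            # exp((1/2) t't)
print(empirical, theoretical)              # the two values should be close
```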
Proposition Let X be a K × 1 random vector possessing a joint mgf M_X(t). Define a cross-moment of order n as

μ_X(n_1, n_2, …, n_K) = E[X_1^{n_1} X_2^{n_2} ⋯ X_K^{n_K}]

where n_1, n_2, …, n_K ∈ Z_+ and n = Σ_{k=1}^{K} n_k. Then the cross-moment exists, is finite, and

μ_X(n_1, n_2, …, n_K) = ∂^n M_X(t) / (∂t_1^{n_1} ∂t_2^{n_2} ⋯ ∂t_K^{n_K}) |_{t_1=0, …, t_K=0}

where the derivative on the right-hand side is an n-th order cross-partial derivative of M_X(t) evaluated at the point t_1 = 0, t_2 = 0, …, t_K = 0.
Proof. We do not provide a rigorous proof of this proposition, but see, e.g., Pfeiffer⁸ (1978) and DasGupta⁹ (2010). The intuition of the proof, however, is straightforward: since the expected value is a linear operator and differentiation is a linear operation, under appropriate conditions one can differentiate through the expected value, as follows:

∂^n M_X(t) / (∂t_1^{n_1} ⋯ ∂t_K^{n_K}) = E[ ∂^n exp(t^⊤ X) / (∂t_1^{n_1} ⋯ ∂t_K^{n_K}) ] = E[ X_1^{n_1} ⋯ X_K^{n_K} exp(t^⊤ X) ]

which, evaluated at t = 0, yields E[X_1^{n_1} ⋯ X_K^{n_K}] = μ_X(n_1, …, n_K).
The following example shows how the above proposition can be applied.
Example 218 Let us continue with the previous example. The joint mgf of a 2 × 1 standard normal random vector X is

M_X(t) = exp( (1/2) t^⊤ t ) = exp( (1/2) t_1² + (1/2) t_2² )
8 Pfeiffer, P. E. (1978) Concepts of probability theory, Courier Dover Publications.
9 DasGupta, A. (2010) Fundamentals of probability: a first course, Springer.
The cross-moment μ_X(1, 1) can be computed by taking a cross-partial derivative of the joint mgf:

μ_X(1, 1) = E[X_1 X_2]
= ∂²/(∂t_1 ∂t_2) exp( (1/2) t_1² + (1/2) t_2² ) |_{t_1=0, t_2=0}
= ∂/∂t_1 [ ∂/∂t_2 exp( (1/2) t_1² + (1/2) t_2² ) ] |_{t_1=0, t_2=0}
= ∂/∂t_1 [ t_2 exp( (1/2) t_1² + (1/2) t_2² ) ] |_{t_1=0, t_2=0}
= t_1 t_2 exp( (1/2) t_1² + (1/2) t_2² ) |_{t_1=0, t_2=0} = 0
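The cross-partial differentiation in Example 218 can be reproduced with sympy (a sketch, with illustrative symbol names):

```python
# E[X1 X2] for a 2 x 1 standard normal vector, from the cross-partial derivative
# of its joint mgf exp(t1^2/2 + t2^2/2) evaluated at the origin.
import sympy as sp

t1, t2 = sp.symbols('t1 t2')
M = sp.exp(t1**2 / 2 + t2**2 / 2)

cross_moment = sp.diff(M, t1, t2).subs({t1: 0, t2: 0})
print(cross_moment)    # 0, as found in the example
```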
Proposition 219 Let X and Y be two K × 1 random vectors possessing joint mgfs M_X(t) and M_Y(t). Then X and Y have the same joint distribution function if and only if M_X(t) = M_Y(t) for all t belonging to a closed rectangle where the two mgfs are well-defined.

Proof. The reader may refer to Feller¹¹ (2008) for a rigorous proof. The informal proof given here is almost identical to that given for the univariate case¹². We confine our attention to the case in which X and Y are discrete random vectors taking only finitely many values. As far as the left-to-right direction of the implication is concerned, it suffices to note that if X and Y have the same distribution, then trivially M_X(t) = M_Y(t). As far as the reverse direction is concerned, denote by R_X and R_Y the supports of X and Y, by p_X(x) and p_Y(y) their joint probability mass functions, and by A the union of the two supports:

A = R_X ∪ R_Y

whose elements are denoted by a_1, …, a_n. If X and Y have the same joint mgf, then, for any t belonging to a closed rectangle where the two mgfs are well-defined,

M_X(t) = M_Y(t)

and

Σ_{i=1}^{n} exp(t^⊤ a_i) p_X(a_i) = Σ_{i=1}^{n} exp(t^⊤ a_i) p_Y(a_i)

As in the univariate case, this can be true only if

p_X(a_i) − p_Y(a_i) = 0

for every i. As a consequence, the joint probability mass functions of X and Y are equal, which implies that their joint distribution functions are also equal.
This proposition is used very often in applications where one needs to demonstrate that two joint distributions are equal. In such applications, proving equality of the joint mgfs is often much easier than proving equality of the joint distribution functions (see also the comments to Proposition 212).
If the K entries X_1, …, X_K of X are mutually independent random variables, the joint mgf of X is the product of the mgfs of its entries:

M_X(t) = E[ exp(t^⊤ X) ] = E[ ∏_{j=1}^{K} exp(t_j X_j) ] =_A ∏_{j=1}^{K} E[ exp(t_j X_j) ] =_B ∏_{j=1}^{K} M_{X_j}(t_j)

where: in step A we have used the fact that the entries of X are mutually independent; in step B we have used the definition of mgf of a random variable.

Similarly, if X_1, …, X_n are mutually independent K × 1 random vectors and Z = X_1 + ⋯ + X_n, the joint mgf of Z is the product of the joint mgfs of the summands:

M_Z(t) = E[ exp(t^⊤ Z) ] = E[ ∏_{i=1}^{n} exp(t^⊤ X_i) ] =_A ∏_{i=1}^{n} E[ exp(t^⊤ X_i) ] =_B ∏_{i=1}^{n} M_{X_i}(t)

where: in step A we have used the fact that the vectors X_i are mutually independent; in step B we have used the definition of joint mgf.
Exercise 1
Let X be a 2 × 1 discrete random vector and denote its components by X_1 and X_2. Let the support of X be

R_X = { [1 1]^⊤, [2 0]^⊤, [0 0]^⊤ }

Derive the joint moment generating function of X.
Solution
By using the definition of joint mgf, we get
Exercise 2
Let

X = [X_1 X_2]^⊤

be a 2 × 1 random vector with joint mgf

M_{X_1,X_2}(t_1, t_2) = 1/3 + (2/3) exp(t_1 + 2 t_2)

Derive the expected value of X_1.
Solution
The mgf of X_1 is obtained by setting t_2 = 0 in the joint mgf:

M_{X_1}(t_1) = M_{X_1,X_2}(t_1, 0) = 1/3 + (2/3) exp(t_1)

The expected value of X_1 is computed by taking the first derivative of this mgf:

dM_{X_1}(t_1)/dt_1 = (2/3) exp(t_1)

and evaluating it at t_1 = 0:

E[X_1] = dM_{X_1}(t_1)/dt_1 |_{t_1=0} = (2/3) exp(0) = 2/3
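A short sympy sketch reproduces this solution (symbol names are illustrative):

```python
# E[X1] from the joint mgf M(t1, t2) = 1/3 + (2/3) exp(t1 + 2 t2): set t2 = 0,
# differentiate with respect to t1, and evaluate at t1 = 0.
import sympy as sp

t1, t2 = sp.symbols('t1 t2')
M = sp.Rational(1, 3) + sp.Rational(2, 3) * sp.exp(t1 + 2 * t2)

M1 = M.subs(t2, 0)                         # mgf of X1 alone
print(sp.diff(M1, t1).subs(t1, 0))         # 2/3
```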
Exercise 3
Let

X = [X_1 X_2]^⊤

be a 2 × 1 random vector. Derive the covariance between X_1 and X_2 from its joint mgf.
Solution
We can use the following covariance formula:

Cov[X_1, X_2] = E[X_1 X_2] − E[X_1] E[X_2]
The mgf of X1 is
The mgf of X2 is
= (1/3) [ 2 exp(t_1 + 2t_2) + 2 exp(2t_1 + t_2) ]

and evaluating it at (t_1, t_2) = (0, 0):
Chapter 39
Characteristic function of a random variable
In the lecture entitled Moment generating function (p. 289), we have explained
that the distribution of a random variable can be characterized in terms of its
moment generating function, a real function that enjoys two important properties:
it uniquely determines its associated probability distribution, and its derivatives
at zero are equal to the moments of the random variable. We have also explained
that not all random variables possess a moment generating function.
The characteristic function (cf) enjoys properties that are almost identical to those enjoyed by the moment generating function, but it has an important advantage: all random variables possess a characteristic function.
39.1 Definition
We start this lecture by giving a definition of characteristic function.
Definition 223 Let X be a random variable and let i = √−1 be the imaginary unit. The function φ_X : R → C defined by

φ_X(t) = E[exp(itX)]

is called the characteristic function of X.
The first thing to be noted is that the characteristic function φ_X(t) exists for any t. This can be proved as follows: by Euler's formula,

E[exp(itX)] = E[cos(tX) + i sin(tX)] = E[cos(tX)] + i E[sin(tX)]

and the last two expected values are well-defined, because the sine and cosine functions are bounded in the interval [−1, 1].
Proposition 224 Let X be a random variable and φ_X(t) its characteristic function. Let n ∈ N. If the n-th moment of X, denoted by μ_X(n), exists and is finite, then φ_X(t) is n times continuously differentiable and

μ_X(n) = E[X^n] = (1/i^n) d^n φ_X(t) / dt^n |_{t=0}

where d^n φ_X(t) / dt^n |_{t=0} is the n-th derivative of φ_X(t) with respect to t, evaluated at the point t = 0.
Proof. The proof of the above proposition is quite complex (see, e.g., Resnick¹, 1999). The intuition, however, is straightforward: since the expected value is a linear operator and differentiation is a linear operation, under appropriate conditions one can differentiate through the expected value, as follows:

d^n φ_X(t) / dt^n = d^n/dt^n E[exp(itX)]
                = E[ d^n/dt^n exp(itX) ]
                = E[ (iX)^n exp(itX) ]
                = i^n E[ X^n exp(itX) ]

Evaluating the last expression at t = 0, we obtain

d^n φ_X(t) / dt^n |_{t=0} = i^n E[X^n exp(0 · iX)] = i^n E[X^n] = i^n μ_X(n)
In practice, the proposition above is not very useful when one wants to compute a moment of a random variable, because it requires one to know in advance whether the moment exists or not. A much more useful statement is provided by the next proposition.

Proposition 225 Let X be a random variable and φ_X(t) its characteristic function. If φ_X(t) is n times differentiable at the point t = 0, then:

1. if n is even, the k-th moment of X exists and is finite for any k ≤ n;

2. if n is odd, the k-th moment of X exists and is finite for any k < n.

In both cases,

μ_X(k) = E[X^k] = (1/i^k) d^k φ_X(t) / dt^k |_{t=0}
Example Let X be a continuous random variable with support R_X = [0, ∞) and probability density function

f_X(x) = λ exp(−λx) if x ∈ R_X, 0 if x ∉ R_X

where λ > 0, that is, an exponential random variable. Its characteristic function is

φ_X(t) = λ / (λ − it)

which is infinitely many times differentiable. Its second derivative is

d²φ_X(t)/dt² = 2i²λ / (λ − it)³

Evaluating it at t = 0, we obtain

d²φ_X(t)/dt² |_{t=0} = 2i² / λ²

Therefore, by the previous proposition, the second moment of X exists and is finite, and it is equal to E[X²] = (1/i²) · 2i²/λ² = 2/λ².
Proposition Let X and Y be two random variables possessing characteristic functions φ_X(t) and φ_Y(t). Then X and Y have the same distribution if and only if φ_X(t) = φ_Y(t) for any t ∈ R.

Proof. For a formal proof, see, e.g., Resnick⁴ (1999). An informal proof for the special case in which X and Y have a finite support can be provided along the same lines of the proof of Proposition 212, which concerns the moment generating function. This is left as an exercise (just replace exp(tX) and exp(tY) in that proof with exp(itX) and exp(itY)).

This property is analogous to the property of moment generating functions stated in Proposition 212. The same comments we made about that proposition also apply to this one.
Proposition 228 Let X be a random variable with characteristic function φ_X(t). Define

Y = a + bX

where a, b ∈ R are two constants and b ≠ 0. Then, the characteristic function of Y is

φ_Y(t) = exp(iat) φ_X(bt)
39.4.2 Cf of a sum
The next proposition shows how to derive the characteristic function of a sum of independent random variables.

Proposition Let X_1, …, X_n be mutually independent random variables and define

Z = X_1 + X_2 + ⋯ + X_n

Then the characteristic function of Z is the product of the characteristic functions of the summands:

φ_Z(t) = ∏_{j=1}^{n} φ_{X_j}(t)
Exercise 1
Let X be a discrete random variable having support

R_X = {0, 1, 2}

and probability mass function

p_X(x) = 1/3 if x ∈ R_X, 0 if x ∉ R_X

Derive the characteristic function of X.
Solution
By using the definition of characteristic function, we obtain

φ_X(t) = E[exp(itX)] = Σ_{x ∈ R_X} exp(itx) p_X(x)
      = exp(it · 0) p_X(0) + exp(it · 1) p_X(1) + exp(it · 2) p_X(2)
      = 1/3 + (1/3) exp(it) + (1/3) exp(2it) = (1/3) [ 1 + exp(it) + exp(2it) ]
Exercise 2
Use the characteristic function found in the previous exercise to derive the variance
of X.
Solution
We can use the following formula for computing the variance:
Var[X] = E[X²] − (E[X])²

The expected value of X is computed by taking the first derivative of the characteristic function:

dφ_X(t)/dt = (1/3) [ i exp(it) + 2i exp(2it) ]

evaluating it at t = 0, and dividing it by i:

E[X] = (1/i) dφ_X(t)/dt |_{t=0} = (1/i) (1/3) [ i exp(i · 0) + 2i exp(2i · 0) ] = 1

The second moment of X is computed by taking the second derivative of the characteristic function:

d²φ_X(t)/dt² = (1/3) [ i² exp(it) + 4i² exp(2it) ]

evaluating it at t = 0, and dividing it by i²:

E[X²] = (1/i²) d²φ_X(t)/dt² |_{t=0} = (1/i²) (1/3) [ i² exp(i · 0) + 4i² exp(2i · 0) ] = 5/3

Therefore,

Var[X] = E[X²] − (E[X])² = 5/3 − 1² = 2/3
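Exercises 1 and 2 can be verified symbolically. The sketch below (illustrative symbol names) differentiates the characteristic function and divides by the appropriate power of i:

```python
# Moments from the characteristic function phi(t) = (1 + exp(it) + exp(2it))/3.
import sympy as sp

t = sp.symbols('t', real=True)
phi = (1 + sp.exp(sp.I * t) + sp.exp(2 * sp.I * t)) / 3

mean = sp.simplify(sp.diff(phi, t).subs(t, 0) / sp.I)            # E[X]   = 1
second = sp.simplify(sp.diff(phi, t, 2).subs(t, 0) / sp.I**2)    # E[X^2] = 5/3
print(mean, second, sp.simplify(second - mean**2))               # variance = 2/3
```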
Exercise 3
Read and try to understand how the characteristic functions of the uniform and
exponential distributions are derived in the lectures entitled Uniform distribution
(p. 359) and Exponential distribution (p. 365).
Chapter 40
Characteristic function of a random vector
This lecture introduces the notion of joint characteristic function (joint cf) of a random vector, which is a multivariate generalization of the concept of characteristic function of a random variable. Before reading this lecture, you are advised to first read the lecture entitled Characteristic function (p. 307).
40.1 Definition
Let us start this lecture with a definition.
Definition 230 Let X be a K × 1 random vector and let i = √−1 be the imaginary unit. The function φ_X : R^K → C defined by

φ_X(t) = E[ exp(i t^⊤ X) ] = E[ exp( i Σ_{j=1}^{K} t_j X_j ) ]

is called the joint characteristic function of X.
Cross-moments can be computed from the partial derivatives of the joint characteristic function, as shown by the following proposition.
Proposition 231 Let X be a K × 1 random vector and φ_X(t) its joint characteristic function. Let n ∈ N. Define a cross-moment of order n as follows:

μ_X(n_1, n_2, …, n_K) = E[X_1^{n_1} X_2^{n_2} ⋯ X_K^{n_K}]

where n_1, n_2, …, n_K ∈ Z_+ and

n = Σ_{k=1}^{K} n_k

If all cross-moments of order n exist and are finite, then all the n-th order partial derivatives of φ_X(t) exist and

μ_X(n_1, n_2, …, n_K) = (1/i^n) ∂^n φ_X(t) / (∂t_1^{n_1} ∂t_2^{n_2} ⋯ ∂t_K^{n_K}) |_{t_1=0, …, t_K=0}

where the partial derivative on the right-hand side of the equation is evaluated at the point t_1 = 0, t_2 = 0, …, t_K = 0.
Proposition 232 Let X be a K × 1 random vector and φ_X(t) its joint characteristic function. If all the n-th order partial derivatives of φ_X(t) exist, then:

1. if n is even, all the cross-moments of X of order k exist and are finite for any k ≤ n;

2. if n is odd, all the cross-moments of X of order k exist and are finite for any k < n.

In both cases,

μ_X(n_1, n_2, …, n_K) = (1/i^n) ∂^n φ_X(t) / (∂t_1^{n_1} ∂t_2^{n_2} ⋯ ∂t_K^{n_K}) |_{t_1=0, …, t_K=0}

where n = Σ_{k=1}^{K} n_k.
Proposition Let X and Y be two K × 1 random vectors possessing joint characteristic functions φ_X(t) and φ_Y(t). Then X and Y have the same distribution if and only if φ_X(t) = φ_Y(t) for any t ∈ R^K.

Proof. See Ushakov (1999). An informal proof for the special case in which X and Y have a finite support can be provided along the same lines of the proof of Proposition 219, which concerns the joint moment generating function. This is left as an exercise (just replace exp(t^⊤ X) and exp(t^⊤ Y) in that proof with exp(i t^⊤ X) and exp(i t^⊤ Y)).
This property is analogous to the property of joint moment generating functions
stated in Proposition 219. The same comments we made about that proposition
also apply to this one.
4 See p. 118.
If the K entries X_1, …, X_K of X are mutually independent random variables, the joint characteristic function of X is the product of the characteristic functions of its entries:

φ_X(t_1, …, t_K) = ∏_{j=1}^{K} φ_{X_j}(t_j)

This can be proved as follows:

φ_X(t) = E[ exp(i t^⊤ X) ] = E[ ∏_{j=1}^{K} exp(i t_j X_j) ] =_A ∏_{j=1}^{K} E[ exp(i t_j X_j) ] =_B ∏_{j=1}^{K} φ_{X_j}(t_j)

where: in step A we have used the fact that the entries of X are mutually independent⁵; in step B we have used the definition of characteristic function of a random variable⁶.
Let X_1, …, X_n be n mutually independent K × 1 random vectors and define

Z = X_1 + X_2 + ⋯ + X_n

Then, the joint characteristic function of Z is the product of the joint characteristic functions of X_1, …, X_n:

φ_Z(t) = ∏_{j=1}^{n} φ_{X_j}(t)
5 In particular, see the mutual independence via expectations property (p. 234).
6 See p. 307.
This can be proved as follows:

φ_Z(t) = E[ exp(i t^⊤ Z) ] = E[ ∏_{j=1}^{n} exp(i t^⊤ X_j) ] =_A ∏_{j=1}^{n} E[ exp(i t^⊤ X_j) ] =_B ∏_{j=1}^{n} φ_{X_j}(t)

where: in step A we have used the fact that the vectors X_j are mutually independent; in step B we have used the definition of joint characteristic function of a random vector given above.
Exercise 1
Let Z_1 and Z_2 be two independent standard normal random variables⁷. Let X be a 2 × 1 random vector whose components are defined as follows:

X_1 = Z_1²
X_2 = Z_1² + Z_2²

Derive the joint characteristic function of X.
Solution
By using the definition of characteristic function, we get

φ_X(t_1, t_2) = E[ exp( i (t_1 X_1 + t_2 X_2) ) ]
= E[ exp( i (t_1 + t_2) Z_1² + i t_2 Z_2² ) ]
= E[ exp( i (t_1 + t_2) Z_1² ) exp( i t_2 Z_2² ) ]
=_A E[ exp( i (t_1 + t_2) Z_1² ) ] E[ exp( i t_2 Z_2² ) ]
=_B φ_{Z_1²}(t_1 + t_2) φ_{Z_2²}(t_2)
= (1 − 2i(t_1 + t_2))^{−1/2} (1 − 2i t_2)^{−1/2}
= (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−1/2}

where: in step A we have used the fact that Z_1 and Z_2 are independent; in step B we have used the definition of characteristic function. The last-but-one equality follows from the fact that Z_1² and Z_2² have Chi-square distributions with one degree of freedom, whose characteristic function is (1 − 2it)^{−1/2}.
Exercise 2
Use the joint characteristic function found in the previous exercise to derive the
expected value and the covariance matrix of X.
Solution
We need to compute the partial derivatives of the joint characteristic function:
∂φ/∂t_1 = −(1/2) (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−3/2} (−2i − 4 t_2)

∂φ/∂t_2 = −(1/2) (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−3/2} (−4i − 4 t_1 − 8 t_2)

∂²φ/∂t_1² = (3/4) (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−5/2} (−2i − 4 t_2)²

∂²φ/∂t_2² = (3/4) (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−5/2} (−4i − 4 t_1 − 8 t_2)² + 4 (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−3/2}

∂²φ/∂t_1∂t_2 = (3/4) (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−5/2} (−2i − 4 t_2)(−4i − 4 t_1 − 8 t_2) + 2 (1 − 2i t_1 − 4i t_2 − 4 t_1 t_2 − 4 t_2²)^{−3/2}

All the partial derivatives up to the second order exist and are well-defined. As a consequence, all the cross-moments up to the second order exist and are finite, and they can be computed from the above partial derivatives:
E[X_1] = (1/i) ∂φ/∂t_1 |_{t_1=0, t_2=0} = (1/i) · i = 1

E[X_2] = (1/i) ∂φ/∂t_2 |_{t_1=0, t_2=0} = (1/i) · 2i = 2

E[X_1²] = (1/i²) ∂²φ/∂t_1² |_{t_1=0, t_2=0} = (1/i²) · 3i² = 3

E[X_2²] = (1/i²) ∂²φ/∂t_2² |_{t_1=0, t_2=0} = (1/i²) · (12i² + 4) = 8

E[X_1 X_2] = (1/i²) ∂²φ/∂t_1∂t_2 |_{t_1=0, t_2=0} = (1/i²) · (6i² + 2) = 4

Therefore, the expected value of X is

E[X] = [1 2]^⊤

and, since Var[X_1] = 3 − 1² = 2, Var[X_2] = 8 − 2² = 4 and Cov[X_1, X_2] = 4 − 1 · 2 = 2, the covariance matrix of X is

Var[X] =
[ 2  2 ]
[ 2  4 ]
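The partial derivatives above are tedious to compute by hand; a sympy sketch (illustrative symbol names) confirms the moments obtained from the joint characteristic function:

```python
# Moments of X = (Z1^2, Z1^2 + Z2^2) from its joint cf
# phi(t1, t2) = (1 - 2i t1 - 4i t2 - 4 t1 t2 - 4 t2^2)^(-1/2).
import sympy as sp

t1, t2 = sp.symbols('t1 t2', real=True)
phi = (1 - 2*sp.I*t1 - 4*sp.I*t2 - 4*t1*t2 - 4*t2**2) ** sp.Rational(-1, 2)

at_origin = {t1: 0, t2: 0}
EX1 = sp.simplify(sp.diff(phi, t1).subs(at_origin) / sp.I)             # 1
EX2 = sp.simplify(sp.diff(phi, t2).subs(at_origin) / sp.I)             # 2
EX1X2 = sp.simplify(sp.diff(phi, t1, t2).subs(at_origin) / sp.I**2)    # 4
print(EX1, EX2, EX1X2)
```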
Exercise 3
Read and try to understand how the joint characteristic function of the multinomial
distribution is derived in the lecture entitled Multinomial distribution (p. 431).