
Chapter 4

THE MULTIVARIATE NORMAL DISTRIBUTION
4.1 Introduction
A generalization of the familiar bell-shaped normal density to several dimensions plays a fundamental role in multivariate analysis. In fact, most of the techniques encountered in this book are based on the assumption that the data were generated from a multivariate normal distribution. While real data are never exactly multivariate normal, the normal density is often a useful approximation to the "true" population distribution.
One advantage of the multivariate normal distribution stems from the fact that it is mathematically tractable and "nice" results can be obtained. This is frequently not the case for other data-generating distributions. Of course, mathematical attractiveness per se is of little use to the practitioner. It turns out, however, that normal distributions are useful in practice for two reasons: First, the normal distribution serves as a bona fide population model in some instances; second, the sampling distributions of many multivariate statistics are approximately normal, regardless of the form of the parent population, because of a central limit effect.
To summarize, many real-world problems fall naturally within the framework of normal theory. The importance of the normal distribution rests on its dual role as both population model for certain natural phenomena and approximate sampling distribution for many statistics.

4.2 The Multivariate Normal Density and Its Properties


The multivariate normal density is a generalization of the univariate normal density to $p \ge 2$ dimensions. Recall that the univariate normal distribution, with mean $\mu$ and variance $\sigma^2$, has the probability density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-[(x-\mu)/\sigma]^2/2}, \qquad -\infty < x < \infty \tag{4-1}$$

From Chapter 4 of Applied Multivariate Statistical Analysis, Sixth Edition. Richard A. Johnson,
Dean W. Wichern. Copyright © 2007 by Pearson Education, Inc. All rights reserved.

Figure 4.1 A normal density with mean $\mu$ and variance $\sigma^2$ and selected areas under the curve (approximately .683 within $\mu \pm \sigma$ and .954 within $\mu \pm 2\sigma$).

A plot of this function yields the familiar bell-shaped curve shown in Figure 4.1. Also shown in the figure are approximate areas under the curve within $\pm 1$ standard deviation and $\pm 2$ standard deviations of the mean. These areas represent probabilities, and thus, for the normal random variable $X$,

$$P(\mu - \sigma \le X \le \mu + \sigma) \approx .68$$
$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx .95$$

It is convenient to denote the normal density function with mean $\mu$ and variance $\sigma^2$ by $N(\mu, \sigma^2)$. Therefore, $N(10, 4)$ refers to the function in (4-1) with $\mu = 10$ and $\sigma = 2$. This notation will be extended to the multivariate case later.
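These probabilities are easy to verify numerically. Below is a minimal sketch (my addition, not from the text), assuming SciPy is available, using the $N(10, 4)$ example:

```python
# Numerical check of the areas quoted above, for N(10, 4): mu = 10, sigma = 2.
# Illustrative sketch; parameter values are the example from the text.
from scipy.stats import norm

mu, sigma = 10.0, 2.0
X = norm(loc=mu, scale=sigma)

# P(mu - sigma <= X <= mu + sigma) ~ .68
print(X.cdf(mu + sigma) - X.cdf(mu - sigma))           # 0.6826...

# P(mu - 2*sigma <= X <= mu + 2*sigma) ~ .95
print(X.cdf(mu + 2 * sigma) - X.cdf(mu - 2 * sigma))   # 0.9544...
```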
The term

$$\left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu) \tag{4-2}$$

in the exponent of the univariate normal density function measures the square of the distance from $x$ to $\mu$ in standard deviation units. This can be generalized for a $p \times 1$ vector $x$ of observations on several variables as

$$(x - \mu)'\,\Sigma^{-1}\,(x - \mu) \tag{4-3}$$

The $p \times 1$ vector $\mu$ represents the expected value of the random vector $X$, and the $p \times p$ matrix $\Sigma$ is the variance–covariance matrix of $X$. [See (2-30) and (2-31).] We shall assume that the symmetric matrix $\Sigma$ is positive definite, so the expression in (4-3) is the square of the generalized distance from $x$ to $\mu$.
The multivariate normal density is obtained by replacing the univariate distance in (4-2) by the multivariate generalized distance of (4-3) in the density function of (4-1). When this replacement is made, the univariate normalizing constant $(2\pi)^{-1/2}(\sigma^2)^{-1/2}$ must be changed to a more general constant that makes the volume under the surface of the multivariate density function unity for any $p$. This is necessary because, in the multivariate case, probabilities are represented by volumes under the surface over regions defined by intervals of the $x_i$ values. It can be shown (see [1]) that this constant is $(2\pi)^{-p/2}|\Sigma|^{-1/2}$, and consequently, a $p$-dimensional normal density for the random vector $X' = [X_1, X_2, \ldots, X_p]$ has the form

$$f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-(x-\mu)'\Sigma^{-1}(x-\mu)/2} \tag{4-4}$$

where $-\infty < x_i < \infty$, $i = 1, 2, \ldots, p$. We shall denote this $p$-dimensional normal density by $N_p(\mu, \Sigma)$, which is analogous to the normal density in the univariate case.
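As a study aid (not from the text), the density (4-4) can be evaluated directly and compared against SciPy's built-in implementation; the $\mu$, $\Sigma$, and $x$ values below are arbitrary illustrative choices:

```python
# Evaluate (4-4) directly and check against scipy.stats.multivariate_normal.
# mu, Sigma, and x are assumed values for illustration only.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.0])

p = len(mu)
dev = x - mu
# (x - mu)' Sigma^{-1} (x - mu), computed without forming Sigma^{-1} explicitly
quad = dev @ np.linalg.solve(Sigma, dev)
dens = np.exp(-quad / 2) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

print(dens)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```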


Example 4.1 (Bivariate normal density) Let us evaluate the $p = 2$-variate normal density in terms of the individual parameters $\mu_1 = E(X_1)$, $\mu_2 = E(X_2)$, $\sigma_{11} = \mathrm{Var}(X_1)$, $\sigma_{22} = \mathrm{Var}(X_2)$, and $\rho_{12} = \sigma_{12}/(\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}) = \mathrm{Corr}(X_1, X_2)$.
Using Result 2A.8, we find that the inverse of the covariance matrix

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}$$

is

$$\Sigma^{-1} = \frac{1}{\sigma_{11}\sigma_{22} - \sigma_{12}^2} \begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}$$

Introducing the correlation coefficient $\rho_{12}$ by writing $\sigma_{12} = \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}$, we obtain $\sigma_{11}\sigma_{22} - \sigma_{12}^2 = \sigma_{11}\sigma_{22}(1 - \rho_{12}^2)$, and the squared distance becomes

$$(x-\mu)'\Sigma^{-1}(x-\mu) = \frac{1}{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}\,[x_1-\mu_1,\; x_2-\mu_2] \begin{bmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} & \sigma_{11} \end{bmatrix} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}$$

$$= \frac{\sigma_{22}(x_1-\mu_1)^2 + \sigma_{11}(x_2-\mu_2)^2 - 2\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}\,(x_1-\mu_1)(x_2-\mu_2)}{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}$$

$$= \frac{1}{1-\rho_{12}^2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right] \tag{4-5}$$

The last expression is written in terms of the standardized values $(x_1-\mu_1)/\sqrt{\sigma_{11}}$ and $(x_2-\mu_2)/\sqrt{\sigma_{22}}$.
Next, since $|\Sigma| = \sigma_{11}\sigma_{22} - \sigma_{12}^2 = \sigma_{11}\sigma_{22}(1-\rho_{12}^2)$, we can substitute for $\Sigma^{-1}$ and $|\Sigma|$ in (4-4) to get the expression for the bivariate ($p = 2$) normal density involving the individual parameters $\mu_1$, $\mu_2$, $\sigma_{11}$, $\sigma_{22}$, and $\rho_{12}$:

$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}} \exp\left\{-\frac{1}{2(1-\rho_{12}^2)}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)\left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)\right]\right\} \tag{4-6}$$
The expression in (4-6) is somewhat unwieldy, and the compact general form in (4-4) is more informative in many ways. On the other hand, the expression in (4-6) is useful for discussing certain properties of the normal distribution. For example, if the random variables $X_1$ and $X_2$ are uncorrelated, so that $\rho_{12} = 0$, the joint density can be written as the product of two univariate normal densities each of the form of (4-1). That is, $f(x_1, x_2) = f(x_1)\,f(x_2)$ and $X_1$ and $X_2$ are independent. [See (2-28).] This result is true in general. (See Result 4.5.)
Two bivariate distributions with $\sigma_{11} = \sigma_{22}$ are shown in Figure 4.2. In Figure 4.2(a), $X_1$ and $X_2$ are independent ($\rho_{12} = 0$). In Figure 4.2(b), $\rho_{12} = .75$. Notice how the presence of correlation causes the probability to concentrate along a line. ■
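A quick numerical illustration of this factorization (my addition, with assumed parameter values): when $\rho_{12} = 0$, the joint density equals the product of the two univariate marginals.

```python
# Check that rho12 = 0 makes the bivariate normal density factor into the
# product of its univariate marginals. Parameter values are assumptions.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu1, mu2 = 2.0, -1.0
s11, s22 = 4.0, 9.0              # variances; rho12 = 0 means sigma12 = 0
Sigma = np.diag([s11, s22])

x1, x2 = 3.0, 0.5
joint = multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf([x1, x2])
product = norm(mu1, np.sqrt(s11)).pdf(x1) * norm(mu2, np.sqrt(s22)).pdf(x2)

print(joint, product)   # identical up to floating-point error
```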

Figure 4.2 Two bivariate normal distributions. (a) $\sigma_{11} = \sigma_{22}$ and $\rho_{12} = 0$. (b) $\sigma_{11} = \sigma_{22}$ and $\rho_{12} = .75$.

From the expression in (4-4) for the density of a $p$-dimensional normal variable, it should be clear that the paths of $x$ values yielding a constant height for the density are ellipsoids. That is, the multivariate normal density is constant on surfaces where the square of the distance $(x-\mu)'\Sigma^{-1}(x-\mu)$ is constant. These paths are called contours:

Constant probability density contour $= \{\text{all } x \text{ such that } (x-\mu)'\Sigma^{-1}(x-\mu) = c^2\}$ = surface of an ellipsoid centered at $\mu$

The axes of each ellipsoid of constant density are in the direction of the eigenvectors of $\Sigma^{-1}$, and their lengths are proportional to the reciprocals of the square roots of the eigenvalues of $\Sigma^{-1}$. Fortunately, we can avoid the calculation of $\Sigma^{-1}$ when determining the axes, since these ellipsoids are also determined by the eigenvalues and eigenvectors of $\Sigma$. We state the correspondence formally for later reference.

Result 4.1. If $\Sigma$ is positive definite, so that $\Sigma^{-1}$ exists, then

$$\Sigma e = \lambda e \quad \text{implies} \quad \Sigma^{-1} e = \left(\frac{1}{\lambda}\right) e$$

so $(\lambda, e)$ is an eigenvalue–eigenvector pair for $\Sigma$ corresponding to the pair $(1/\lambda, e)$ for $\Sigma^{-1}$. Also, $\Sigma^{-1}$ is positive definite.

Proof. For $\Sigma$ positive definite and $e \ne 0$ an eigenvector, we have $0 < e'\Sigma e = e'(\Sigma e) = e'(\lambda e) = \lambda e'e = \lambda$. Moreover, $e = \Sigma^{-1}(\Sigma e) = \Sigma^{-1}(\lambda e)$, or $e = \lambda \Sigma^{-1} e$, and division by $\lambda > 0$ gives $\Sigma^{-1} e = (1/\lambda)e$. Thus, $(1/\lambda, e)$ is an eigenvalue–eigenvector pair for $\Sigma^{-1}$. Also, for any $p \times 1$ vector $x$, by (2-21)

$$x'\Sigma^{-1}x = x'\left(\sum_{i=1}^{p} \frac{1}{\lambda_i}\, e_i e_i'\right) x = \sum_{i=1}^{p} \frac{1}{\lambda_i}\,(x'e_i)^2 \ge 0$$

since each term $\lambda_i^{-1}(x'e_i)^2$ is nonnegative. In addition, $x'e_i = 0$ for all $i$ only if $x = 0$. So $x \ne 0$ implies that $\sum_{i=1}^{p}(1/\lambda_i)(x'e_i)^2 > 0$, and it follows that $\Sigma^{-1}$ is positive definite. ■

The following summarizes these concepts:

Contours of constant density for the $p$-dimensional normal distribution are ellipsoids defined by $x$ such that

$$(x-\mu)'\Sigma^{-1}(x-\mu) = c^2 \tag{4-7}$$

These ellipsoids are centered at $\mu$ and have axes $\pm c\sqrt{\lambda_i}\, e_i$, where $\Sigma e_i = \lambda_i e_i$ for $i = 1, 2, \ldots, p$.
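A short sketch (not from the text) of computing the axes $\pm c\sqrt{\lambda_i}\, e_i$ in (4-7) from the eigenvalues and eigenvectors of $\Sigma$; the $\Sigma$ and $c$ values are assumptions:

```python
# Constant-density ellipsoid axes from the eigendecomposition of Sigma.
# Sigma and c below are illustrative values, not from the text.
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 4.0]])   # sigma11 = sigma22, sigma12 > 0
c = 1.5

lam, E = np.linalg.eigh(Sigma)   # eigenvalues ascending; columns of E are e_i
for lam_i, e_i in zip(lam, E.T):
    half_length = c * np.sqrt(lam_i)   # half-length of the axis +/- c*sqrt(lam_i)*e_i
    print(f"axis direction {e_i}, half-length {half_length:.3f}")
```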

A contour of constant density for a bivariate normal distribution with $\sigma_{11} = \sigma_{22}$ is obtained in the following example.


Example 4.2 (Contours of the bivariate normal density) We shall obtain the axes of constant probability density contours for a bivariate normal distribution when $\sigma_{11} = \sigma_{22}$. From (4-7), these axes are given by the eigenvalues and eigenvectors of $\Sigma$. Here $|\Sigma - \lambda I| = 0$ becomes

$$0 = \begin{vmatrix} \sigma_{11}-\lambda & \sigma_{12} \\ \sigma_{12} & \sigma_{11}-\lambda \end{vmatrix} = (\sigma_{11}-\lambda)^2 - \sigma_{12}^2 = (\lambda - \sigma_{11} - \sigma_{12})(\lambda - \sigma_{11} + \sigma_{12})$$

Consequently, the eigenvalues are $\lambda_1 = \sigma_{11} + \sigma_{12}$ and $\lambda_2 = \sigma_{11} - \sigma_{12}$. The eigenvector $e_1$ is determined from

$$\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{11} \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = (\sigma_{11} + \sigma_{12}) \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$$

or

$$\sigma_{11} e_1 + \sigma_{12} e_2 = (\sigma_{11} + \sigma_{12})\, e_1$$
$$\sigma_{12} e_1 + \sigma_{11} e_2 = (\sigma_{11} + \sigma_{12})\, e_2$$

These equations imply that $e_1 = e_2$, and after normalization, the first eigenvalue–eigenvector pair is

$$\lambda_1 = \sigma_{11} + \sigma_{12}, \qquad e_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$$

Similarly, $\lambda_2 = \sigma_{11} - \sigma_{12}$ yields the eigenvector $e_2' = [1/\sqrt{2},\; -1/\sqrt{2}]$.
When the covariance $\sigma_{12}$ (or correlation $\rho_{12}$) is positive, $\lambda_1 = \sigma_{11} + \sigma_{12}$ is the largest eigenvalue, and its associated eigenvector $e_1' = [1/\sqrt{2},\; 1/\sqrt{2}]$ lies along the 45° line through the point $\mu' = [\mu_1, \mu_2]$. This is true for any positive value of the covariance (correlation). Since the axes of the constant-density ellipses are given by $\pm c\sqrt{\lambda_1}\, e_1$ and $\pm c\sqrt{\lambda_2}\, e_2$ [see (4-7)], and the eigenvectors each have length unity, the major axis will be associated with the largest eigenvalue. For positively correlated normal random variables, then, the major axis of the constant-density ellipses will be along the 45° line through $\mu$. (See Figure 4.3.)

Figure 4.3 A constant-density contour for a bivariate normal distribution with $\sigma_{11} = \sigma_{22}$ and $\sigma_{12} > 0$ (or $\rho_{12} > 0$). The half-lengths of the axes are $c\sqrt{\sigma_{11}+\sigma_{12}}$ and $c\sqrt{\sigma_{11}-\sigma_{12}}$.

When the covariance (correlation) is negative, $\lambda_2 = \sigma_{11} - \sigma_{12}$ will be the largest eigenvalue, and the major axes of the constant-density ellipses will lie along a line at right angles to the 45° line through $\mu$. (These results are true only for $\sigma_{11} = \sigma_{22}$.)
To summarize, the axes of the ellipses of constant density for a bivariate normal distribution with $\sigma_{11} = \sigma_{22}$ are determined by

$$\pm c\sqrt{\sigma_{11}+\sigma_{12}} \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \qquad \text{and} \qquad \pm c\sqrt{\sigma_{11}-\sigma_{12}} \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix}$$ ■

We show in Result 4.7 that the choice $c^2 = \chi_p^2(\alpha)$, where $\chi_p^2(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p$ degrees of freedom, leads to contours that contain $(1-\alpha) \times 100\%$ of the probability. Specifically, the following is true for a $p$-dimensional normal distribution:

The solid ellipsoid of $x$ values satisfying

$$(x-\mu)'\Sigma^{-1}(x-\mu) \le \chi_p^2(\alpha) \tag{4-8}$$

has probability $1 - \alpha$.

The constant-density contours containing 50% and 90% of the probability under the bivariate normal surfaces in Figure 4.2 are pictured in Figure 4.4.

Figure 4.4 The 50% and 90% contours for the bivariate normal distributions in Figure 4.2.
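A Monte Carlo check (my addition, with illustrative parameters) that the ellipsoid in (4-8) with $c^2 = \chi_p^2(\alpha)$ captures roughly $(1-\alpha)$ of the probability:

```python
# Monte Carlo check of (4-8): the fraction of draws with squared generalized
# distance at most chi2_p(alpha) should be about 1 - alpha.
# mu, Sigma, and alpha are assumed values for illustration.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
alpha = 0.10                                   # 90% contour
c2 = chi2.ppf(1 - alpha, df=len(mu))           # upper (100*alpha)th percentile

X = rng.multivariate_normal(mu, Sigma, size=100_000)
dev = X - mu
d2 = np.einsum('ij,ij->i', dev @ np.linalg.inv(Sigma), dev)
print((d2 <= c2).mean())                       # approximately 0.90
```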

The $p$-variate normal density in (4-4) has a maximum value when the squared distance in (4-3) is zero, that is, when $x = \mu$. Thus, $\mu$ is the point of maximum density, or mode, as well as the expected value of $X$, or mean. The fact that $\mu$ is the mean of the multivariate normal distribution follows from the symmetry exhibited by the constant-density contours: These contours are centered, or balanced, at $\mu$.


Additional Properties of the Multivariate Normal Distribution

Certain properties of the normal distribution will be needed repeatedly in our explanations of statistical models and methods. These properties make it possible to manipulate normal distributions easily and, as we suggested in Section 4.1, are partly responsible for the popularity of the normal distribution. The key properties, which we shall soon discuss in some mathematical detail, can be stated rather simply.
The following are true for a random vector $X$ having a multivariate normal distribution:

1. Linear combinations of the components of $X$ are normally distributed.
2. All subsets of the components of $X$ have a (multivariate) normal distribution.
3. Zero covariance implies that the corresponding components are independently distributed.
4. The conditional distributions of the components are (multivariate) normal.
These statements are reproduced mathematically in the results that follow. Many
of these results are illustrated with examples. The proofs that are included should
help improve your understanding of matrix manipulations and also lead you
to an appreciation for the manner in which the results successively build on
themselves.
Result 4.2 can be taken as a working definition of the normal distribution. With
this in hand, the subsequent properties are almost immediate. Our partial proof of
Result 4.2 indicates how the linear combination definition of a normal density
relates to the multivariate density in (4-4).

Result 4.2. If $X$ is distributed as $N_p(\mu, \Sigma)$, then any linear combination of variables $a'X = a_1X_1 + a_2X_2 + \cdots + a_pX_p$ is distributed as $N(a'\mu, a'\Sigma a)$. Also, if $a'X$ is distributed as $N(a'\mu, a'\Sigma a)$ for every $a$, then $X$ must be $N_p(\mu, \Sigma)$.

Proof. The expected value and variance of $a'X$ follow from (2-43). Proving that $a'X$ is normally distributed if $X$ is multivariate normal is more difficult. You can find a proof in [1]. The second part of Result 4.2 is also demonstrated in [1]. ■

Example 4.3 (The distribution of a linear combination of the components of a normal random vector) Consider the linear combination $a'X$ of a multivariate normal random vector determined by the choice $a' = [1, 0, \ldots, 0]$. Since

$$a'X = [1, 0, \ldots, 0] \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix} = X_1$$

and

$$a'\mu = [1, 0, \ldots, 0] \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} = \mu_1$$

we have

$$a'\Sigma a = [1, 0, \ldots, 0] \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1p} & \sigma_{2p} & \cdots & \sigma_{pp} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \sigma_{11}$$

and it follows from Result 4.2 that $X_1$ is distributed as $N(\mu_1, \sigma_{11})$. More generally, the marginal distribution of any component $X_i$ of $X$ is $N(\mu_i, \sigma_{ii})$. ■

The next result considers several linear combinations of a multivariate normal vector $X$.

Result 4.3. If $X$ is distributed as $N_p(\mu, \Sigma)$, the $q$ linear combinations

$$A_{(q \times p)}\, X_{(p \times 1)} = \begin{bmatrix} a_{11}X_1 + \cdots + a_{1p}X_p \\ a_{21}X_1 + \cdots + a_{2p}X_p \\ \vdots \\ a_{q1}X_1 + \cdots + a_{qp}X_p \end{bmatrix}$$

are distributed as $N_q(A\mu, A\Sigma A')$. Also, $X_{(p \times 1)} + d_{(p \times 1)}$, where $d$ is a vector of constants, is distributed as $N_p(\mu + d, \Sigma)$.

Proof. The expected value $E(AX)$ and the covariance matrix of $AX$ follow from (2-45). Any linear combination $b'(AX)$ is a linear combination of $X$, of the form $a'X$ with $a = A'b$. Thus, the conclusion concerning $AX$ follows directly from Result 4.2.
The second part of the result can be obtained by considering $a'(X + d) = a'X + (a'd)$, where $a'X$ is distributed as $N(a'\mu, a'\Sigma a)$. It is known from the univariate case that adding a constant $a'd$ to the random variable $a'X$ leaves the variance unchanged and translates the mean to $a'\mu + a'd = a'(\mu + d)$. Since $a$ was arbitrary, $X + d$ is distributed as $N_p(\mu + d, \Sigma)$. ■

Example 4.4 (The distribution of two linear combinations of the components of a normal random vector) For $X$ distributed as $N_3(\mu, \Sigma)$, find the distribution of

$$\begin{bmatrix} X_1 - X_2 \\ X_2 - X_3 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = AX$$

By Result 4.3, the distribution of $AX$ is multivariate normal with mean

$$A\mu = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} = \begin{bmatrix} \mu_1 - \mu_2 \\ \mu_2 - \mu_3 \end{bmatrix}$$

and covariance matrix

$$A\Sigma A' = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{bmatrix}$$

$$= \begin{bmatrix} \sigma_{11} - \sigma_{12} & \sigma_{12} - \sigma_{22} & \sigma_{13} - \sigma_{23} \\ \sigma_{12} - \sigma_{13} & \sigma_{22} - \sigma_{23} & \sigma_{23} - \sigma_{33} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1 & 1 \\ 0 & -1 \end{bmatrix}$$

$$= \begin{bmatrix} \sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{12} + \sigma_{23} - \sigma_{22} - \sigma_{13} \\ \sigma_{12} + \sigma_{23} - \sigma_{22} - \sigma_{13} & \sigma_{22} - 2\sigma_{23} + \sigma_{33} \end{bmatrix}$$

Alternatively, the mean vector $A\mu$ and covariance matrix $A\Sigma A'$ may be verified by direct calculation of the means and covariances of the two random variables $Y_1 = X_1 - X_2$ and $Y_2 = X_2 - X_3$. ■
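The direct-calculation route can also be done numerically. A sketch (not from the text) verifying the entries of $A\Sigma A'$ for an assumed $\Sigma$:

```python
# Verify Example 4.4: A Sigma A' matches the entry-by-entry formulas above.
# The numeric Sigma is an assumed example, not from the text.
import numpy as np

Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.5],
                  [0.5, 1.5, 2.0]])
s11, s12, s13 = Sigma[0, 0], Sigma[0, 1], Sigma[0, 2]
s22, s23, s33 = Sigma[1, 1], Sigma[1, 2], Sigma[2, 2]

A = np.array([[1.0, -1.0,  0.0],
              [0.0,  1.0, -1.0]])
print(A @ Sigma @ A.T)
print(np.array([[s11 - 2*s12 + s22,      s12 + s23 - s22 - s13],
                [s12 + s23 - s22 - s13,  s22 - 2*s23 + s33]]))  # same matrix
```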

We have mentioned that all subsets of a multivariate normal random vector $X$ are themselves normally distributed. We state this property formally as Result 4.4.

Result 4.4. All subsets of $X$ are normally distributed. If we respectively partition $X$, its mean vector $\mu$, and its covariance matrix $\Sigma$ as

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

where $X_1$ and $\mu_1$ are $q \times 1$, $X_2$ and $\mu_2$ are $(p-q) \times 1$, $\Sigma_{11}$ is $q \times q$, $\Sigma_{12}$ is $q \times (p-q)$, $\Sigma_{21}$ is $(p-q) \times q$, and $\Sigma_{22}$ is $(p-q) \times (p-q)$, then $X_1$ is distributed as $N_q(\mu_1, \Sigma_{11})$.

Proof. Set $A_{(q \times p)} = [\,I_{(q \times q)} \mid 0_{(q \times (p-q))}\,]$ in Result 4.3, and the conclusion follows.
To apply Result 4.4 to an arbitrary subset of the components of $X$, we simply relabel the subset of interest as $X_1$ and select the corresponding component means and covariances as $\mu_1$ and $\Sigma_{11}$, respectively. ■


Example 4.5 (The distribution of a subset of a normal random vector) If $X$ is distributed as $N_5(\mu, \Sigma)$, find the distribution of $\begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$. We set

$$X_1 = \begin{bmatrix} X_2 \\ X_4 \end{bmatrix}, \qquad \mu_1 = \begin{bmatrix} \mu_2 \\ \mu_4 \end{bmatrix}, \qquad \Sigma_{11} = \begin{bmatrix} \sigma_{22} & \sigma_{24} \\ \sigma_{24} & \sigma_{44} \end{bmatrix}$$

and note that with this assignment, $X$, $\mu$, and $\Sigma$ can respectively be rearranged and partitioned as

$$X = \begin{bmatrix} X_2 \\ X_4 \\ X_1 \\ X_3 \\ X_5 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_2 \\ \mu_4 \\ \mu_1 \\ \mu_3 \\ \mu_5 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_{22} & \sigma_{24} & \sigma_{12} & \sigma_{23} & \sigma_{25} \\ \sigma_{24} & \sigma_{44} & \sigma_{14} & \sigma_{34} & \sigma_{45} \\ \sigma_{12} & \sigma_{14} & \sigma_{11} & \sigma_{13} & \sigma_{15} \\ \sigma_{23} & \sigma_{34} & \sigma_{13} & \sigma_{33} & \sigma_{35} \\ \sigma_{25} & \sigma_{45} & \sigma_{15} & \sigma_{35} & \sigma_{55} \end{bmatrix}$$

or

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

with $X_1$ of dimension $2 \times 1$, $X_2$ of dimension $3 \times 1$, $\Sigma_{11}$ of dimension $2 \times 2$, and so on. Thus, from Result 4.4, for

$$X_1 = \begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$$

we have the distribution

$$N_2(\mu_1, \Sigma_{11}) = N_2\left(\begin{bmatrix} \mu_2 \\ \mu_4 \end{bmatrix}, \begin{bmatrix} \sigma_{22} & \sigma_{24} \\ \sigma_{24} & \sigma_{44} \end{bmatrix}\right)$$

It is clear from this example that the normal distribution for any subset can be expressed by simply selecting the appropriate means and covariances from the original $\mu$ and $\Sigma$. The formal process of relabeling and partitioning is unnecessary. ■
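In code, this "select the appropriate means and covariances" step is just indexing. A sketch (my addition, with an arbitrary positive definite $\Sigma$):

```python
# The marginal of (X2, X4) is read off by indexing mu and Sigma directly;
# no relabeling needed. mu and Sigma are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # mu_1, ..., mu_5
B = rng.standard_normal((5, 5))
Sigma = B @ B.T + np.eye(5)                  # guaranteed positive definite

idx = [1, 3]                                 # zero-based: X2 and X4
mu_sub = mu[idx]                             # [mu_2, mu_4]
Sigma_sub = Sigma[np.ix_(idx, idx)]          # [[s22, s24], [s24, s44]]
print(mu_sub)
print(Sigma_sub)
```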

We are now in a position to state that zero correlation between normal random variables or sets of normal random variables is equivalent to statistical independence.

Result 4.5.
(a) If $X_1$ ($q_1 \times 1$) and $X_2$ ($q_2 \times 1$) are independent, then $\mathrm{Cov}(X_1, X_2) = 0$, a $q_1 \times q_2$ matrix of zeros.
(b) If $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ is $N_{q_1+q_2}\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$, then $X_1$ and $X_2$ are independent if and only if $\Sigma_{12} = 0$.
(c) If $X_1$ and $X_2$ are independent and are distributed as $N_{q_1}(\mu_1, \Sigma_{11})$ and $N_{q_2}(\mu_2, \Sigma_{22})$, respectively, then $\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ has the multivariate normal distribution

$$N_{q_1+q_2}\left(\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & 0 \\ 0' & \Sigma_{22} \end{bmatrix}\right)$$

Proof. (See Exercise 4.14 for partial proofs based upon factoring the density function when $\Sigma_{12} = 0$.) ■

Example 4.6 (The equivalence of zero covariance and independence for normal variables) Let $X$ ($3 \times 1$) be $N_3(\mu, \Sigma)$ with

$$\Sigma = \begin{bmatrix} 4 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

Are $X_1$ and $X_2$ independent? What about $(X_1, X_2)$ and $X_3$?
Since $X_1$ and $X_2$ have covariance $\sigma_{12} = 1$, they are not independent. However, partitioning $X$ and $\Sigma$ as

$$X = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}, \qquad \Sigma = \left[\begin{array}{cc|c} 4 & 1 & 0 \\ 1 & 3 & 0 \\ \hline 0 & 0 & 2 \end{array}\right] = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

with $\Sigma_{11}$ of dimension $2 \times 2$ and $\Sigma_{22}$ of dimension $1 \times 1$, we see that $X_1 = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ and $X_3$ have covariance matrix $\Sigma_{12} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. Therefore, $(X_1, X_2)$ and $X_3$ are independent by Result 4.5. This implies $X_3$ is independent of $X_1$ and also of $X_2$. ■

We pointed out in our discussion of the bivariate normal distribution that $\rho_{12} = 0$ (zero correlation) implied independence because the joint density function [see (4-6)] could then be written as the product of the marginal (normal) densities of $X_1$ and $X_2$. This fact, which we encouraged you to verify directly, is simply a special case of Result 4.5 with $q_1 = q_2 = 1$.

Result 4.6. Let $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$ be distributed as $N_p(\mu, \Sigma)$ with $\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$, $\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$, and $|\Sigma_{22}| > 0$. Then the conditional distribution of $X_1$, given that $X_2 = x_2$, is normal and has

$$\text{Mean} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$$

and

$$\text{Covariance} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$

Note that the covariance does not depend on the value $x_2$ of the conditioning variable.

Proof. We shall give an indirect proof. (See Exercise 4.13, which uses the densities directly.) Take

$$A_{(p \times p)} = \begin{bmatrix} I_{(q \times q)} & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I_{((p-q) \times (p-q))} \end{bmatrix}$$

so

$$A(X - \mu) = A \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix} = \begin{bmatrix} X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2) \\ X_2 - \mu_2 \end{bmatrix}$$

is jointly normal with covariance matrix $A\Sigma A'$ given by

$$\begin{bmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \begin{bmatrix} I & 0' \\ (-\Sigma_{12}\Sigma_{22}^{-1})' & I \end{bmatrix} = \begin{bmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0' \\ 0 & \Sigma_{22} \end{bmatrix}$$

Since $X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2)$ and $X_2 - \mu_2$ have zero covariance, they are independent. Moreover, the quantity $X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2)$ has distribution $N_q(0,\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$. Given that $X_2 = x_2$, $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ is a constant. Because $X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2)$ and $X_2 - \mu_2$ are independent, the conditional distribution of $X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ is the same as the unconditional distribution of $X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2)$. Since $X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2)$ is $N_q(0,\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$, so is the random vector $X_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ when $X_2$ has the particular value $x_2$. Equivalently, given that $X_2 = x_2$, $X_1$ is distributed as $N_q(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$. ■
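A small sketch (not from the text) implementing the conditional mean and covariance of Result 4.6; the partition size and parameter values are assumptions:

```python
# Conditional distribution of X1 | X2 = x2 per Result 4.6.
# mu, Sigma, q, and x2 below are illustrative assumptions.
import numpy as np

def conditional_normal(mu, Sigma, q, x2):
    """Mean and covariance of X1 | X2 = x2 when X = (X1, X2) ~ N_p(mu, Sigma),
    with X1 the first q components."""
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
    W = S12 @ np.linalg.inv(S22)        # Sigma12 Sigma22^{-1}
    cond_mean = mu1 + W @ (x2 - mu2)
    cond_cov = S11 - W @ S21            # does not depend on x2
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
print(conditional_normal(mu, Sigma, q=1, x2=np.array([1.5, -0.5])))
```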

Example 4.7 (The conditional density of a bivariate normal distribution) The conditional density of $X_1$, given that $X_2 = x_2$ for any bivariate distribution, is defined by

$$f(x_1 \mid x_2) = \{\text{conditional density of } X_1 \text{ given that } X_2 = x_2\} = \frac{f(x_1, x_2)}{f(x_2)}$$

where $f(x_2)$ is the marginal distribution of $X_2$. If $f(x_1, x_2)$ is the bivariate normal density, show that $f(x_1 \mid x_2)$ is

$$N\left(\mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2 - \mu_2),\;\; \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}\right)$$

Here $\sigma_{11} - \sigma_{12}^2/\sigma_{22} = \sigma_{11}(1 - \rho_{12}^2)$. The two terms involving $x_1 - \mu_1$ in the exponent of the bivariate normal density [see Equation (4-6)] become, apart from the multiplicative constant $-1/2(1-\rho_{12}^2)$,

$$\frac{(x_1-\mu_1)^2}{\sigma_{11}} - 2\rho_{12}\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} = \frac{1}{\sigma_{11}}\left[x_1 - \mu_1 - \rho_{12}\frac{\sqrt{\sigma_{11}}}{\sqrt{\sigma_{22}}}(x_2 - \mu_2)\right]^2 - \frac{\rho_{12}^2}{\sigma_{22}}(x_2 - \mu_2)^2$$

Because $\rho_{12} = \sigma_{12}/\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}$, or $\rho_{12}\sqrt{\sigma_{11}}/\sqrt{\sigma_{22}} = \sigma_{12}/\sigma_{22}$, the complete exponent is

$$\frac{-1}{2(1-\rho_{12}^2)}\left(\frac{(x_1-\mu_1)^2}{\sigma_{11}} - 2\rho_{12}\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} + \frac{(x_2-\mu_2)^2}{\sigma_{22}}\right)$$

$$= \frac{-1}{2\sigma_{11}(1-\rho_{12}^2)}\left(x_1 - \mu_1 - \rho_{12}\frac{\sqrt{\sigma_{11}}}{\sqrt{\sigma_{22}}}(x_2-\mu_2)\right)^2 - \frac{1}{2(1-\rho_{12}^2)}\left(\frac{1}{\sigma_{22}} - \frac{\rho_{12}^2}{\sigma_{22}}\right)(x_2-\mu_2)^2$$

$$= \frac{-1}{2\sigma_{11}(1-\rho_{12}^2)}\left(x_1 - \mu_1 - \frac{\sigma_{12}}{\sigma_{22}}(x_2-\mu_2)\right)^2 - \frac{(x_2-\mu_2)^2}{2\sigma_{22}}$$

The constant term $2\pi\sqrt{\sigma_{11}\sigma_{22}(1-\rho_{12}^2)}$ also factors as

$$\sqrt{2\pi}\sqrt{\sigma_{22}} \;\times\; \sqrt{2\pi}\sqrt{\sigma_{11}(1-\rho_{12}^2)}$$

Dividing the joint density of $X_1$ and $X_2$ by the marginal density

$$f(x_2) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_{22}}}\, e^{-(x_2-\mu_2)^2/2\sigma_{22}}$$

and canceling terms yields the conditional density

$$f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f(x_2)} = \frac{1}{\sqrt{2\pi}\sqrt{\sigma_{11}(1-\rho_{12}^2)}}\, e^{-[x_1 - \mu_1 - (\sigma_{12}/\sigma_{22})(x_2-\mu_2)]^2 / 2\sigma_{11}(1-\rho_{12}^2)}, \qquad -\infty < x_1 < \infty$$

Thus, with our customary notation, the conditional distribution of $X_1$ given that $X_2 = x_2$ is $N(\mu_1 + (\sigma_{12}/\sigma_{22})(x_2 - \mu_2),\; \sigma_{11}(1-\rho_{12}^2))$. Now, $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \sigma_{11} - \sigma_{12}^2/\sigma_{22} = \sigma_{11}(1-\rho_{12}^2)$ and $\Sigma_{12}\Sigma_{22}^{-1} = \sigma_{12}/\sigma_{22}$, agreeing with Result 4.6, which we obtained by an indirect method. ■


For the multivariate normal situation, it is worth emphasizing the following:

1. All conditional distributions are (multivariate) normal.
2. The conditional mean is of the form

$$\begin{aligned} &\mu_1 + \beta_{1,q+1}(x_{q+1} - \mu_{q+1}) + \cdots + \beta_{1,p}(x_p - \mu_p) \\ &\qquad\vdots \\ &\mu_q + \beta_{q,q+1}(x_{q+1} - \mu_{q+1}) + \cdots + \beta_{q,p}(x_p - \mu_p) \end{aligned} \tag{4-9}$$

where the $\beta$'s are defined by

$$\Sigma_{12}\Sigma_{22}^{-1} = \begin{bmatrix} \beta_{1,q+1} & \beta_{1,q+2} & \cdots & \beta_{1,p} \\ \beta_{2,q+1} & \beta_{2,q+2} & \cdots & \beta_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{q,q+1} & \beta_{q,q+2} & \cdots & \beta_{q,p} \end{bmatrix}$$

3. The conditional covariance, $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, does not depend upon the value(s) of the conditioning variable(s).
We conclude this section by presenting two final properties of multivariate normal random vectors. One has to do with the probability content of the ellipsoids of constant density. The other discusses the distribution of another form of linear combinations.
The chi-square distribution determines the variability of the sample variance $s^2 = s_{11}$ for samples from a univariate normal population. It also plays a basic role in the multivariate case.

Result 4.7. Let $X$ be distributed as $N_p(\mu, \Sigma)$ with $|\Sigma| > 0$. Then
(a) $(X - \mu)'\Sigma^{-1}(X - \mu)$ is distributed as $\chi_p^2$, where $\chi_p^2$ denotes the chi-square distribution with $p$ degrees of freedom.
(b) The $N_p(\mu, \Sigma)$ distribution assigns probability $1 - \alpha$ to the solid ellipsoid $\{x : (x - \mu)'\Sigma^{-1}(x - \mu) \le \chi_p^2(\alpha)\}$, where $\chi_p^2(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi_p^2$ distribution.

Proof. We know that $\chi_p^2$ is defined as the distribution of the sum $Z_1^2 + Z_2^2 + \cdots + Z_p^2$, where $Z_1, Z_2, \ldots, Z_p$ are independent $N(0, 1)$ random variables. Next, by the spectral decomposition [see Equations (2-16) and (2-21) with $A = \Sigma$, and see Result 4.1], $\Sigma^{-1} = \sum_{i=1}^{p} \frac{1}{\lambda_i}\, e_i e_i'$, where $\Sigma e_i = \lambda_i e_i$, so $\Sigma^{-1} e_i = (1/\lambda_i)\, e_i$. Consequently,

$$(X-\mu)'\Sigma^{-1}(X-\mu) = \sum_{i=1}^{p} \frac{1}{\lambda_i}(X-\mu)'e_i e_i'(X-\mu) = \sum_{i=1}^{p} \frac{1}{\lambda_i}\left[e_i'(X-\mu)\right]^2 = \sum_{i=1}^{p} \left[\frac{1}{\sqrt{\lambda_i}}\, e_i'(X-\mu)\right]^2 = \sum_{i=1}^{p} Z_i^2,$$

for instance. Now, we can write $Z = A(X - \mu)$, where

$$Z_{(p \times 1)} = \begin{bmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_p \end{bmatrix}, \qquad A_{(p \times p)} = \begin{bmatrix} \dfrac{1}{\sqrt{\lambda_1}}\, e_1' \\ \dfrac{1}{\sqrt{\lambda_2}}\, e_2' \\ \vdots \\ \dfrac{1}{\sqrt{\lambda_p}}\, e_p' \end{bmatrix}$$

and $X - \mu$ is distributed as $N_p(0, \Sigma)$. Therefore, by Result 4.3, $Z = A(X - \mu)$ is distributed as $N_p(0, A\Sigma A')$, where

$$A\Sigma A' = \begin{bmatrix} \dfrac{1}{\sqrt{\lambda_1}}\, e_1' \\ \vdots \\ \dfrac{1}{\sqrt{\lambda_p}}\, e_p' \end{bmatrix} \left[\sum_{i=1}^{p} \lambda_i e_i e_i'\right] \left[\frac{1}{\sqrt{\lambda_1}}\, e_1 \;\; \frac{1}{\sqrt{\lambda_2}}\, e_2 \;\; \cdots \;\; \frac{1}{\sqrt{\lambda_p}}\, e_p\right] = \begin{bmatrix} \sqrt{\lambda_1}\, e_1' \\ \sqrt{\lambda_2}\, e_2' \\ \vdots \\ \sqrt{\lambda_p}\, e_p' \end{bmatrix} \left[\frac{1}{\sqrt{\lambda_1}}\, e_1 \;\; \frac{1}{\sqrt{\lambda_2}}\, e_2 \;\; \cdots \;\; \frac{1}{\sqrt{\lambda_p}}\, e_p\right] = I$$

By Result 4.5, $Z_1, Z_2, \ldots, Z_p$ are independent standard normal variables, and we conclude that $(X-\mu)'\Sigma^{-1}(X-\mu)$ has a $\chi_p^2$ distribution.
For Part b, we note that $P[(X-\mu)'\Sigma^{-1}(X-\mu) \le c^2]$ is the probability assigned to the ellipsoid $(X-\mu)'\Sigma^{-1}(X-\mu) \le c^2$ by the density $N_p(\mu, \Sigma)$. But from Part a, $P[(X-\mu)'\Sigma^{-1}(X-\mu) \le \chi_p^2(\alpha)] = 1 - \alpha$, and Part b holds. ■

Remark (Interpretation of statistical distance) Result 4.7 provides an interpretation of a squared statistical distance. When $X$ is distributed as $N_p(\mu, \Sigma)$,

$$(X - \mu)'\Sigma^{-1}(X - \mu)$$

is the squared statistical distance from $X$ to the population mean vector $\mu$. If one component has a much larger variance than another, it will contribute less to the squared distance. Moreover, two highly correlated random variables will contribute less than two variables that are nearly uncorrelated. Essentially, the use of the inverse of the covariance matrix (1) standardizes all of the variables and (2) eliminates the effects of correlation. From the proof of Result 4.7,

$$(X - \mu)'\Sigma^{-1}(X - \mu) = Z_1^2 + Z_2^2 + \cdots + Z_p^2$$

In terms of $\Sigma^{-1/2}$ (see (2-22)), $Z = \Sigma^{-1/2}(X - \mu)$ has a $N_p(0, I_p)$ distribution, and

$$(X - \mu)'\Sigma^{-1}(X - \mu) = (X - \mu)'\Sigma^{-1/2}\Sigma^{-1/2}(X - \mu) = Z'Z = Z_1^2 + Z_2^2 + \cdots + Z_p^2$$

The squared statistical distance is calculated as if, first, the random vector $X$ were transformed to $p$ independent standard normal random variables and then the usual squared distance, the sum of the squares of the variables, were applied.
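A numerical sketch of this interpretation (my addition, with assumed parameters): whiten $X$ with the symmetric square root $\Sigma^{-1/2}$ and confirm that $Z$ behaves like $N_p(0, I_p)$ and that $Z'Z$ matches the $\chi_p^2$ probability of Result 4.7.

```python
# Whitening transform Z = Sigma^{-1/2}(X - mu): Z should look like N_p(0, I),
# and Z'Z like chi-square with p df. mu and Sigma are illustrative values.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.7, 0.3],
                  [0.7, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])

# Symmetric square root Sigma^{-1/2} via the spectral decomposition.
lam, E = np.linalg.eigh(Sigma)
Sigma_inv_half = E @ np.diag(lam ** -0.5) @ E.T

X = rng.multivariate_normal(mu, Sigma, size=50_000)
Z = (X - mu) @ Sigma_inv_half          # each row ~ N_p(0, I)
d2 = (Z ** 2).sum(axis=1)              # Z'Z = squared statistical distance

print(np.cov(Z, rowvar=False).round(2))        # approximately the identity
print((d2 <= chi2.ppf(0.95, df=3)).mean())     # approximately 0.95
```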
Next, consider the linear combination of vector random variables

$$c_1 X_1 + c_2 X_2 + \cdots + c_n X_n = [X_1 \mid X_2 \mid \cdots \mid X_n]_{(p \times n)}\; c_{(n \times 1)} \tag{4-10}$$

This linear combination differs from the linear combinations considered earlier in that it defines a $p \times 1$ vector random variable that is a linear combination of vectors. Previously, we discussed a single random variable that could be written as a linear combination of other univariate random variables.

Result 4.8. Let $X_1, X_2, \ldots, X_n$ be mutually independent with $X_j$ distributed as $N_p(\mu_j, \Sigma)$. (Note that each $X_j$ has the same covariance matrix $\Sigma$.) Then

$$V_1 = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$$

is distributed as $N_p\left(\sum_{j=1}^{n} c_j \mu_j,\; \left(\sum_{j=1}^{n} c_j^2\right)\Sigma\right)$. Moreover, $V_1$ and $V_2 = b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$ are jointly multivariate normal with covariance matrix

$$\begin{bmatrix} \left(\sum_{j=1}^{n} c_j^2\right)\Sigma & (b'c)\,\Sigma \\ (b'c)\,\Sigma & \left(\sum_{j=1}^{n} b_j^2\right)\Sigma \end{bmatrix}$$

Consequently, $V_1$ and $V_2$ are independent if $b'c = \sum_{j=1}^{n} c_j b_j = 0$.

Proof. By Result 4.5(c), the $np$ component vector

$$[X_{11}, \ldots, X_{1p}, X_{21}, \ldots, X_{2p}, \ldots, X_{np}] = [X_1', X_2', \ldots, X_n'] = X'_{(1 \times np)}$$

is multivariate normal. In particular, $X_{(np \times 1)}$ is distributed as $N_{np}(\mu, \Sigma_x)$, where

$$\mu_{(np \times 1)} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{bmatrix} \qquad \text{and} \qquad \Sigma_{x\,(np \times np)} = \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{bmatrix}$$

The choice

$$A_{(2p \times np)} = \begin{bmatrix} c_1 I & c_2 I & \cdots & c_n I \\ b_1 I & b_2 I & \cdots & b_n I \end{bmatrix}$$

where $I$ is the $p \times p$ identity matrix, gives

$$AX = \begin{bmatrix} \sum_{j=1}^{n} c_j X_j \\ \sum_{j=1}^{n} b_j X_j \end{bmatrix} = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}$$

and $AX$ is normal $N_{2p}(A\mu, A\Sigma_x A')$ by Result 4.3. Straightforward block multiplication shows that $A\Sigma_x A'$ has the first block diagonal term

$$[c_1\Sigma, c_2\Sigma, \ldots, c_n\Sigma]\,[c_1 I, c_2 I, \ldots, c_n I]' = \left(\sum_{j=1}^{n} c_j^2\right)\Sigma$$

The off-diagonal term is

$$[c_1\Sigma, c_2\Sigma, \ldots, c_n\Sigma]\,[b_1 I, b_2 I, \ldots, b_n I]' = \left(\sum_{j=1}^{n} c_j b_j\right)\Sigma$$

This term is the covariance matrix for $V_1$, $V_2$. Consequently, when $\sum_{j=1}^{n} c_j b_j = b'c = 0$, so that $\left(\sum_{j=1}^{n} c_j b_j\right)\Sigma = 0_{(p \times p)}$, $V_1$ and $V_2$ are independent by Result 4.5(b). ■

For sums of the type in (4-10), the property of zero correlation is equivalent to
requiring the coefficient vectors b and c to be perpendicular.

Example 4.8 (Linear combinations of random vectors) Let $X_1$, $X_2$, $X_3$, and $X_4$ be independent and identically distributed $3 \times 1$ random vectors with

$$\mu = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix} \qquad \text{and} \qquad \Sigma = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix}$$

We first consider a linear combination $a'X_1$ of the three components of $X_1$. This is a random variable with mean

$$a'\mu = 3a_1 - a_2 + a_3$$

and variance

$$a'\Sigma a = 3a_1^2 + a_2^2 + 2a_3^2 - 2a_1 a_2 + 2a_1 a_3$$

That is, a linear combination $a'X_1$ of the components of a random vector is a single random variable consisting of a sum of terms that are each a constant times a variable. This is very different from a linear combination of random vectors, say,

$$c_1 X_1 + c_2 X_2 + c_3 X_3 + c_4 X_4$$

which is itself a random vector. Here each term in the sum is a constant times a random vector.
Now consider two linear combinations of random vectors

$$\frac{1}{2}X_1 + \frac{1}{2}X_2 + \frac{1}{2}X_3 + \frac{1}{2}X_4$$

and

$$X_1 + X_2 + X_3 - 3X_4$$

Find the mean vector and covariance matrix for each linear combination of vectors and also the covariance between them.
By Result 4.8 with $c_1 = c_2 = c_3 = c_4 = 1/2$, the first linear combination has mean vector

$$(c_1 + c_2 + c_3 + c_4)\,\mu = 2\mu = \begin{bmatrix} 6 \\ -2 \\ 2 \end{bmatrix}$$

and covariance matrix

$$(c_1^2 + c_2^2 + c_3^2 + c_4^2)\,\Sigma = 1 \times \Sigma = \begin{bmatrix} 3 & -1 & 1 \\ -1 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix}$$

For the second linear combination of random vectors, we apply Result 4.8 with $b_1 = b_2 = b_3 = 1$ and $b_4 = -3$ to get mean vector

$$(b_1 + b_2 + b_3 + b_4)\,\mu = 0\mu = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

and covariance matrix

$$(b_1^2 + b_2^2 + b_3^2 + b_4^2)\,\Sigma = 12 \times \Sigma = \begin{bmatrix} 36 & -12 & 12 \\ -12 & 12 & 0 \\ 12 & 0 & 24 \end{bmatrix}$$

Finally, the covariance matrix for the two linear combinations of random vectors is

$$(c_1 b_1 + c_2 b_2 + c_3 b_3 + c_4 b_4)\,\Sigma = 0\Sigma = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Every component of the first linear combination of random vectors has zero covariance with every component of the second linear combination of random vectors. If, in addition, each $X_j$ has a trivariate normal distribution, then the two linear combinations have a joint six-variate normal distribution, and the two linear combinations of vectors are independent. ■
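A simulation check of this example (my addition; only the sampling mechanics are assumptions, the $\mu$ and $\Sigma$ are those given above):

```python
# Simulate V1 = (X1+X2+X3+X4)/2 and V2 = X1+X2+X3-3*X4 and compare sample
# moments with the Result 4.8 values derived in Example 4.8.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([3.0, -1.0, 1.0])
Sigma = np.array([[ 3.0, -1.0, 1.0],
                  [-1.0,  1.0, 0.0],
                  [ 1.0,  0.0, 2.0]])

n = 200_000
X = rng.multivariate_normal(mu, Sigma, size=(n, 4))   # X[i, j] is X_{j+1} in trial i
c = np.array([0.5, 0.5, 0.5, 0.5])
b = np.array([1.0, 1.0, 1.0, -3.0])

V1 = np.einsum('j,ijk->ik', c, X)     # sum_j c_j X_j
V2 = np.einsum('j,ijk->ik', b, X)     # sum_j b_j X_j

print(V1.mean(axis=0).round(2))                 # ~ 2*mu = [6, -2, 2]
print(np.cov(V1, rowvar=False).round(2))        # ~ 1 * Sigma
print(np.cov(V2, rowvar=False).round(2))        # ~ 12 * Sigma
print(np.cov(V1.T, V2.T)[:3, 3:].round(2))      # ~ 0, since b'c = 0
```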
