
Lecture 2.

The Wishart distribution


In this lecture, we define the Wishart distribution, which is a family of distributions for symmetric positive definite matrices, and show its relation to Hotelling's T² statistic.

2.1 The Wishart distribution


The Wishart distribution is a family of distributions for symmetric positive definite matrices.
Let X_1, . . . , X_n be independent N_p(0, Σ) and form the p × n data matrix X = [X_1, . . . , X_n].
The distribution of the p × p random matrix

    M = XX' = Σ_{i=1}^n X_i X_i'

is said to be the Wishart distribution.
Definition 1. The random matrix M (p × p) = Σ_{i=1}^n X_i X_i' has the Wishart distribution with
n degrees of freedom and covariance matrix Σ and is denoted by M ∼ W_p(n, Σ). For n ≥ p,
the probability density function of M is

    f(M) = [2^{np/2} Γ_p(n/2) |Σ|^{n/2}]^{−1} |M|^{(n−p−1)/2} exp[−(1/2) trace(Σ^{−1} M)],

with respect to Lebesgue measure on the cone of symmetric positive definite matrices. Here,
Γ_p(α) is the multivariate gamma function.
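
As a quick sanity check of Definition 1, one can form M = XX' directly from simulated normal vectors and evaluate the density via scipy.stats.wishart, which parameterizes the same family by df = n and scale = Σ. A minimal sketch (the values of p, n, and Σ are arbitrary illustrations):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    p, n = 3, 20
    Sigma = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])

    # Stack n independent N_p(0, Sigma) draws as the columns of a p x n matrix X.
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n).T

    # M = X X' is one draw from W_p(n, Sigma).
    M = X @ X.T

    # Evaluate the Wishart log-density of Definition 1 at the simulated M.
    print(stats.wishart(df=n, scale=Sigma).logpdf(M))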

The precise form of the density is rarely used. Two exceptions are that i) in Bayesian
computation, the Wishart distribution is often used as a conjugate prior for the inverse of a
normal covariance matrix (the precision matrix), and that ii) in diffusion tensor studies,
symmetric positive definite matrices are themselves the random elements of interest.
The Wishart distribution is a multivariate extension of the χ² distribution. In particular, if
M ∼ W₁(n, σ²), then M/σ² ∼ χ²_n. For the special case Σ = I, W_p(n, I) is called the standard
Wishart distribution.
Proposition 1. i. For M ∼ W_p(n, Σ) and B (p × m), B'MB ∼ W_m(n, B'ΣB).

ii. For M ∼ W_p(n, Σ) with Σ > 0, Σ^{−1/2} M Σ^{−1/2} ∼ W_p(n, I_p).

iii. If M_i are independent W_p(n_i, Σ) (i = 1, . . . , k), then Σ_{i=1}^k M_i ∼ W_p(n, Σ), where
n = n_1 + . . . + n_k.

iv. For M_n ∼ W_p(n, Σ), E M_n = nΣ.

v. If M_1 and M_2 are independent and satisfy M_1 + M_2 = M ∼ W_p(n, Σ) and M_1 ∼
W_p(n_1, Σ), then M_2 ∼ W_p(n − n_1, Σ).
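
Property iv is easy to confirm by Monte Carlo: averaging many independent draws of M = XX' should approach nΣ. A small sketch (the dimensions and replication count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    p, n, reps = 2, 10, 20000
    Sigma = np.array([[1.0, 0.6],
                      [0.6, 2.0]])

    # Average reps independent Wishart draws formed as M = X X'.
    total = np.zeros((p, p))
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), Sigma, size=n).T
        total += X @ X.T

    print(total / reps)  # should be close to n * Sigma = [[10, 6], [6, 20]]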

The law of large numbers and the Cramér-Wold device lead to M_n/n → Σ in probability
as n → ∞.
Corollary 2. If M ∼ W_p(n, Σ) and a ∈ R^p is such that a'Σa ≠ 0, then

    a'Ma / a'Σa ∼ χ²_n.

The condition a'Σa ≠ 0 is the same as a ≠ 0 if Σ > 0.
Theorem 3. If M ∼ W_p(n, Σ), a ∈ R^p with a ≠ 0, and n > p − 1, then

    a'Σ^{−1}a / a'M^{−1}a ∼ χ²_{n−p+1}.

The previous theorem holds for any deterministic a ∈ R^p, and thus for any random
a provided that a is independent of M. This is important in the next
subsection.
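
Both Corollary 2 and Theorem 3 are easy to check by simulation. A sketch for Theorem 3 with Σ = I (so a'Σ^{−1}a = a'a), using arbitrary choices of p, n, and a:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    p, n, reps = 4, 15, 20000
    a = np.array([1.0, -1.0, 2.0, 0.5])

    samples = np.empty(reps)
    for r in range(reps):
        M = stats.wishart(df=n, scale=np.eye(p)).rvs(random_state=rng)
        # With Sigma = I the ratio of Theorem 3 is a'a / (a' M^{-1} a).
        samples[r] = (a @ a) / (a @ np.linalg.solve(M, a))

    # chi^2_{n-p+1} has mean n - p + 1 and variance 2(n - p + 1).
    print(samples.mean(), n - p + 1)
    print(samples.var(), 2 * (n - p + 1))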
The following lemma is useful in a proof of Theorem 3.

Lemma 4. For an invertible matrix A = [A_11, A_12; A_21, A_22], partitioned into blocks, we
have A^{−1} = [A^{11}, A^{12}; A^{21}, A^{22}], where

    A^{11} = (A_11 − A_12 A_22^{−1} A_21)^{−1},
    A^{12} = −A^{11} A_12 A_22^{−1},
    A^{21} = −A_22^{−1} A_21 A^{11},
    A^{22} = (A_22 − A_21 A_11^{−1} A_12)^{−1}.
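
Lemma 4 is a standard block-inverse (Schur complement) identity and can be verified numerically; a minimal sketch with an arbitrary positive definite 4 × 4 matrix partitioned into 2 × 2 blocks:

    import numpy as np

    rng = np.random.default_rng(3)

    # A random symmetric positive definite 4x4 matrix, hence invertible.
    B = rng.standard_normal((4, 4))
    A = B @ B.T + 4 * np.eye(4)
    A11, A12 = A[:2, :2], A[:2, 2:]
    A21, A22 = A[2:, :2], A[2:, 2:]

    Ainv = np.linalg.inv(A)

    # Top-left block of A^{-1} from the Schur complement formula in Lemma 4.
    A11_sup = np.linalg.inv(A11 - A12 @ np.linalg.inv(A22) @ A21)
    # Off-diagonal block A^{12} = -A^{11} A12 A22^{-1}.
    A12_sup = -A11_sup @ A12 @ np.linalg.inv(A22)

    print(np.allclose(A11_sup, Ainv[:2, :2]))  # True
    print(np.allclose(A12_sup, Ainv[:2, 2:]))  # True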

2.2 Hotelling's T² statistic
Definition 2. Suppose X and S are independent and such that

    X ∼ N_p(µ, Σ),  mS ∼ W_p(m, Σ).

Then

    T_p²(m) = (X − µ)'S^{−1}(X − µ)

is known as Hotelling's T² statistic.

Hotelling's T² statistic plays a role in multivariate analysis similar to that of Student's
t-statistic in univariate analysis: it is of great practical importance in testing hypotheses
about the mean of a multivariate normal distribution when the covariance matrix is unknown.
Theorem 5. If m > p − 1, then

    [(m − p + 1)/(mp)] T_p²(m) ∼ F_{p,m−p+1}.

A special case is p = 1, where Theorem 5 indicates that T₁²(m) ∼ F_{1,m}. What is
the connection between Hotelling's T² statistic and Student's t-distribution?
Note that we are indeed abusing the definition of 'statistic' here, since T_p²(m) involves
the unknown parameter µ.
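
The connection is that for p = 1, T₁²(m) = (X − µ)²/S with mS/σ² ∼ χ²_m, so T₁²(m) is the square of a Student t_m variable, and t_m² ∼ F_{1,m}, matching Theorem 5. The theorem itself can be checked by simulation; a sketch with arbitrary p and m:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    p, m, reps = 3, 12, 20000

    t2 = np.empty(reps)
    for r in range(reps):
        x = rng.standard_normal(p)                              # X ~ N_p(0, I)
        S = stats.wishart(df=m, scale=np.eye(p)).rvs(random_state=rng) / m
        t2[r] = x @ np.linalg.solve(S, x)                       # T_p^2(m)

    scaled = (m - p + 1) / (m * p) * t2
    # Empirical quantiles should match F_{p, m-p+1}.
    print(np.quantile(scaled, [0.5, 0.9, 0.99]))
    print(stats.f(p, m - p + 1).ppf([0.5, 0.9, 0.99]))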

2.3 Samples from a multivariate normal distribution


Suppose X_1, . . . , X_n are i.i.d. N_p(µ, Σ). Denote the sample mean and sample variance by

    X̄ = (1/n) Σ_{i=1}^n X_i,
    S = [1/(n − 1)] Σ_{i=1}^n (X_i − X̄)(X_i − X̄)'.

Theorem 6. X̄ and S are independent, with

    √n (X̄ − µ) ∼ N_p(0, Σ),
    (n − 1)S ∼ W_p(n − 1, Σ).

Corollary 7. Hotelling's T² statistic for a multivariate normal sample is defined as

    T²(n − 1) = n(X̄ − µ)'S^{−1}(X̄ − µ),

and we have

    [(n − p)/p] [n/(n − 1)] (X̄ − µ)'S^{−1}(X̄ − µ) ∼ F_{p,n−p}.
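
In practice, Corollary 7 yields the standard test of H₀ : µ = µ₀ when Σ is unknown. A minimal sketch of the resulting one-sample test (the data-generating choices are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, p = 30, 3
    mu0 = np.zeros(p)

    # Simulated sample with rows as observations; under H0 the mean is mu0.
    X = rng.multivariate_normal(mu0, np.eye(p), size=n)

    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # sample covariance, divides by n - 1
    T2 = n * (xbar - mu0) @ np.linalg.solve(S, xbar - mu0)

    # By Corollary 7, (n - p) / (p (n - 1)) * T2 ~ F_{p, n-p} under H0.
    F_stat = (n - p) / (p * (n - 1)) * T2
    p_value = stats.f(p, n - p).sf(F_stat)
    print(T2, F_stat, p_value)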

(Incomplete) proof of Theorem 6. First note the following decomposition:

    Σ_i (X_i − µ)(X_i − µ)' = n(X̄ − µ)(X̄ − µ)' + Σ_i (X_i − X̄)(X_i − X̄)';

and recall the definition of the Wishart distribution.
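
The decomposition is a purely algebraic identity, so it can be checked numerically; a small sketch with arbitrary data:

    import numpy as np

    rng = np.random.default_rng(6)
    n, p = 8, 3
    mu = np.ones(p)
    X = rng.standard_normal((n, p)) + mu   # rows X_i ~ N_p(mu, I)
    xbar = X.mean(axis=0)

    # Sum_i (X_i - mu)(X_i - mu)' versus the two-term decomposition.
    lhs = (X - mu).T @ (X - mu)
    rhs = n * np.outer(xbar - mu, xbar - mu) + (X - xbar).T @ (X - xbar)
    print(np.allclose(lhs, rhs))  # True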



It is easy to check the result on √n (X̄ − µ). The following argument is for the independence
and the distribution of S.
Consider a new set of random vectors Y_i (i = 1, . . . , n) formed from linear combinations of
the X_i's by an orthogonal matrix D satisfying

    D = [d_1, . . . , d_n],  d_1 = (1/√n) 1_n,  DD' = D'D = I_n.

Let

    Y_j = Σ_{i=1}^n (X_i − µ) d_{ji} = Σ_{i=1}^n X̃_i d_{ji} = X̃ d_j,

where X̃ = [X̃_1, . . . , X̃_n] is the p × n matrix of de-meaned random vectors. We claim the
following:

1. The Y_j are normally distributed.

2. E(Y_j Y_k') = Σ if j = k, and 0 if j ≠ k.

3. Σ_{i=1}^n Y_i Y_i' = Σ_{i=1}^n X̃_i X̃_i'.

4. Y_1 Y_1' = n(X̄ − µ)(X̄ − µ)'.

Facts 1 and 2 show the independence between Y_1 and Y_j (j ≥ 2), since jointly normal
vectors that are uncorrelated are independent.


Note that in the above decomposition, LHS = Σ_{i=1}^n X̃_i X̃_i' = Σ_{i=1}^n Y_i Y_i' = Y_1 Y_1' +
Σ_{i=2}^n Y_i Y_i' = n(X̄ − µ)(X̄ − µ)' + Σ_{i=2}^n Y_i Y_i'. Hence Σ_{i=2}^n Y_i Y_i' = Σ_i (X_i − X̄)(X_i − X̄)',
which is independent of Y_1. This also gives us the distribution of S: (n − 1)S =
Σ_{i=2}^n Y_i Y_i' ∼ W_p(n − 1, Σ).

The next lecture is on inference about the multivariate normal distribution.
