Comparisons of Several Multivariate Means
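\[ t = \frac{\bar{D} - \delta}{s_d/\sqrt{n}} \sim t_{n-1} \]
where
\[ \bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j \quad\text{and}\quad s_d^2 = \frac{1}{n-1}\sum_{j=1}^{n} (D_j - \bar{D})^2 . \]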
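• An α-level test of $H_0: \delta = 0$ vs. $H_1: \delta \neq 0$ may be conducted by comparing $|t|$ with $t_{n-1}(\alpha/2)$. A $100(1-\alpha)\%$ confidence interval for the mean difference $\delta$ is given by
\[ \bar{d} - t_{n-1}(\alpha/2)\,\frac{s_d}{\sqrt{n}} \;\le\; \delta \;\le\; \bar{d} + t_{n-1}(\alpha/2)\,\frac{s_d}{\sqrt{n}} \]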
• $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences $\delta_i$ are given by
\[ \bar{d}_i \pm \sqrt{\frac{(n-1)p}{n-p}\,F_{p,\,n-p}(\alpha)}\;\sqrt{\frac{s_{d_i}^2}{n}} \]
where $\bar{d}_i$ is the $i$th element of $\bar{d}$ and $s_{d_i}^2$ is the $i$th diagonal element of $S_d$.
• The Bonferroni $100(1-\alpha)\%$ simultaneous confidence intervals for the individual mean differences $\delta_i$ are
\[ \bar{d}_i \pm t_{n-1}\!\left(\frac{\alpha}{2p}\right)\sqrt{\frac{s_{d_i}^2}{n}} \]
where $t_{n-1}\!\left(\frac{\alpha}{2p}\right)$ is the upper $100(\alpha/2p)$th percentile of a t-distribution with $n-1$ d.f.
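The two interval formulas above can be compared numerically. The following is a minimal sketch, assuming the paired differences are available as an (n, p) NumPy array; the function name `paired_intervals` and the synthetic differences are illustrative assumptions, not part of the notes.

```python
import numpy as np
from scipy import stats


def paired_intervals(d, alpha=0.05):
    """Return (T^2-based, Bonferroni) simultaneous intervals for each delta_i.

    d is an (n, p) array of paired differences d_j = x_1j - x_2j.
    """
    d = np.asarray(d, dtype=float)
    n, p = d.shape
    d_bar = d.mean(axis=0)
    s2 = d.var(axis=0, ddof=1)            # diagonal elements of S_d
    se = np.sqrt(s2 / n)

    # T^2-based multiplier: sqrt((n-1)p/(n-p) * F_{p,n-p}(alpha))
    f_crit = stats.f.ppf(1 - alpha, p, n - p)
    c_t2 = np.sqrt((n - 1) * p / (n - p) * f_crit)

    # Bonferroni multiplier: upper alpha/(2p) point of t_{n-1}
    c_bon = stats.t.ppf(1 - alpha / (2 * p), n - 1)

    t2 = np.column_stack([d_bar - c_t2 * se, d_bar + c_t2 * se])
    bon = np.column_stack([d_bar - c_bon * se, d_bar + c_bon * se])
    return t2, bon


# Synthetic differences for illustration only (NOT data from these notes):
# n = 11 pairs, p = 2 variables.
rng = np.random.default_rng(0)
d_demo = rng.normal(loc=[-9.0, 13.0], scale=[14.0, 20.0], size=(11, 2))
t2_int, bon_int = paired_intervals(d_demo)
print("T^2-based intervals:\n", t2_int)
print("Bonferroni intervals:\n", bon_int)
```

Both sets of intervals share the same centre and standard error; only the multiplier differs, so for a small number of components the Bonferroni intervals are typically the shorter ones.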
Example: Municipal wastewater treatment plants are required by law to monitor their
discharges into rivers and streams on a regular basis. Concern about the reliability
of data from one of these self-monitoring programs led to a study in which samples
of effluent were divided and sent to two laboratories for testing. One half of each
sample was sent to the Wisconsin State Laboratory of Hygiene, and one-half was sent
to a commercial laboratory routinely used in the monitoring program. Measurements
of biochemical oxygen demand (BOD) and suspended solids (SS) were obtained, for
n = 11 sample splits, from the two laboratories. The data are as follows:
            Commercial lab              State lab of hygiene
Sample j    x1j1 (BOD)    x1j2 (SS)     x2j1 (BOD)    x2j2 (SS)
1           6             27            25            15
2           6             23            28            13
3           18            64            36            22
⋮           ⋮             ⋮             ⋮             ⋮
11          20            14            39            21
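As a small illustration of how the paired differences $d_j = x_{1j} - x_{2j}$ are formed, the sketch below uses only the four sample splits printed in the table; the elided rows are not guessed, so these numbers do not reproduce the full n = 11 analysis. The array names are assumptions made for this example.

```python
import numpy as np

# Only the four sample splits printed in the table (samples 1, 2, 3, 11).
# The remaining rows are elided in the source and are deliberately left out.
commercial = np.array([[6, 27], [6, 23], [18, 64], [20, 14]])   # columns: BOD, SS
state = np.array([[25, 15], [28, 13], [36, 22], [39, 21]])      # columns: BOD, SS

# d_j = x_1j - x_2j, one row per sample split
d = commercial - state
print(d)
```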
Discussion:
• Randomized assignment of treatments can enhance the statistical analysis.
• The name repeated measures stems from the fact that all treatments are administered to each unit.
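• For comparative purposes, we consider contrasts of the components of $\mu = E(X_i)$. These could be
\[
\begin{bmatrix} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \\ \vdots \\ \mu_1 - \mu_q \end{bmatrix}
=
\begin{bmatrix}
1 & -1 & 0 & \cdots & 0 \\
1 & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & -1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{bmatrix}
= C_1 \mu
\]
or
\[
\begin{bmatrix} \mu_2 - \mu_1 \\ \mu_3 - \mu_2 \\ \vdots \\ \mu_q - \mu_{q-1} \end{bmatrix}
=
\begin{bmatrix}
-1 & 1 & 0 & \cdots & 0 & 0 \\
0 & -1 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & 1
\end{bmatrix}
\begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{bmatrix}
= C_2 \mu
\]
Both $C_1$ and $C_2$ are called contrast matrices, because their $q-1$ rows are linearly independent and each is a contrast vector.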
• Consider an $N_q(\mu, \Sigma)$ population, and let $C$ be a contrast matrix. An α-level test of $H_0: C\mu = 0$ vs. $H_1: C\mu \neq 0$ is: reject $H_0$ if
\[ T^2 = n (C\bar{x})' (C S C')^{-1} (C\bar{x}) > \frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,\,n-q+1}(\alpha) \]
where $\bar{x}$ and $S$ are the sample mean vector and sample covariance matrix computed from the $X_j$'s.
• A confidence region for contrasts $C\mu$ is
\[ n (C\bar{x} - C\mu)' (C S C')^{-1} (C\bar{x} - C\mu) \le \frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,\,n-q+1}(\alpha) \]
• Simultaneous $100(1-\alpha)\%$ confidence intervals for single contrasts $c'\mu$, for any contrast vector $c$, are given by
\[ c'\bar{x} \pm \sqrt{\frac{(n-1)(q-1)}{n-q+1}\, F_{q-1,\,n-q+1}(\alpha)}\;\sqrt{\frac{c' S c}{n}} \]
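The contrast matrices and the $T^2$ test above can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper names (`baseline_contrasts`, `successive_contrasts`, `contrast_t2_test`) and the synthetic repeated-measures data are hypothetical and do not come from the notes.

```python
import numpy as np
from scipy import stats


def baseline_contrasts(q):
    """C1: rows mu_1 - mu_k for k = 2, ..., q."""
    return np.hstack([np.ones((q - 1, 1)), -np.eye(q - 1)])


def successive_contrasts(q):
    """C2: rows mu_k - mu_{k-1} for k = 2, ..., q."""
    c = np.zeros((q - 1, q))
    idx = np.arange(q - 1)
    c[idx, idx] = -1.0
    c[idx, idx + 1] = 1.0
    return c


def contrast_t2_test(x, c, alpha=0.05):
    """Return (T^2, critical value) for H0: C mu = 0 from an (n, q) data array."""
    x = np.asarray(x, dtype=float)
    n, q = x.shape
    x_bar = x.mean(axis=0)
    s = np.cov(x, rowvar=False)                       # sample covariance S
    cx = c @ x_bar
    t2 = n * cx @ np.linalg.solve(c @ s @ c.T, cx)    # n (Cx)'(CSC')^{-1}(Cx)
    f_crit = stats.f.ppf(1 - alpha, q - 1, n - q + 1)
    crit = (n - 1) * (q - 1) / (n - q + 1) * f_crit
    return t2, crit


# Synthetic repeated-measures data (illustration only): n = 19 units, q = 4 treatments.
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(19, 4)) + np.array([0.0, 0.5, 1.0, 1.5])

c1 = baseline_contrasts(4)
c2 = successive_contrasts(4)
t2_c1, crit = contrast_t2_test(x_demo, c1)
t2_c2, _ = contrast_t2_test(x_demo, c2)
print("T^2 with C1:", t2_c1, " T^2 with C2:", t2_c2, " critical value:", crit)
```

The two statistics agree up to rounding because $C_2 = A C_1$ for a nonsingular $A$, so the value of $T^2$ does not depend on which contrast matrix is chosen.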
2 Comparing Mean Vectors From Two Populations
• Consider a random sample of size $n_1$ from population 1 and a sample of size $n_2$ from population 2.
• The observations on $p$ variables from populations 1 and 2 are $[x_{11}, x_{12}, \dots, x_{1n_1}]'$ and $[x_{21}, x_{22}, \dots, x_{2n_2}]'$.
• Question: test $H_0: \mu_1 - \mu_2 = \delta_0$ vs. $H_1: \mu_1 - \mu_2 \neq \delta_0$.
• The sample statistics from the two populations are
\[ \bar{x}_k = \frac{1}{n_k}\sum_{j=1}^{n_k} x_{kj}, \qquad S_k = \frac{1}{n_k - 1}\sum_{j=1}^{n_k} (x_{kj} - \bar{x}_k)(x_{kj} - \bar{x}_k)', \qquad k = 1, 2 \]
• Assume:
Then we get
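• $\bar{X}_1 - \bar{X}_2$ is the MLE of $\delta = \mu_1 - \mu_2$, and $\Sigma$ is estimated by the pooled covariance matrix
\[ S_{pooled} = \frac{1}{n_1 + n_2 - 2}\left[(n_1 - 1) S_1 + (n_2 - 1) S_2\right] \]
(the MLE of $\Sigma$ uses the divisor $n_1 + n_2$ in place of $n_1 + n_2 - 2$).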
3 Simultaneous Confidence Intervals
Result:
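Let $c^2 = \dfrac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}\, F_{p,\,n_1+n_2-p-1}(\alpha)$. Then
\[ P\left( a'(\mu_1 - \mu_2) \in a'(\bar{x}_1 - \bar{x}_2) \pm c\sqrt{a'\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\, a}\ \text{ for all } a \neq 0 \right) = 1 - \alpha . \]
In particular, for $i = 1, 2, \dots, p$, the intervals
\[ \mu_{1i} - \mu_{2i} \in (\bar{x}_{1i} - \bar{x}_{2i}) \pm c\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{ii,pooled}} \]
hold simultaneously with confidence at least $1 - \alpha$, where $s_{ii,pooled}$ is the $i$th diagonal element of $S_{pooled}$.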
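Also, when $n_1 - p$ and $n_2 - p$ are large, approximate $100(1-\alpha)\%$ simultaneous confidence intervals for all linear combinations $a'(\mu_1 - \mu_2)$ are provided by
\[ a'(\mu_1 - \mu_2) \in a'(\bar{x}_1 - \bar{x}_2) \pm \sqrt{\chi^2_p(\alpha)}\;\sqrt{a'\left(\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right) a} \]
where $\chi^2_p(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p$ d.f.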
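Both sets of two-sample intervals can be computed side by side for the individual components $\mu_{1i} - \mu_{2i}$. The sketch below assumes $p \ge 2$ and two (n_k, p) data arrays; the function name `two_sample_intervals` and the synthetic samples are illustrative assumptions rather than anything given in the notes.

```python
import numpy as np
from scipy import stats


def two_sample_intervals(x1, x2, alpha=0.05):
    """Component-wise simultaneous intervals for mu_1 - mu_2 (assumes p >= 2).

    Returns (pooled F-based intervals, large-sample chi-square intervals).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    s1 = np.cov(x1, rowvar=False)
    s2 = np.cov(x2, rowvar=False)

    # Pooled-covariance intervals: c^2 = (n1+n2-2)p/(n1+n2-p-1) * F(alpha)
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    df2 = n1 + n2 - p - 1
    c2 = (n1 + n2 - 2) * p / df2 * stats.f.ppf(1 - alpha, p, df2)
    half_pooled = np.sqrt(c2 * (1 / n1 + 1 / n2) * np.diag(s_pooled))

    # Large-sample intervals: chi-square multiplier, separate covariances
    chi2_crit = stats.chi2.ppf(1 - alpha, p)
    half_large = np.sqrt(chi2_crit * np.diag(s1 / n1 + s2 / n2))

    pooled = np.column_stack([diff - half_pooled, diff + half_pooled])
    large = np.column_stack([diff - half_large, diff + half_large])
    return pooled, large


# Synthetic samples for illustration only (not data from the notes).
rng = np.random.default_rng(2)
x1_demo = rng.normal(loc=[8.0, 10.0], scale=[1.5, 2.5], size=(30, 2))
x2_demo = rng.normal(loc=[8.5, 10.0], scale=[1.0, 2.0], size=(30, 2))
pooled_int, large_int = two_sample_intervals(x1_demo, x2_demo)
print("Pooled (F-based) intervals:\n", pooled_int)
print("Large-sample (chi-square) intervals:\n", large_int)
```

With equal sample sizes the two standard errors coincide, so the intervals then differ only through the multipliers $c$ and $\sqrt{\chi^2_p(\alpha)}$.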
4.2 One-Way ANOVA
Motivated by the decomposition in Xℓj , the analysis of variance is based upon an
analogous decomposition of the observations:
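\[
\sum_{\ell=1}^{g}\sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x})(x_{\ell j} - \bar{x})'
= \sum_{\ell=1}^{g} n_\ell (\bar{x}_\ell - \bar{x})(\bar{x}_\ell - \bar{x})'
+ \sum_{\ell=1}^{g}\sum_{j=1}^{n_\ell} (x_{\ell j} - \bar{x}_\ell)(x_{\ell j} - \bar{x}_\ell)'
= B + W
\]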
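The decomposition into between-group and within-group matrices can be checked numerically. This is a minimal sketch assuming each group is stored as an (n_ℓ, p) array; the function name `between_within` and the synthetic groups are hypothetical.

```python
import numpy as np


def between_within(groups):
    """Return (B, W, total) for a list of (n_l, p) arrays, one per group."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_x = np.vstack(groups)
    grand = all_x.mean(axis=0)              # overall mean vector
    p = all_x.shape[1]

    b = np.zeros((p, p))
    w = np.zeros((p, p))
    for g in groups:
        m = g.mean(axis=0)                  # group mean vector
        b += g.shape[0] * np.outer(m - grand, m - grand)
        resid = g - m
        w += resid.T @ resid

    centered = all_x - grand
    total = centered.T @ centered           # left-hand side of the identity
    return b, w, total


# Synthetic groups with unequal sizes (illustration only).
rng = np.random.default_rng(3)
groups_demo = [rng.normal(loc=mu, size=(n_l, 2))
               for mu, n_l in [(0.0, 8), (0.5, 6), (1.0, 10)]]
b_mat, w_mat, total_mat = between_within(groups_demo)
print("max |B + W - total| =", np.abs(b_mat + w_mat - total_mat).max())
```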
One test of $H_0: \tau_1 = \cdots = \tau_g = 0$ is to reject $H_0$ if Wilks' $\Lambda^*$ is too small, where
\[ \Lambda^* = \frac{|W|}{|B + W|} \]
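Distribution of Wilks' $\Lambda^*$
No. of variables    No. of groups    Sampling distribution
p = 1               g ≥ 2            $\left(\frac{n-g}{g-1}\right)\left(\frac{1-\Lambda^*}{\Lambda^*}\right) \sim F_{g-1,\,n-g}$
p = 2               g ≥ 2            $\left(\frac{n-g-1}{g-1}\right)\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2(g-1),\,2(n-g-1)}$
p ≥ 1               g = 2            $\left(\frac{n-p-1}{p}\right)\left(\frac{1-\Lambda^*}{\Lambda^*}\right) \sim F_{p,\,n-p-1}$
p ≥ 1               g = 3            $\left(\frac{n-p-2}{p}\right)\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2p,\,2(n-p-2)}$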
When n is large, approximately
\[ -\left(n - 1 - \frac{p+g}{2}\right)\ln\Lambda^* \sim \chi^2_{p(g-1)} \]
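Wilks' $\Lambda^*$ and the large-sample chi-square statistic can be computed directly from the group data. The sketch below is illustrative only: the function name `wilks_lambda_test` and the synthetic groups are assumptions, and n is taken to be the total sample size over all groups.

```python
import numpy as np
from scipy import stats


def wilks_lambda_test(groups, alpha=0.05):
    """Return (Lambda*, chi-square statistic, critical value) for one-way MANOVA.

    groups is a list of (n_l, p) arrays; n is the total sample size.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_groups = len(groups)
    all_x = np.vstack(groups)
    n, p = all_x.shape
    grand = all_x.mean(axis=0)

    # Between-group (B) and within-group (W) sums of squares and cross products
    b = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
            for g in groups)
    w = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

    lam = np.linalg.det(w) / np.linalg.det(b + w)

    # Large-sample statistic: -(n - 1 - (p + g)/2) ln(Lambda*) vs. chi2_{p(g-1)}
    stat = -(n - 1 - (p + n_groups) / 2) * np.log(lam)
    crit = stats.chi2.ppf(1 - alpha, p * (n_groups - 1))
    return lam, stat, crit


# Synthetic groups for illustration only (not data from the notes).
rng = np.random.default_rng(4)
groups_demo = [rng.normal(loc=mu, size=(n_l, 3))
               for mu, n_l in [(0.0, 20), (0.3, 25), (0.6, 22)]]
lam, stat, crit = wilks_lambda_test(groups_demo)
print(f"Lambda* = {lam:.4f}, statistic = {stat:.3f}, critical value = {crit:.3f}")
print("reject H0" if stat > crit else "do not reject H0")
```

For p = 1 the first row of the table reduces to the usual one-way ANOVA F statistic, which gives a convenient sanity check for an implementation like this one.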
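$\ell = 1, \dots, g$, $k = 1, \dots, b$, $r = 1, \dots, n$
\[ \sum_{\ell=1}^{g} \tau_\ell = \sum_{k=1}^{b} \beta_k = \sum_{\ell=1}^{g} \gamma_{\ell k} = \sum_{k=1}^{b} \gamma_{\ell k} = 0, \qquad e_{\ell k r} \ \text{i.i.d.} \sim N(0, \sigma^2) \]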
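ANOVA for comparing effects of two factors and their interaction
Source of variation    Sum of squares                                                                                      d.f.
Factor 1               $SS_{fac1} = \sum_{\ell=1}^{g} bn(\bar{x}_{\ell\cdot} - \bar{x})^2$                                 g − 1
Factor 2               $SS_{fac2} = \sum_{k=1}^{b} gn(\bar{x}_{\cdot k} - \bar{x})^2$                                      b − 1
Interaction            $SS_{int} = \sum_{\ell,k} n(\bar{x}_{\ell k} - \bar{x}_{\ell\cdot} - \bar{x}_{\cdot k} + \bar{x})^2$  (g − 1)(b − 1)
Residual               $SS_{res} = \sum_{\ell,k,r} (x_{\ell k r} - \bar{x}_{\ell k})^2$                                    gb(n − 1)
Total                  $SS_{cor} = \sum_{\ell,k,r} (x_{\ell k r} - \bar{x})^2$                                             gbn − 1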
• The F-ratios of the mean squares $SS_{fac1}/(g-1)$, $SS_{fac2}/(b-1)$ and $SS_{int}/((g-1)(b-1))$ to the residual mean square $SS_{res}/(gb(n-1))$ can be used to test for the effects of factor 1, factor 2 and the factor 1-factor 2 interaction, respectively.
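The sums of squares in the table and the corresponding F-ratios can be sketched for a balanced design. The code assumes the responses are stored in an array of shape (g, b, n); the function name `two_way_anova`, the dictionary keys, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def two_way_anova(x):
    """Balanced two-way ANOVA with interaction; x has shape (g, b, n)."""
    x = np.asarray(x, dtype=float)
    g, b, n = x.shape
    grand = x.mean()
    m_l = x.mean(axis=(1, 2))      # factor-1 level means
    m_k = x.mean(axis=(0, 2))      # factor-2 level means
    m_lk = x.mean(axis=2)          # cell means

    ss = {
        "fac1": b * n * np.sum((m_l - grand) ** 2),
        "fac2": g * n * np.sum((m_k - grand) ** 2),
        "int": n * np.sum((m_lk - m_l[:, None] - m_k[None, :] + grand) ** 2),
        "res": np.sum((x - m_lk[:, :, None]) ** 2),
        "cor": np.sum((x - grand) ** 2),
    }
    df = {"fac1": g - 1, "fac2": b - 1, "int": (g - 1) * (b - 1),
          "res": g * b * (n - 1), "cor": g * b * n - 1}

    # F-ratio: each effect mean square over the residual mean square
    ms_res = ss["res"] / df["res"]
    f_ratio = {k: (ss[k] / df[k]) / ms_res for k in ("fac1", "fac2", "int")}
    p_value = {k: stats.f.sf(f_ratio[k], df[k], df["res"]) for k in f_ratio}
    return ss, df, f_ratio, p_value


# Synthetic balanced data (illustration only): g = 3, b = 2, n = 4.
rng = np.random.default_rng(5)
x_demo = rng.normal(size=(3, 2, 4)) + np.array([0.0, 1.0, 2.0])[:, None, None]
ss, df, f_ratio, p_value = two_way_anova(x_demo)
for key in ("fac1", "fac2", "int"):
    print(f"{key}: SS = {ss[key]:.3f}, F = {f_ratio[key]:.3f}, p = {p_value[key]:.4f}")
```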
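$\ell = 1, \dots, g$, $k = 1, \dots, b$, $r = 1, \dots, n$
\[ \sum_{\ell=1}^{g} \tau_\ell = \sum_{k=1}^{b} \beta_k = \sum_{\ell=1}^{g} \gamma_{\ell k} = \sum_{k=1}^{b} \gamma_{\ell k} = 0, \qquad e_{\ell k r} \ \text{i.i.d.} \sim N_p(0, \Sigma) \]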
\[ \Lambda^* = \frac{|SSP_{res}|}{|SSP_{int} + SSP_{res}|} \]
For large samples, reject $H_0$ at level α if
\[ -\left[gb(n-1) - \frac{p + 1 - (g-1)(b-1)}{2}\right]\ln\Lambda^* > \chi^2_{(g-1)(b-1)p}(\alpha) \]
\[ \Lambda^* = \frac{|SSP_{res}|}{|SSP_{fac1} + SSP_{res}|} \]
\[ -\left[gb(n-1) - \frac{p + 1 - (g-1)}{2}\right]\ln\Lambda^* > \chi^2_{(g-1)p}(\alpha) \]
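• A test of $H_0: \beta_1 = \cdots = \beta_b = 0$ vs. $H_1$: at least one $\beta_k \neq 0$ is conducted by rejecting $H_0$ for small values of the ratio
\[ \Lambda^* = \frac{|SSP_{res}|}{|SSP_{fac2} + SSP_{res}|} \]
For large samples, reject $H_0$ at level α if
\[ -\left[gb(n-1) - \frac{p + 1 - (b-1)}{2}\right]\ln\Lambda^* > \chi^2_{(b-1)p}(\alpha) \]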
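• The $100(1-\alpha)\%$ simultaneous confidence intervals for $\tau_{\ell i} - \tau_{m i}$ are
\[ (\bar{x}_{\ell\cdot i} - \bar{x}_{m\cdot i}) \pm t_\nu\!\left(\frac{\alpha}{pg(g-1)}\right)\sqrt{\frac{E_{ii}}{\nu}\,\frac{2}{bn}} \]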
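• The $100(1-\alpha)\%$ simultaneous confidence intervals for $\beta_{k i} - \beta_{q i}$ are
\[ (\bar{x}_{\cdot k i} - \bar{x}_{\cdot q i}) \pm t_\nu\!\left(\frac{\alpha}{pb(b-1)}\right)\sqrt{\frac{E_{ii}}{\nu}\,\frac{2}{gn}} \]
In both sets of intervals, $\nu = gb(n-1)$ and $E_{ii}$ is the $i$th diagonal element of $E = SSP_{res}$.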
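The two-way MANOVA tests and the factor 1 intervals above can be sketched for a balanced design stored as an array of shape (g, b, n, p). The function names `two_way_manova` and `factor1_intervals`, the dictionary keys, and the synthetic data are hypothetical; $\nu = gb(n-1)$ and $E = SSP_{res}$ are taken as the residual degrees of freedom and residual SSP matrix.

```python
import numpy as np
from scipy import stats


def two_way_manova(x, alpha=0.05):
    """Return (SSP matrices, tests); tests[name] = (Lambda*, statistic, critical value)."""
    x = np.asarray(x, dtype=float)
    g, b, n, p = x.shape
    grand = x.mean(axis=(0, 1, 2))      # (p,)
    m_l = x.mean(axis=(1, 2))           # (g, p) factor-1 level means
    m_k = x.mean(axis=(0, 2))           # (b, p) factor-2 level means
    m_lk = x.mean(axis=2)               # (g, b, p) cell means

    def ssp(dev):
        d = dev.reshape(-1, p)
        return d.T @ d

    ssp_mats = {
        "fac1": b * n * ssp(m_l - grand),
        "fac2": g * n * ssp(m_k - grand),
        "int": n * ssp(m_lk - m_l[:, None, :] - m_k[None, :, :] + grand),
        "res": ssp(x - m_lk[:, :, None, :]),
    }
    df_h = {"fac1": g - 1, "fac2": b - 1, "int": (g - 1) * (b - 1)}
    nu = g * b * (n - 1)

    tests = {}
    for name, df in df_h.items():
        lam = (np.linalg.det(ssp_mats["res"])
               / np.linalg.det(ssp_mats[name] + ssp_mats["res"]))
        stat = -(nu - (p + 1 - df) / 2) * np.log(lam)
        crit = stats.chi2.ppf(1 - alpha, df * p)
        tests[name] = (lam, stat, crit)
    return ssp_mats, tests


def factor1_intervals(x, l, m, alpha=0.05):
    """Bonferroni intervals for tau_l - tau_m (0-based levels), one row per variable."""
    x = np.asarray(x, dtype=float)
    g, b, n, p = x.shape
    nu = g * b * (n - 1)
    cell = x.mean(axis=2, keepdims=True)
    e_diag = np.sum((x - cell) ** 2, axis=(0, 1, 2))    # diagonal of E = SSP_res
    diff = x[l].mean(axis=(0, 1)) - x[m].mean(axis=(0, 1))
    t_crit = stats.t.ppf(1 - alpha / (p * g * (g - 1)), nu)
    half = t_crit * np.sqrt(e_diag / nu * 2 / (b * n))
    return np.column_stack([diff - half, diff + half])


# Synthetic balanced data (illustration only): g = 2, b = 3, n = 5, p = 3.
rng = np.random.default_rng(6)
x_demo = rng.normal(size=(2, 3, 5, 3))
ssp_mats, tests = two_way_manova(x_demo)
for name, (lam, stat, crit) in tests.items():
    print(f"{name}: Lambda* = {lam:.4f}, statistic = {stat:.3f}, critical = {crit:.3f}")
print(factor1_intervals(x_demo, 0, 1))
```

The intervals for $\beta_{ki} - \beta_{qi}$ follow the same pattern with the roles of g and b exchanged.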