16-Two-Sample-T-tests
The z-test
How can we make evidence based decisions? Is an observed result due to
chance or something else? How can we test whether a population has a certain
proportion?
The t-test
How can we test whether an unknown population has a certain mean?
The 𝜒²-test
How can we compare the frequencies of categories?
Today’s outline
Exam marks: bootstrap simulation
· What about a simulation? We repeatedly sample from the data, and compute the
value taken by the t-statistic.
n = length(marks)
sig.hat = sd(marks)
t = (mean(marks) - 65)/(sig.hat/sqrt(n)) # obtaining the original t statistic
t
## [1] -2.371497
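· The loop that generates T.stats.sim is along these lines (a sketch: the 10000 replications and the recentring of marks so that the null mean of 65 holds are assumptions, mirroring the Welch simulation later):
T.stats.sim = 0
for (i in 1:10000) {
    samp = sample(marks - mean(marks) + 65, size = n, replace = T) # recentre so H0 holds
    T.stats.sim[i] = (mean(samp) - 65)/(sd(samp)/sqrt(n)) # t-statistic for this resample
}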
Exam marks: bootstrap simulation
hist(T.stats.sim, breaks = 50, pr = T)
curve(dt(x, df = n - 1), add = T, lty = 2)
legend("topright", legend = c("Student's t-dist. with 99 d.f."), lty = 2)
Exam marks: bootstrap p-value
· How significant is our observed t-statistic value of -2.371, based on the simulation?
· What proportion of the values in T.stats.sim exceed abs(t) = 2.371 (in absolute value)?
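· A plausible computation of this proportion:
mean(abs(T.stats.sim) >= abs(t)) # two-sided: proportion at least as extreme as observed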
## [1] 0.0209
Exam marks: bootstrap confidence interval
· We firstly get the upper and lower 2.5% points from T.stats.sim:
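· A plausible call (q is an assumed name):
q = quantile(T.stats.sim, c(0.975, 0.025))
q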
## 97.5% 2.5%
## 2.022866 -1.966097
· These are then used to construct the interval $[\bar{X} - u\,\hat{\sigma}/\sqrt{n},\ \bar{X} - \ell\,\hat{\sigma}/\sqrt{n}]$:
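· In code, reusing sig.hat and the assumed q from above:
mean(marks) - q * sig.hat/sqrt(n) # upper quantile gives the lower endpoint, and vice versa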
## 97.5% 2.5%
## 60.29340 64.56579
Reflecting on the three different methods
· The p-values of each test:
Comparing two sample means
Red Bull example
· Red Bull is an energy drink advertised to “give you wings”.
· What does research say about the medical effects of drinking a Red Bull?
· Consider the following data on heart rates (beats per minute), for 2 independent
groups of Sydney students, collected 20 minutes after the ‘RedBull’ group had
drunk a 250ml cold can of Red Bull.
No Red Bull 84 76 68 80 64 62 74 84 68 96 80 64 65 66
Red Bull 72 88 72 88 76 75 84 80 60 96 80 84 - -
Red Bull example
No_RB <- c(84, 76, 68, 80, 64, 62, 74, 84, 68, 96, 80, 64, 65, 66)
RB <- c(72, 88, 72, 88, 76, 75, 84, 80, 60, 96, 80, 84)
boxplot(No_RB, RB, names = c("No RB", "RB"), horizontal = T)
Two-box model
· We can model the two groups as samples taken from two separate boxes
(independently of each other).
· the No Red Bull group is considered as a random sample 𝑋1 , … , 𝑋𝑚 with
replacement from a box with:
- mean 𝜇𝑋 and
- SD 𝜎𝑋
· the Red Bull group is considered as a random sample 𝑌1 , … , 𝑌𝑛 with
replacement from a box with:
- mean 𝜇𝑌 and
- SD 𝜎𝑌
· We wish to make a statement about the population mean difference $\mu_X - \mu_Y$, based on the sample mean difference $\bar{X} - \bar{Y}$.
Expected value and SE of 𝑋¯ − 𝑌¯
· 𝐸(𝑋¯ − 𝑌¯ ) = 𝐸(𝑋¯ ) + 𝐸(−𝑌¯ ) = 𝐸(𝑋¯ ) − 𝐸(𝑌¯ ) = 𝜇𝑋 − 𝜇𝑌
· where the second equality follows from 𝐸(𝑎𝑋) = 𝑎𝐸(𝑋) for a random draw 𝑋 .
· Most importantly
$$SE(\bar{X} - \bar{Y})^2 = SE(\bar{X})^2 + SE(-\bar{Y})^2 = \frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}.$$
· where the first equality uses independence of the two samples, and the second follows from $SE(-X) = SE(X)$ for a random draw $X$.
Two-sample test statistic
· We wish to test the null hypothesis 𝐻0 : 𝜇𝑋 = 𝜇𝑌 .
· An observed z-statistic is usually given by
$$z = \frac{\text{observed} - \text{expectation}}{\text{standard error}}$$
where the expectation and standard error are taken assuming 𝐻0 is true.
· The observation is the observed mean sample difference 𝑥¯ − 𝑦¯ .
· The expectation is the difference 𝜇𝑋 − 𝜇𝑌 = 0 under 𝐻0 .
· The standard error is $\sqrt{\sigma_X^2/m + \sigma_Y^2/n}$.
· This gives us
$$z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_X^2/m + \sigma_Y^2/n}}$$
Two-sample test statistic
· Assuming 𝜎𝑋 and 𝜎𝑌 are known, the Z-statistic is distributed
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_X^2/m + \sigma_Y^2/n}} \sim N(0, 1)$$
The classical two-sample t-test
Equal variance assumption
· In some cases it is reasonable to assume 𝜎𝑋 = 𝜎𝑌 = 𝜎.
- This is often called an equal variances assumption, i.e. $\sigma_X^2 = \sigma_Y^2$.
· Then the SE may be written as
$$SE(\bar{X} - \bar{Y}) = \sigma\sqrt{\frac{1}{m} + \frac{1}{n}}.$$
Extra assumptions: Student’s 𝑡-distribution
· In this case, if
- it is also assumed the boxes are (approx.) normal-shaped,
- a special “combined” or “pooled” estimate 𝜎̂𝑝 of the common 𝜎 is used
then Student’s theory can be applied to show the statistic
$$T = \frac{\bar{X} - \bar{Y}}{\hat{\sigma}_p\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2}$$
The pooled estimate 𝜎̂𝑝
· The form of the pooled estimate of 𝜎 is given by
$$\hat{\sigma}_p = \sqrt{\frac{\sum_{i=1}^{m}(X_i - \bar{X})^2 + \sum_{j=1}^{n}(Y_j - \bar{Y})^2}{m + n - 2}} = \sqrt{\frac{(m-1)\hat{\sigma}_X^2 + (n-1)\hat{\sigma}_Y^2}{m + n - 2}}.$$
Squared estimate $\hat{\sigma}_p^2$ is “on target” for $\sigma^2$
· Recall that each sample variance (squared sample SD) estimates $\sigma^2$ “on average”, in that
$$E(\hat{\sigma}_X^2) = E\left(\frac{1}{m-1}\sum_{i=1}^{m}(X_i - \bar{X})^2\right) = \sigma^2$$
and so
$$E\left((m-1)\hat{\sigma}_X^2\right) = E\left(\sum_{i=1}^{m}(X_i - \bar{X})^2\right) = (m-1)\sigma^2$$
· Similarly we have
$$E\left((n-1)\hat{\sigma}_Y^2\right) = E\left(\sum_{j=1}^{n}(Y_j - \bar{Y})^2\right) = (n-1)\sigma^2$$
Squared estimate $\hat{\sigma}_p^2$ is “on target” for $\sigma^2$
· Then the numerator inside the $\sqrt{\cdot}$ has
$$E\left(\sum_{i=1}^{m}(X_i - \bar{X})^2 + \sum_{j=1}^{n}(Y_j - \bar{Y})^2\right) = E\left((m-1)\hat{\sigma}_X^2\right) + E\left((n-1)\hat{\sigma}_Y^2\right)$$
$$= (m-1)\sigma^2 + (n-1)\sigma^2 = (m+n-2)\sigma^2.$$
· Dividing through by $m + n - 2$ we get
$$E(\hat{\sigma}_p^2) = \sigma^2,$$
so $\hat{\sigma}_p^2$ shares the “on-target on average” property that $\hat{\sigma}_X^2$ and $\hat{\sigma}_Y^2$ have.
· Hence $m + n - 2$ is the number of degrees of freedom for the pooled estimate of the variance, $\hat{\sigma}_p^2$.
Red Bull example
· Based on the boxplots, we see that
- each looks reasonably symmetric;
- the spreads are similar
· It may therefore be reasonable to assume we have samples obtained from
approximate normal boxes with a common SD.
sd(No_RB)
## [1] 10.07363
sd(RB)
## [1] 9.452833
Red Bull example: pooled estimate
m = length(No_RB)
n = length(RB)
print(c(m, n))
## [1] 14 12
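· The pooled SD itself, from the formula two slides back (sig.p is an assumed name):
sig.p = sqrt(((m - 1) * sd(No_RB)^2 + (n - 1) * sd(RB)^2)/(m + n - 2))
sig.p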
## [1] 9.793984
Red Bull example: test statistic
· We therefore compute the value taken by the (Classical) Two-Sample T-statistic:
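· The two values below are presumably computed as follows (names inferred from the stat line underneath):
est.SE = sig.p * sqrt(1/m + 1/n) # estimated SE under the equal-variance assumption
est.SE
mean.diff = mean(No_RB) - mean(RB)
mean.diff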
## [1] 3.852933
## [1] -5.940476
stat = mean.diff/est.SE
stat
## [1] -1.541806
Red Bull example: p-value
· Is this a one- or two-sided test?
· As originally phrased, i.e. “is the apparent difference significant?”, it is (strictly
speaking) two-sided.
· A two-sided P-value is thus
2 * pt(abs(stat), df = m + n - 2, lower.tail = F)
## [1] 0.1362041
Using t.test()
· Of course, the t.test() function can do all of this in one line;
- we must supply the var.equal=T parameter:
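t.test(No_RB, RB, var.equal = T)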
##
## Two Sample t-test
##
## data: No_RB and RB
## t = -1.5418, df = 24, p-value = 0.1362
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.892538 2.011586
## sample estimates:
## mean of x mean of y
## 73.64286 79.58333
Confidence interval
· Note that the confidence interval given here is obtained in the familiar way.
· Instead of referencing a normal curve and using qnorm() we use qt().
· Specifically, we use 𝑚 + 𝑛 − 2 degrees of freedom:
qt(0.975, df = m + n - 2)
## [1] 2.063899
$$(\bar{x} - \bar{y}) \pm q \cdot \hat{\sigma}_p\sqrt{\frac{1}{m} + \frac{1}{n}}$$
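· In code, using est.SE and mean.diff from earlier (a sketch; q.t is an assumed name):
q.t = qt(0.975, df = m + n - 2)
mean.diff + c(-1, 1) * q.t * est.SE # reproduces the t.test() interval above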
The Welch Test
Relaxing the equal variance assumption
· Do we really need to assume 𝜎𝑋 = 𝜎𝑌 ?
- Yes, if we want to apply Student’s theory directly.
· What if it is not reasonable to assume this?
- E.g. the two boxplots may have very different spread.
· An “obvious” approach would be to instead consider the statistic
$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\hat{\sigma}_X^2/m + \hat{\sigma}_Y^2/n}},$$
Welch’s paper
· In 1947 (some time after Student’s paper), B. L. Welch “solved” this problem.
Approximate Student’s-𝑡 distribution
· Welch found that the statistic behaved approximately like a Student’s-𝑡 distribution whose degrees of freedom is a complicated function of 𝑚, 𝑛, 𝜎𝑋 and 𝜎𝑌.
· He also proposed implementing the test by “plugging in” 𝜎̂𝑋 and 𝜎̂𝑌 .
· The Welch Test thus obtains a p-value, etc. by imagining the statistic 𝑇 has a
Student’s-𝑡 distribution with a data-dependent degrees of freedom.
The formula for the degrees of freedom at a glance!
$$df = \frac{\left(\hat{\sigma}_X^2/m + \hat{\sigma}_Y^2/n\right)^2}{\dfrac{(\hat{\sigma}_X^2/m)^2}{m-1} + \dfrac{(\hat{\sigma}_Y^2/n)^2}{n-1}}$$
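· A quick check on the Red Bull data (a sketch; v.x and v.y are assumed names):
v.x = sd(No_RB)^2/m
v.y = sd(RB)^2/n
(v.x + v.y)^2/(v.x^2/(m - 1) + v.y^2/(n - 1)) # approx. 23.776, the d.f. t.test() reports below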
Default two-sample t.test()
· It turns out Welch’s procedure works very well,
- i.e. the “approximate” p-values returned have nice properties
- rejection rates are in line with the desired false-alarm rate when simulating
from normal boxes.
· It works so well that R uses the Welch test as the default two-sample t-test:
t.test(No_RB, RB) # note: data-dependent d.f. close to Classical (which was 24 d.f.)
##
## Welch Two Sample t-test
##
## data: No_RB and RB
## t = -1.5497, df = 23.776, p-value = 0.1344
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.856127 1.975175
## sample estimates:
## mean of x mean of y
## 73.64286 79.58333
Using bootstrap simulation
· The Welch test does not assume 𝜎𝑋 = 𝜎𝑌 .
· But it does still assume the two boxes are “approximately normal”.
· What if we are uncomfortable making this assumption?
· We can try simulating from two “best guess” boxes which
- have “similar shapes” to the “true” boxes that generated our data;
- have equal means.
Centre the samples before simulating
· We thus sample from each observed sample with replacement, but we subtract the means so both “populations” have the same mean, i.e. zero.
Welch.stats.sim = 0
for (i in 1:10000) {
samp.x = sample(No_RB - mean(No_RB), size = m, replace = T) # both 'boxes' have
samp.y = sample(RB - mean(RB), size = n, replace = T) # mean zero
est.SE = sqrt((sd(samp.x)^2)/m + (sd(samp.y)^2)/n)
Welch.stats.sim[i] = (mean(samp.x) - mean(samp.y))/est.SE
}
The histogram
hist(Welch.stats.sim, n = 50, pr = T)
curve(dt(x, df = 23.776), add = T, lty = 2) # data-dependent d.f. from original sample
legend("topleft", legend = c("Students-t with 23.776 d.f."), lty = 2)
The histogram
· The histogram is a little skewed.
- This may be due to the large positive near-outlier in the No_RB sample.
- Simulated samples without that value chosen will have a smaller mean, giving
a cluster of statistic values less than zero.
Two-sided p-value by simulation
est.SE = sqrt((sd(No_RB)^2)/m + (sd(RB)^2)/n)
stat = (mean(No_RB) - mean(RB))/est.SE
stat
## [1] -1.549672
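· The two-sided p-value, computed as before:
mean(abs(Welch.stats.sim) >= abs(stat)) # proportion at least as extreme as observed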
## [1] 0.1361
Confidence interval by simulation
· We use the simulated values in Welch.stats.sim to approximate the “true distribution” of the Welch statistic when 𝜇𝑋 = 𝜇𝑌:
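· A plausible call (q is an assumed name):
q = quantile(Welch.stats.sim, c(0.975, 0.025))
q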
## 97.5% 2.5%
## 1.878815 -2.285400
· That these are not the same magnitude indicates the slight lack of symmetry.
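· The interval itself, constructed as before using the Welch est.SE and the assumed q:
(mean(No_RB) - mean(RB)) - q * est.SE # upper quantile gives the lower endpoint, and vice versa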
## 97.5% 2.5%
## -13.142681 2.820322
· The interval is quite close to those obtained by (both versions of) t.test(), but is slightly shifted to the right;
- this indicates the influence of the large positive value in No_RB.