Stat 491 Chapter 8 - Hypothesis Testing - Two Sample Inference
Solomon W. Harrar
The University of Montana
Fall 2012
Introduction
Inference about $\mu_1 - \mu_2$: Paired Samples
Inference about $\mu_1 - \mu_2$: Independent Samples
Two-Sample Inference
In Chapters 6 and 7, we had only one sample.
The underlying $\mu$ (or $p$) of the population from which the sample
was drawn was compared with the known mean (prevalence rate)
of the general population.
Example: The mean cholesterol of Asian immigrants was compared
with the general US mean cholesterol, known to be 190
mg/dL.
In this chapter, we have two samples, each from a different
population.
Interest lies in comparing the underlying unknown means of
the two populations.
Paired t Test
Let $\mu_d = \mu_1 - \mu_2$.
Let $n$ denote the number of pairs of measurements in the
sample.
Let $d_i$ denote the difference between the first and second
measurement in the $i$th pair.
Assumption: $d_1, d_2, \ldots, d_n$ constitute a random sample from
a normally distributed population with mean $\mu_d$ and unknown
variance $\sigma_d^2$.
We can look at Q-Q plots and box plots of the $d_i$ to check for
violations of the normality assumption.
Compute
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{and}\quad s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}.$$
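Hypotheses:
Case 1. $H_0: \mu_d = 0$ vs $H_a: \mu_d > 0$
Case 2. $H_0: \mu_d = 0$ vs $H_a: \mu_d < 0$
Case 3. $H_0: \mu_d = 0$ vs $H_a: \mu_d \neq 0$
Test statistic:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
R.R.: For a specified value of $\alpha$,
Case 1. Reject $H_0$ if $t \geq t_{n-1,1-\alpha}$.
Case 2. Reject $H_0$ if $t \leq -t_{n-1,1-\alpha}$.
Case 3. Reject $H_0$ if $|t| \geq t_{n-1,1-\alpha/2}$.
p-Value:
Case 1. $P(t_{n-1} > t_{\text{computed}})$
Case 2. $P(t_{n-1} < t_{\text{computed}})$
Case 3. $2\,P(t_{n-1} > |t_{\text{computed}}|)$ for the two-sided test.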
Example: Nutrition
An important hypothesis in hypertension research is that sodium
restriction may lower blood pressure. However, it is difficult to
achieve sodium restriction over the long term, and dietary
counseling in a group setting is sometimes used to achieve this
goal. The data on overnight urinary sodium excretion (mEq/8hr)
were obtained on eight individuals enrolled in a sodium-restricted
group. Data were collected at baseline
and after one week of dietary counseling ($\bar{d} = 1.14$ and $s_d = 12.22$).
Person   Baseline   Week 1    d_i
1          7.85       9.59    -1.74
2         12.03      34.50   -22.47
3         21.84       4.55    17.29
4         13.94      20.78    -6.84
5         16.68      11.69     4.99
6         41.78      32.51     9.27
7         14.97       5.46     9.51
8         12.072     12.95    -0.88
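The paired t test for these data can be sketched in R as follows. This is a minimal illustration, not part of the original slides: the variable names are made up, and the two-sided alternative is an assumption, since the example does not state which alternative is intended.

```r
# Overnight urinary sodium excretion (mEq/8hr), copied from the table above
baseline <- c(7.85, 12.03, 21.84, 13.94, 16.68, 41.78, 14.97, 12.072)
week1    <- c(9.59, 34.50, 4.55, 20.78, 11.69, 32.51, 5.46, 12.95)

d     <- baseline - week1   # within-person differences d_i
n     <- length(d)          # number of pairs
d_bar <- mean(d)            # sample mean of the differences
s_d   <- sd(d)              # sample standard deviation of the differences

# Paired t statistic and two-sided p-value computed by hand
# (two-sided alternative is an assumption made for this sketch)
t_stat  <- d_bar / (s_d / sqrt(n))
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# The same test using the built-in function
fit <- t.test(baseline, week1, paired = TRUE, alternative = "two.sided")
print(fit)
```

The hand computation and `t.test` should agree; for a one-sided alternative, change `alternative` to `"greater"` or `"less"`.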
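For a two-sided paired test at level $\alpha$, the power is approximately
$$\Phi\left(-z_{1-\alpha/2} + \frac{|\mu_d|}{\sigma_d / \sqrt{n}}\right)$$
and the number of pairs needed for power $1-\beta$ is
$$n = \frac{\sigma_d^2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\mu_d^2}.$$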
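A minimal R sketch of power and sample size for the paired design, assuming the usual normal-approximation formulas (power is pnorm(-z_{1-alpha/2} + |mu_d| / (sigma_d / sqrt(n))), and n is sigma_d^2 (z_{1-alpha/2} + z_{1-beta})^2 / mu_d^2). The function names `paired_power` and `paired_n` and the input values below are hypothetical, chosen only for illustration.

```r
# Approximate power of a two-sided paired test (normal approximation)
paired_power <- function(mu_d, sigma_d, n, alpha = 0.05) {
  pnorm(-qnorm(1 - alpha / 2) + abs(mu_d) / (sigma_d / sqrt(n)))
}

# Number of pairs needed for a target power, rounded up
paired_n <- function(mu_d, sigma_d, alpha = 0.05, power = 0.80) {
  ceiling(sigma_d^2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / mu_d^2)
}

# Hypothetical inputs: true mean difference 5, SD of differences 12
n_pairs <- paired_n(mu_d = 5, sigma_d = 12)
pow     <- paired_power(mu_d = 5, sigma_d = 12, n = n_pairs)
print(c(n_pairs = n_pairs, power = pow))
```

Because the sample size is rounded up, the achieved power at `n_pairs` is at least the target.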
Background
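The Sampling Distribution of $\bar{X}_1 - \bar{X}_2$
For independent samples,
$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$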
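$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{n_1+n_2-2}$$
where
$$S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
is the pooled estimate of the common variance.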
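As a rough check of the pooled statistic, the sketch below computes the pooled variance and t statistic by hand and compares them with `t.test(..., var.equal = TRUE)`. The data vectors are small illustrative values, and testing $H_0: \mu_1 = \mu_2$ with a two-sided alternative is an assumption of this sketch.

```r
# Small illustrative samples
x <- c(2.3, 3.4, 1.2, 4.4)
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)
n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances
s2_pooled <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Pooled t statistic under H0: mu1 = mu2, with n1 + n2 - 2 degrees of freedom
t_manual <- (mean(x) - mean(y)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
df       <- n1 + n2 - 2
p_manual <- 2 * pt(abs(t_manual), df = df, lower.tail = FALSE)

# Built-in pooled t test for comparison
fit <- t.test(x, y, var.equal = TRUE)
print(c(t_manual = t_manual, df = df, p_manual = p_manual))
```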
Large-Samples Case
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim N(0, 1) \quad \text{(approximately)}$$
t test in R
Inference for a difference in means can be computed in R in one
of the following two ways, depending on how your data are
organized.
If the two samples are entered as vectors x and y, then use
t.test(x, y, mu=0, paired=FALSE, var.equal=TRUE,
       alternative="two.sided")
If all the data from the two samples are in one vector y and
the vector x contains indicators of the sample, then use
t.test(y~x, mu=0, var.equal=TRUE,
       alternative="two.sided")
Examples:
# two samples stored as separate vectors
x=c(2.3,3.4,1.2,4.4)
y=c(3.2,1.5,2.6,3.3,4.5)
t.test(x,y,var.equal=TRUE)
# all observations in y, sample labels (1 or 2) in x
x=c(1,1,1,1,2,2,2,2,2)
y=c(2.3,3.4,1.2,4.4,3.2,1.5,2.6,3.3,4.5)
t.test(y~x,var.equal=TRUE)
[Figure: normal Q-Q plots of the two samples (Sample Quantiles vs. Theoretical Quantiles)]
$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \neq \sigma_2^2$
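$$\frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$
where
$$S_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1} \quad\text{and}\quad S_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}.$$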
$$\frac{S_1^2}{S_2^2} \overset{H_0}{\sim} F_{n_1-1,\,n_2-1}, \qquad \frac{S_2^2}{S_1^2} \overset{H_0}{\sim} F_{n_2-1,\,n_1-1}.$$
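$$P\left(\frac{S_2^2}{S_1^2} < F_{n_2-1,\,n_1-1,\,\alpha/2}\right) = \alpha/2 = P\left(\frac{S_1^2}{S_2^2} > F_{n_1-1,\,n_2-1,\,1-\alpha/2}\right) = P\left(\frac{S_2^2}{S_1^2} < \frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}\right)$$
Therefore,
$$F_{n_2-1,\,n_1-1,\,\alpha/2} = \frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}.$$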
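The reciprocal relationship between lower and upper F percentiles can be checked numerically with `qf`. The sample sizes and $\alpha$ below are arbitrary values chosen for illustration.

```r
# Arbitrary illustrative values
n1    <- 10
n2    <- 15
alpha <- 0.05

# Lower alpha/2 percentile of F with (n2 - 1, n1 - 1) degrees of freedom
lower_direct <- qf(alpha / 2, df1 = n2 - 1, df2 = n1 - 1)

# Reciprocal of the upper 1 - alpha/2 percentile of F with (n1 - 1, n2 - 1) df
lower_via_reciprocal <- 1 / qf(1 - alpha / 2, df1 = n1 - 1, df2 = n2 - 1)

print(c(lower_direct, lower_via_reciprocal))  # the two values should agree
```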
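$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \sim t_d \quad \text{(approximately)}$$
where
$$d = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}.$$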
Test $H_0: \sigma_1^2 = \sigma_2^2$.
If $H_0$ is not rejected: use the pooled t test for $H_0: \mu_1 = \mu_2$.
If $H_0$ is rejected: use Welch's t test for $H_0: \mu_1 = \mu_2$.
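This two-step strategy can be sketched in R with `var.test` followed by `t.test`. The data are small illustrative vectors, and the 0.05 cutoff for the F test is an assumption of this sketch rather than something the slides specify.

```r
# Small illustrative samples
x <- c(2.3, 3.4, 1.2, 4.4)
y <- c(3.2, 1.5, 2.6, 3.3, 4.5)

# Step 1: F test of H0: sigma1^2 = sigma2^2 (two-sided)
ftest     <- var.test(x, y)
equal_var <- ftest$p.value >= 0.05   # assumed significance level of 0.05

# Step 2: pooled t test if equal variances are not rejected,
# Welch's t test otherwise
fit <- t.test(x, y, var.equal = equal_var)
print(ftest)
print(fit)
```

With `var.equal = FALSE`, `t.test` uses the Welch approximation to the degrees of freedom, so the reported `df` is generally not an integer.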
Power Analysis
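$$\text{Power} = \mathrm{pnorm}\left(-z_{1-\alpha/2} + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}},\ 0,\ 1\right)$$
where $\Delta = |\mu_1 - \mu_2|$.
For a one-sided alternative, we replace $\alpha/2$ with $\alpha$.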
Suppose 100 OC users and 100 non-OC users are available for
study and a true mean difference of $\mu_1 - \mu_2 = 5$ mm Hg is
anticipated, with OC users having the higher mean SBP. How
much power would such a study have if estimates of the
standard deviations for OC users and non-users were obtained
from a pilot study as 15.34 mm Hg and 18.23 mm Hg,
respectively?
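One way to work this out in R is with the normal-approximation power formula, Power = pnorm(-z_{1-alpha/2} + Delta / sqrt(sigma1^2/n1 + sigma2^2/n2)). The two-sided test at $\alpha = 0.05$ is an assumption of this sketch, since the question does not state the significance level.

```r
# Inputs taken from the question
n1     <- 100     # OC users
n2     <- 100     # non-OC users
delta  <- 5       # anticipated |mu1 - mu2| in mm Hg
sigma1 <- 15.34   # pilot SD, OC users
sigma2 <- 18.23   # pilot SD, non-users

alpha <- 0.05     # assumed two-sided significance level

# Standard error of the difference in sample means
se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)

# Approximate power of the two-sided test
power <- pnorm(-qnorm(1 - alpha / 2) + delta / se, 0, 1)
print(round(power, 3))
```

Under these assumptions the power comes out at roughly 0.56, so a study of this size would detect a 5 mm Hg difference only a little more than half the time.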
Sample-Size Estimation
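The appropriate sample size to have a probability of $1-\beta$ of
finding a significant difference based on a two-sided test with
significance level $\alpha$ when the absolute difference in means
between the two groups is $\Delta = |\mu_1 - \mu_2|$ is:
a. Equal sample sizes anticipated
$$n_1 = n_2 = (\sigma_1^2 + \sigma_2^2)\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$
b. Unequal sample sizes anticipated ($n_2 = k\,n_1$)
$$n_1 = \left(\sigma_1^2 + \frac{\sigma_2^2}{k}\right)\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}.$$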
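The equal-sample-size formula, n = (sigma1^2 + sigma2^2)(z_{1-alpha/2} + z_{1-beta})^2 / Delta^2, can be evaluated in R as in the sketch below. The standard deviations and difference reuse the blood-pressure figures quoted in these slides (15.34, 18.23, and 5 mm Hg); the target power of 0.80 and $\alpha = 0.05$ are assumptions made for illustration.

```r
# Planning values (SDs and difference from the OC example; the rest assumed)
sigma1 <- 15.34
sigma2 <- 18.23
delta  <- 5
alpha  <- 0.05   # assumed two-sided significance level
power  <- 0.80   # assumed target power, i.e. 1 - beta

# Required size of each group, before and after rounding up
n_raw   <- (sigma1^2 + sigma2^2) *
  (qnorm(1 - alpha / 2) + qnorm(power))^2 / delta^2
n_equal <- ceiling(n_raw)

print(c(n_raw = n_raw, n_equal = n_equal))
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the target.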
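Power depends on $\Delta / \sigma_{\bar{X}_1 - \bar{X}_2}$,
where $\Delta = \mu_1 - \mu_2$.
For paired samples,
$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n} - 2\rho\,\frac{\sigma_1 \sigma_2}{n},$$
where $\rho$ is the correlation between the two measurements in a pair.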
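To see how pairing changes the variance of the mean difference, the sketch below evaluates sigma1^2/n + sigma2^2/n - 2 rho sigma1 sigma2 / n for a few correlations. All numeric values and the function name `var_mean_diff` are hypothetical, chosen only for illustration.

```r
# Variance of the difference in sample means for n pairs with correlation rho
var_mean_diff <- function(sigma1, sigma2, n, rho = 0) {
  sigma1^2 / n + sigma2^2 / n - 2 * rho * sigma1 * sigma2 / n
}

# Hypothetical planning values
sigma1 <- 10
sigma2 <- 10
n      <- 25

var_indep  <- var_mean_diff(sigma1, sigma2, n, rho = 0)    # no correlation
var_paired <- var_mean_diff(sigma1, sigma2, n, rho = 0.6)  # positive correlation

print(c(var_indep = var_indep, var_paired = var_paired))
```

With positive correlation the paired variance is smaller, which is why pairing can increase power for the same number of observations per group.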
[Figure: scatter plot of Nonacademic Environment (vertical axis) against Academic Environment (horizontal axis)]