ETF2121/ETF5912 Data Analysis in Business: Week 5: Estimation and Hypothesis Testing For Two Populations
Dr Wei Wei
Monash University
Let X1 and X2 denote the variables associated with the first and second
population, respectively.
The population means are µ1 and µ2, and the population standard
deviations are σ1 and σ2.
We are interested in whether µ1 is different from µ2 . In other words,
the parameter of interest is µ1 − µ2 .
We take random samples of size n1 and n2 from the two populations.
Paired samples
Paired samples: when two samples are selected in such a way that
each item in one sample has a corresponding match or related item in
the other sample.
The sample size is common in both samples: n1 = n2 = n.
For example, we may be interested in the effect of a training program
on employee productivity. If we select the same set of employees and
measure their productivity before and after the training program, we
have paired samples.
Independent samples
The key for analyzing paired samples is to take the difference between
the two variables (of the same unit) then proceed as in the one sample
case.
Let D = X1 − X2 denote the difference between the population
variables. We calculate the sample difference d = x1 − x2 for each
paired unit in the paired samples.
Unit   X1    X2    D = X1 − X2
1      x11   x12   d1 = x11 − x12
2      x21   x22   d2 = x21 − x22
...    ...   ...   ...
n      xn1   xn2   dn = xn1 − xn2
The sample mean of the differences, $\bar{d}$, is an unbiased estimator of $\mu_D$:
\[ E(\bar{d}) = \mu_D. \]
Let $\sigma_D^2$ denote the population variance of D. The sample variance of
d, defined as
\[ s_d^2 = \mathrm{Var}(d) = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}, \]
is an unbiased estimator of the population variance, $\sigma_D^2$.
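The squared standard error of $\bar{d}$,
\[ SE(\bar{d})^2 = \mathrm{Var}(\bar{d}) = \frac{\mathrm{Var}(d)}{n} = \frac{s_d^2}{n}, \]
is an unbiased estimator of the estimator's variance $\sigma_D^2 / n$.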
Since $\mathrm{Var}(\bar{d})$ approaches zero as the sample size n increases, and
$E(\bar{d}) = \mu_D$, $\bar{d}$ is also a consistent estimator of $\mu_D$.
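The point estimates above can be illustrated with a short Python sketch. The data values are made up for illustration and the variable names are assumptions, not part of the lecture material.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical paired measurements for five units (illustrative values only).
x1 = [12.0, 15.0, 11.0, 14.0, 13.0]
x2 = [10.0, 14.0, 12.0, 11.0, 10.0]

d = [a - b for a, b in zip(x1, x2)]  # d_i = x_i1 - x_i2 for each paired unit
n = len(d)

d_bar = mean(d)             # point estimate of mu_D
s2_d = variance(d)          # sample variance of d (divisor n - 1)
se_d_bar = sqrt(s2_d / n)   # standard error of d_bar

print(d_bar, s2_d, se_d_bar)
```

Note that `statistics.variance` divides by n − 1, which matches the definition of the sample variance used here.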
\[ \frac{\bar{d} - \mu_D}{s_d / \sqrt{n}} \sim t_{n-1} \]
Note that if n is large (n > 30), tn−1 is approximately normal.
Confidence interval
HA : µD > 0
or µ1 is less than µ2 ,
HA : µD < 0
or µ1 is different from µ2 ,
HA : µD ≠ 0
Hypothesis testing
Step 4: compute $\bar{d}$, $s_d^2$, and $SE(\bar{d})$. Obtain the value of the test
statistic.
Step 5: make a decision of whether or not to reject the null hypothesis.
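Steps 4 and 5 for paired samples might look like the following Python sketch. The data, the function name `paired_t_statistic`, and the choice of a right-tailed test at α = 0.10 are assumptions for illustration. The critical value 1.476 is the standard t-table entry for 5 degrees of freedom with upper-tail area 0.10; in EXCEL it could be obtained with T.INV(0.90, 5).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x1, x2, mu_d0=0.0):
    """Return (t, df) for paired samples under H0: mu_D = mu_d0."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same size")
    d = [a - b for a, b in zip(x1, x2)]   # differences within each pair
    n = len(d)
    se = stdev(d) / sqrt(n)               # SE(d_bar) = s_d / sqrt(n)
    return (mean(d) - mu_d0) / se, n - 1

# Hypothetical productivity scores for six employees (illustrative only).
after = [105, 98, 110, 102, 99, 107]
before = [100, 97, 104, 103, 95, 101]

t, df = paired_t_statistic(after, before)
t_crit = 1.476              # t-table value for df = 5, upper-tail area 0.10
reject = t > t_crit         # right-tailed test, HA: mu_D > 0
print(round(t, 3), df, reject)
```

For a left-tailed or two-sided alternative, the comparison in the last step would change accordingly (for example, |t| against the α/2 critical value).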
Example 5.1
Using a 10% level of significance, do the given sample data support the
claim that the bonus plan has a positive effect on the sales volume?
Let $\bar{X}_1$ denote the sample mean from the first sample and $\bar{X}_2$ the
sample mean from the second sample. We know that $\bar{X}_1$ is an unbiased
estimator of µ1 and $\bar{X}_2$ is an unbiased estimator of µ2.
$\bar{X}_1 - \bar{X}_2$ is an unbiased point estimator of µ1 − µ2:
\[ E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2. \]
Let $s_1^2$ and $s_2^2$ denote the sample variances of the first and second
samples, i.e., $s_1^2 = \mathrm{Var}(x_1)$ and $s_2^2 = \mathrm{Var}(x_2)$. They are unbiased
estimators of $\sigma_1^2$ and $\sigma_2^2$.
The variance and standard error of $\bar{X}_1 - \bar{X}_2$ depend on whether the
two populations have equal or unequal variances.
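If the two population variances are not equal, i.e., $\sigma_1^2 \neq \sigma_2^2$, then
\[ \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}, \]
and
\[ SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}. \]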
or
\[ (\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2} \times SE(\bar{X}_1 - \bar{X}_2) \]
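As a rough illustration of the z-based interval, the sketch below uses the unequal-variance standard error and takes the normal quantile from Python's standard library. The function name `z_interval_diff_means` and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_diff_means(xbar1, xbar2, s2_1, s2_2, n1, n2, alpha=0.05):
    """Large-sample (1 - alpha) confidence interval for mu1 - mu2."""
    se = sqrt(s2_1 / n1 + s2_2 / n2)          # unequal-variance standard error
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, upper-tail quantile
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se

# Hypothetical summary statistics (illustrative only).
lo, hi = z_interval_diff_means(52.0, 48.0, 100.0, 144.0, 50, 60, alpha=0.05)
print(round(lo, 3), round(hi, 3))
```

If the resulting interval contains zero, the data are consistent with µ1 = µ2 at the corresponding two-sided significance level.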
Example 5.2
H0 : $\sigma_1^2 = \sigma_2^2$
HA : $\sigma_1^2 \neq \sigma_2^2$
Step 2: specify α.
Step 3: $s_1^2$ and $s_2^2$ are unbiased estimators for $\sigma_1^2$ and $\sigma_2^2$. If X1 and X2
are both normally distributed, we use the ratio of sample variances as
the test statistic. Under the null that $\sigma_1^2 = \sigma_2^2$, the test statistic has an
F distribution with n1 − 1 and n2 − 1 degrees of freedom,
\[ \frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1) \]
Step 4: compute the value of the test statistic from the sample,
denoted by f.
Step 5: Make a decision (of whether or not to reject the null) using
the critical value approach (when the null distribution is asymmetric
and the test is two-sided, the p-value approach is more complicated
and hence omitted).
Find the critical values from the lower and upper percentiles of
F(n1 − 1, n2 − 1).
In EXCEL: $F_{\alpha/2}$ = F.INV(α/2, n1 − 1, n2 − 1) and $F_{1-\alpha/2}$ = F.INV(1 − α/2, n1 − 1, n2 − 1).
Reject if f < $F_{\alpha/2}$ or f > $F_{1-\alpha/2}$.
HA : µ1 − µ2 > 0
or µ1 is less than µ2 ,
HA : µ1 − µ2 < 0
or µ1 is different from µ2 ,
HA : µ1 − µ2 ≠ 0
Step 3b: obtain the test statistic by standardizing the estimator under
the null:
\[ T = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)} \]
Step 3c: determine the null distribution for the test statistic.
If the population variables have unknown distributions but n1 and n2 are
large, the test statistic has approximately a standard normal distribution.
If the population variables follow normal distributions, the test statistic
has a t distribution with degrees of freedom determined by
Equal variance: df = n1 + n2 − 2
Unequal variance:
\[ df^{*} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \]
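The two degrees-of-freedom rules can be sketched in Python as follows. The function names and the numeric inputs are hypothetical, chosen only to show the arithmetic.

```python
def df_equal_variance(n1, n2):
    """Degrees of freedom when the population variances are assumed equal."""
    return n1 + n2 - 2

def df_unequal_variance(s2_1, s2_2, n1, n2):
    """Degrees of freedom when the population variances are not assumed equal."""
    v1 = s2_1 / n1   # estimated variance of the first sample mean
    v2 = s2_2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical sample variances and sizes (illustrative only).
print(df_equal_variance(10, 15))
print(df_unequal_variance(4.0, 9.0, 10, 15))
```

The unequal-variance result is generally not an integer. When the two sample variances and the two sample sizes coincide, the formula reduces to n1 + n2 − 2.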
Step 4: compute the value of the test statistic from the sample.
Step 5: make a decision using either the p-value approach or the
critical value approach.
H0 : µD = 0
HA : µD > 0
\[ \frac{\bar{d}}{s_d / \sqrt{n}} \sim t_{19} \]
H0 : $\sigma_1^2 = \sigma_2^2$
HA : $\sigma_1^2 \neq \sigma_2^2$
\[ \frac{s_1^2}{s_2^2} \sim F(19, 19) \]
Step 4: compute the value of the test statistic from the sample.
\[ f = \frac{243.4}{226.8} = 1.07 \]
Step 5: In EXCEL:
F0.025 = F.INV(0.025, 19, 19) = 0.396
F0.975 = F.INV(0.975, 19, 19) = 2.526
Since the value from the sample lies between the two percentiles, we do
not reject the null of equal variance. In other words, we will assume
equal variance when testing the difference in means.
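The decision above can be reproduced with a few lines of Python. The sample variances and the two critical values are the ones quoted in this example. The function name `f_test_decision` is an assumption, and the critical values are passed in rather than computed because the Python standard library has no F quantile function.

```python
def f_test_decision(s2_1, s2_2, f_lower, f_upper):
    """Two-sided F test of equal variances using the critical value approach."""
    f = s2_1 / s2_2                       # ratio of sample variances
    reject = f < f_lower or f > f_upper   # reject if f falls in either tail
    return f, reject

# Values taken from the example: s1^2 = 243.4, s2^2 = 226.8,
# F.INV(0.025, 19, 19) = 0.396 and F.INV(0.975, 19, 19) = 2.526.
f, reject = f_test_decision(243.4, 226.8, f_lower=0.396, f_upper=2.526)
print(round(f, 2), reject)   # prints: 1.07 False
```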
H0 : µ1 − µ2 = 0
HA : µ1 − µ2 > 0
where
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
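A minimal Python sketch of the pooled variance follows, using the sample variances from the F test in this example. The sample sizes n1 = n2 = 20 are inferred from the F(19, 19) degrees of freedom. The standard error line uses the usual equal-variance form, sqrt(sp² (1/n1 + 1/n2)), which is an assumption here since it is not shown in the surrounding text. The function names are hypothetical.

```python
from math import sqrt

def pooled_variance(s2_1, s2_2, n1, n2):
    """Pooled sample variance: weighted average with weights n_i - 1."""
    return ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2)

def pooled_se(s2_1, s2_2, n1, n2):
    """Standard error of the difference in sample means under equal variances
    (assumed standard form, not quoted from the slides)."""
    sp2 = pooled_variance(s2_1, s2_2, n1, n2)
    return sqrt(sp2 * (1 / n1 + 1 / n2))

sp2 = pooled_variance(243.4, 226.8, 20, 20)
se = pooled_se(243.4, 226.8, 20, 20)
print(round(sp2, 1), round(se, 3))
```

With equal sample sizes the pooled variance is simply the average of the two sample variances.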
and that
\[ \frac{p_1(1 - p_1)}{n_1} \quad \text{and} \quad \frac{p_2(1 - p_2)}{n_2} \]
are approximately unbiased estimators for Var(p1) and Var(p2) when n1 and n2 are large.
The point estimator of the difference between population proportions
π1 − π2 is the difference between sample proportions p1 − p2 .
E (p1 − p2 ) = π1 − π2
and $\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$ is an approximately unbiased estimator for Var(p1 − p2) when n1 and n2 are large.
Standard error
\[ SE(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}. \]
Sampling distribution
HA : π1 − π2 > 0
or π1 is less than π2 ,
HA : π1 − π2 < 0
or π1 is different from π2 ,
HA : π1 − π2 ≠ 0
Step 3b: obtain the test statistic by standardizing the estimator under
the null:
\[ Z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
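The test statistic above might be computed as in the following Python sketch. It assumes that p is the pooled sample proportion, (x1 + x2)/(n1 + n2), where x1 and x2 are the numbers of successes in each sample; that definition is not shown in the surrounding text. The counts and the function name `two_proportion_z` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 - pi2 = 0 using a pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                  # pooled proportion (assumed definition)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2)) # standard error under the null
    return (p1 - p2) / se

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100.
z = two_proportion_z(60, 100, 45, 100)
p_value = 1 - NormalDist().cdf(z)   # right-tailed test, HA: pi1 - pi2 > 0
print(round(z, 3), round(p_value, 4))
```

For a right-tailed test at level α, the null is rejected when the p-value is below α, or equivalently when z exceeds the upper α quantile of the standard normal distribution.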
Example 5.4
H0 : π1 − π2 = 0
HA : π1 − π2 > 0
Step 2: α = 0.05.