
ETF2121/ETF5912 Data Analysis in Business: Week 5: Estimation and Hypothesis Testing For Two Populations

The document discusses differences in population means between paired samples and independent samples. For paired samples, the key is to take the difference between the paired variables and analyze the differences as a single sample. The mean of the sample differences is an unbiased and consistent estimator of the difference in population means. With a large sample size, the sampling distribution of the mean difference will be approximately normal. A confidence interval can be constructed using the standard error of the mean difference.


ETF2121/ETF5912 Data Analysis in Business

Week 5: Estimation and hypothesis testing for two populations

Dr Wei Wei

Monash University

Dr Wei Wei (Monash University) ETF2121/5912 1 / 55

1 Differences in Population Means
    Paired Samples vs Independent Samples
    Paired Samples
        Estimation and sampling distribution
        Testing for equal population mean
    Independent Samples
        Unequal population variance
        Equal population variance
        Testing for equal population variances
        Testing for equal population mean

2 Differences in Population Proportions
    Independent Samples with Large Sample Size


Differences in Population Means

Differences of Population Means

Let X1 and X2 denote the variables associated with the first and second
population, respectively.
The population means are µ1 and µ2; the population standard
deviations are σ1 and σ2.
We are interested in whether µ1 is different from µ2. In other words,
the parameter of interest is µ1 − µ2.
We take random samples of size n1 and n2 from the two populations.


Why two populations?

In general, the two populations differ in a key characteristic and we
want to examine if the difference in this characteristic is associated
with a difference in the mean of one target variable.
Examples:
air quality of a city with high household income versus the air quality of
a city with low household income;
the amount of electricity used by freezer brand A versus brand B;
the proportion of voters for a political party in younger versus older
population;
the sales volume before and after an advertising campaign.


Differences in Population Means Samples

Paired samples

Paired samples: when two samples are selected in such a way that
each item in one sample has a corresponding match or related item in
the other sample.
The sample size is common in both samples: n1 = n2 = n.
For example, we may be interested in the effect of a training program
on employee productivity. If we select the same set of employees and
measure their productivity before and after the training program, we
have paired samples.


Independent samples

Independent samples: samples that are completely independent of
one another, meaning that the sample members selected from one
population are not related to the sample members drawn from the other
population.
The sample size from the two samples can be different, i.e., n1 may be
different from n2 .
For example, we want to know if the average household income is
higher in Caulfield or Clayton. We can take a sample of residents in
Caulfield and ask them about their income, and take another
independent sample of residents from Clayton and ask them about
their income.


Independent samples vs paired samples

Are the following samples independent or paired?

We want to know if the average daily revenue of restaurant A is the
same as the average daily revenue of restaurant B.
We obtain the daily revenue of both restaurants over the year of 2020.


Independent samples vs paired samples

Samples may be neither paired nor independent.

Suppose that we want to know if the average mark in ETF2121 is
higher than, lower than, or the same as the average mark in ETF2100.
Are the following samples independent, paired, or neither?
We select 25 students who attended both units and obtain their marks
from both units.
We randomly select 20 students from the cohort of ETF2121, obtain
their marks, then randomly select 20 students who are not in the
sample for ETF2121 from the cohort of ETF2100 and obtain their
marks.
We randomly select 200 students from the cohort of ETF2121, obtain
their marks, then randomly select 230 students from the cohort of
ETF2100 and obtain their marks.


Differences in Population Means Paired Samples

Transform paired samples into one sample

The key for analyzing paired samples is to take the difference between
the two variables (of the same unit) then proceed as in the one sample
case.
Let D = X1 − X2 denote the difference between the population
variables. We calculate the sample difference d = x1 − x2 for each
paired unit in the paired samples.
Unit   X1     X2     D = X1 − X2
1      x11    x12    d1 = x11 − x12
2      x21    x22    d2 = x21 − x22
...    ...    ...    ...
n      xn1    xn2    dn = xn1 − xn2


Properties of the estimator: unbiasedness

The parameter of interest is the difference in population means
µ1 − µ2 ≡ E(X1) − E(X2). Since
E(X1) − E(X2) = E(X1 − X2) = E(D), the difference in
population means is the same as the mean of the population
difference, i.e., µ1 − µ2 = µD where µD ≡ E(D).
The mean of the sample differences, defined as
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n},$$
is a point estimator of the mean of the population difference, µD.
This estimator is unbiased, i.e.,
$$E(\bar{d}) = \mu_D.$$


Properties of the estimator: consistency


Let $\sigma_D^2$ denote the population variance of D. The sample variance of
d, defined as
$$s_d^2 = \operatorname{Var}(d) = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1},$$
is an unbiased estimator of the population variance, $\sigma_D^2$.
The sample variance of $\bar{d}$, defined as
$$SE(\bar{d})^2 = \operatorname{Var}(\bar{d}) = \frac{\operatorname{Var}(d)}{n} = \frac{s_d^2}{n},$$
is an unbiased estimator of the estimator's variance $\sigma_D^2/n$.
Since $\operatorname{Var}(\bar{d})$ approaches zero as the sample size n increases, and
$E(\bar{d}) = \mu_D$, $\bar{d}$ is also a consistent estimator of µD.
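The quantities above can be computed directly once the paired samples are reduced to differences. The following is a minimal sketch using only the Python standard library; the data values are hypothetical and chosen purely for illustration.

```python
import math
import statistics

# Hypothetical paired measurements (not from the lecture): one pair per unit.
x1 = [12.0, 15.0, 11.0, 14.0, 13.0]
x2 = [10.0, 14.0, 12.0, 11.0, 10.0]

# Collapse the two paired samples into one sample of differences.
d = [a - b for a, b in zip(x1, x2)]

n = len(d)
d_bar = statistics.mean(d)        # point estimate of mu_D
s_d2 = statistics.variance(d)     # sample variance of d (divides by n - 1)
se_d_bar = math.sqrt(s_d2 / n)    # standard error of d_bar

print(d_bar, s_d2, round(se_d_bar, 4))
```

Note that `statistics.variance` uses the n − 1 divisor, which matches the definition of the sample variance given above.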

Properties of the estimator: sampling distribution

If we do not know the distribution of D but n is large (n > 30), the
sampling distribution of $\bar{d}$ is approximately normal:
$$\bar{d} \sim N\left(\mu_D, \frac{s_d^2}{n}\right),$$
or
$$\frac{\bar{d} - \mu_D}{s_d/\sqrt{n}} \sim N(0, 1).$$
If D can be assumed to be normal, the standardized $\bar{d}$ follows a
t-distribution:
$$\frac{\bar{d} - \mu_D}{s_d/\sqrt{n}} \sim t_{n-1}.$$
Note that if n is large (n > 30), $t_{n-1}$ is approximately normal.


Confidence interval

Given the sampling distribution of any estimator, you should be able
to derive the confidence interval of the parameter.
A 100(1 − α)% confidence interval estimator of the population mean
difference µD is
$$\bar{d} \pm z_{\alpha/2} \times SE(\bar{d})$$
or
$$\bar{d} \pm t_{\alpha/2,\,n-1} \times SE(\bar{d})$$
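As an illustration of the interval formula, here is a small sketch. The function name `mean_diff_ci` and the summary statistics are hypothetical. The normal critical value comes from the standard library; a t critical value would have to be supplied by the caller (from a t table or other software), since the standard library has no t-distribution.

```python
import math
from statistics import NormalDist

def mean_diff_ci(d_bar, s_d, n, crit):
    """Return (lower, upper) for d_bar +/- crit * s_d / sqrt(n)."""
    se = s_d / math.sqrt(n)
    return d_bar - crit * se, d_bar + crit * se

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96

# Hypothetical large-sample summary statistics (n > 30), so z is used.
lower, upper = mean_diff_ci(d_bar=2.0, s_d=6.0, n=36, crit=z_crit)
print(round(lower, 3), round(upper, 3))
```

Passing the critical value in as an argument keeps the same function usable for both the z-based and the t-based interval.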


Hypothesis testing for equal population mean

Step 1: formulate the hypotheses
the null:
H0 : µD = 0
This is equivalent to µ1 = µ2.
the alternative is either µ1 is larger than µ2,
HA : µD > 0
or µ1 is less than µ2,
HA : µD < 0
or µ1 is different from µ2,
HA : µD ≠ 0


Hypothesis testing

Step 2: specify a significance level. 1%, 5% and 10% are the
commonly used significance levels, as before.
Step 3: transform the two paired samples into one sample of
differences, d = x1 − x2. Determine the sampling distribution of the
test statistic $T = \frac{\bar{d}}{s_d/\sqrt{n}}$. Is it $t_{n-1}$ or N(0, 1)?
Step 4: compute $\bar{d}$, $s_d^2$, and $SE(\bar{d})$. Obtain the value of the test
statistic.
Step 5: make a decision of whether or not to reject the null hypothesis.
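Steps 3 to 5 can be sketched in code as follows. The helper `paired_t_statistic` and the data are hypothetical, and the critical value 1.533 is the upper 10% point of t(4) as found in a standard t table; none of these come from the lecture's examples.

```python
import math
import statistics

def paired_t_statistic(x1, x2):
    """Test statistic for H0: mu_D = 0 from two paired samples."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same size")
    d = [a - b for a, b in zip(x1, x2)]   # Step 3: one sample of differences
    n = len(d)
    d_bar = statistics.mean(d)            # Step 4: mean difference
    s_d = statistics.stdev(d)             #         sample std dev (n - 1 divisor)
    se = s_d / math.sqrt(n)               #         standard error of d_bar
    return d_bar / se, n - 1              # statistic and degrees of freedom

# Hypothetical after/before values for five units.
after = [12.0, 15.0, 11.0, 14.0, 13.0]
before = [10.0, 14.0, 12.0, 11.0, 10.0]
t_stat, df = paired_t_statistic(after, before)

# Step 5 for HA: mu_D > 0 at alpha = 0.10, critical value approach.
t_crit = 1.533   # upper 10% point of t(4), from a t table
reject = t_stat > t_crit
print(round(t_stat, 3), df, reject)
```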


Example 5.1

A company attempts to evaluate the effect of a new bonus plan. HR
selected a random sample of 5 salespersons to use this bonus plan for
a trial period.
The weekly sales volumes before and after implementing the bonus
plan are shown below:

Assume that the population differences in weekly sales are normally
distributed.
Using a 10% level of significance, do the given sample data support the
claim that the bonus plan has a positive effect on the sales volume?

Differences in Population Means Independent Samples

Estimating the mean difference in independent samples

Let $\bar{X}_1$ denote the sample mean from the first sample, and $\bar{X}_2$ the
sample mean from the second sample. We know that $\bar{X}_1$ is an unbiased
estimator of µ1 and $\bar{X}_2$ is an unbiased estimator of µ2.
$\bar{X}_1 - \bar{X}_2$ is an unbiased point estimator of µ1 − µ2:
$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2.$$
Let $s_1^2$ and $s_2^2$ denote the sample variances of the first and second
samples, i.e., $s_1^2 = \operatorname{Var}(x_1)$ and $s_2^2 = \operatorname{Var}(x_2)$; they are unbiased
estimators of $\sigma_1^2$ and $\sigma_2^2$.
The variance and standard error of $\bar{X}_1 - \bar{X}_2$ depend on whether the
two populations have equal or unequal variances.


Unequal population variance: standard error

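If the two population variances are not equal, i.e., $\sigma_1^2 \neq \sigma_2^2$, then
$$\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2},$$
and
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$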


Unequal population variance: sampling distribution

Now that we know the mean and variance of $\bar{X}_1 - \bar{X}_2$, we can
standardize it to
$$\frac{(\bar{X}_1 - \bar{X}_2) - E(\bar{X}_1 - \bar{X}_2)}{SE(\bar{X}_1 - \bar{X}_2)} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
If both n1 and n2 are large, the standardized estimator above follows
N(0, 1) regardless of what distributions X1 and X2 follow.
If the population variables X1 and X2 follow normal distributions and
the sample sizes are small, we use a t-distribution with
$$df^* = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}.$$
Fractional values of $df^*$ are rounded down.
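The degrees-of-freedom formula is easy to mistype by hand, so a short sketch may help. The function name `welch_df` and the input values are hypothetical; the rounding-down rule follows the statement above.

```python
import math

def welch_df(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom df* for the unequal-variance case, rounded down."""
    a = s1_sq / n1   # estimated variance of the first sample mean
    b = s2_sq / n2   # estimated variance of the second sample mean
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return math.floor(df)

# Hypothetical sample variances and sample sizes.
df_star = welch_df(s1_sq=4.0, n1=10, s2_sq=9.0, n2=15)
print(df_star)
```

The result always lies between min(n1 − 1, n2 − 1) and n1 + n2 − 2, which gives a quick sanity check on any hand calculation.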


Unequal population variance: confidence interval

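A 100(1 − α)% confidence interval of µ1 − µ2 is
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,df^*} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
or
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$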


Equal population variance

If the population variances are assumed to be equal, the common
variance is estimated by pooling the variance estimates from both
samples:
$$s_p^2 = \frac{\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
Then the variance and standard error of the estimator $\bar{X}_1 - \bar{X}_2$ are
$$\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$
$$SE(\bar{X}_1 - \bar{X}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
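The pooled variance and the resulting standard error can be sketched as two small functions. The names `pooled_variance` and `pooled_se` and the inputs are hypothetical, chosen only to show the arithmetic.

```python
import math

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of the two sample variances, weights n_i - 1."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_se(s1_sq, n1, s2_sq, n2):
    """Standard error of the difference in sample means under equal variances."""
    sp_sq = pooled_variance(s1_sq, n1, s2_sq, n2)
    return math.sqrt(sp_sq * (1 / n1 + 1 / n2))

# Hypothetical sample variances and sample sizes.
sp_sq = pooled_variance(4.0, 10, 9.0, 15)
se = pooled_se(4.0, 10, 9.0, 15)
print(round(sp_sq, 4), round(se, 4))
```

When n1 = n2 the pooled variance reduces to the simple average of the two sample variances.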

Equal population variance: sampling distribution

The standardized estimator,
$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{SE(\bar{X}_1 - \bar{X}_2)} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},$$
follows either a normal distribution N(0, 1) if both n1 and n2 are large,
or a t-distribution with df = n1 + n2 − 2 if both X1 and X2 are
normally distributed.
A 100(1 − α)% confidence interval estimator of µ1 − µ2 is
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,df} \times SE(\bar{X}_1 - \bar{X}_2)$$
or
$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2} \times SE(\bar{X}_1 - \bar{X}_2)$$


Example 5.2

An urban planning group is interested in estimating the difference
between mean household incomes for two suburbs.
Independent samples of households in the suburbs provided the
following results.

Assuming that the population variances are equal, obtain a 95%
confidence interval estimate of the difference in mean incomes
between the two suburbs.


Testing for equal population variances

In practice, we do not know whether the population variances are
equal or not. We can first use hypothesis testing to determine that.
Step 1:
$$H_0 : \sigma_1^2 = \sigma_2^2$$
$$H_A : \sigma_1^2 \neq \sigma_2^2$$
Step 2: specify α.
Step 3: $s_1^2$ and $s_2^2$ are unbiased estimators for $\sigma_1^2$ and $\sigma_2^2$. If X1 and X2
are both normally distributed, we use the ratio of sample variances as
the test statistic. Under the null that $\sigma_1^2 = \sigma_2^2$, the test statistic has an
F distribution with n1 − 1 and n2 − 1 degrees of freedom,
$$\frac{s_1^2}{s_2^2} \sim F(n_1 - 1, n_2 - 1)$$


Testing for equal population variances: F distribution

The F distribution has two degrees of freedom parameters, one for the
numerator and one for the denominator. The numerator degrees of
freedom is always quoted first.
The F distribution is a non-symmetric distribution that is skewed to
the right. Its values are all positive.
[Figure: density curves of the F(9,100), F(100,9), F(9,9) and F(100,100) distributions, plotted for values from 0 to 3.]


Testing for equal population variances

Step 4: compute the value of the test statistic from the sample,
denoted by f.
Step 5: make a decision (of whether or not to reject the null) using
the critical value approach (when the null distribution is asymmetric
and the test is two-sided, the p-value approach is more complicated
and hence omitted).
Find the critical values from the lower and upper percentiles of
F(n1 − 1, n2 − 1).
In EXCEL: $F_{\alpha/2}$ = F.INV(α/2, n1 − 1, n2 − 1) and $F_{1-\alpha/2}$ = F.INV(1 − α/2, n1 − 1, n2 − 1)
Reject if $f < F_{\alpha/2}$ or $f > F_{1-\alpha/2}$


Testing for equality of variance before testing for equality of means

The sampling distribution of the difference between sample means
(based on independent samples) depends on whether or not we can
assume equal population variances. Hence, we should first test
for equality of variances before testing for equality of means.
If we cannot reject the null that $\sigma_1^2 = \sigma_2^2$, we assume equal variances.
If we do reject the null, we assume unequal variances.


Hypothesis testing for population mean

Step 1: formulate the hypotheses
the null:
H0 : µ1 − µ2 = 0
the alternative is either µ1 is larger than µ2,
HA : µ1 − µ2 > 0
or µ1 is less than µ2,
HA : µ1 − µ2 < 0
or µ1 is different from µ2,
HA : µ1 − µ2 ≠ 0
Step 2: specify a significance level. 1%, 5% and 10% are the
commonly used significance levels, as before.


Hypothesis testing for population mean

Step 3a: determine the formula for $SE(\bar{X}_1 - \bar{X}_2)$ depending on
whether or not the variances are equal.
if the variances are assumed to be unequal,
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
assuming equal variance,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
$$SE(\bar{X}_1 - \bar{X}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$


Hypothesis testing for population mean

Step 3b: obtain the test statistic by standardizing the estimator under
the null:
$$T = \frac{\bar{X}_1 - \bar{X}_2}{SE(\bar{X}_1 - \bar{X}_2)}$$
Step 3c: determine the null distribution for the test statistic.
If the population variables have unknown distribution but n1 and n2 are
large, the test statistic has a standard normal distribution.
If the population variables follow normal distributions, the test statistic
has a t distribution with degrees of freedom determined by

Equal variance: $df = n_1 + n_2 - 2$
Unequal variance: $df^* = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$


Hypothesis testing for population mean

Step 4: compute the value of the test statistic from the sample.
Step 5: make a decision using either the p-value approach or the
critical value approach.


Example 5.3: paired vs independent samples

A tyre manufacturer designed a new tyre and wants to know if the
new design lasts on average longer than the existing tyre. We design
two experiments:
paired samples;
independent samples.


Example 5.3: paired samples

On 20 cars, one of each type of tyre (new and existing) is installed on
the rear wheels. 20 drivers were told to drive in their usual way until
the tyres wore out.
The new-design and existing-design tyres were installed on the same
set of cars and driven by the same drivers, which gives a natural pairing
of observations.
The data are stored in W5_01.xlsx.
Do the new-design tyres last on average longer than the existing tyres?


Example 5.3: paired samples

Let µ1 = mean distance to wear-out for the new-design tyres
Let µ2 = mean distance to wear-out for the existing tyres
Step 1: formulate hypotheses. Let µD = µ1 − µ2
H0 : µD = 0
HA : µD > 0
Step 2: set significance level to 5%, i.e., α = 0.05.
Step 3: determine test statistic and null distribution:
$$\frac{\bar{d}}{s_d/\sqrt{n}} \sim t_{19}$$


Example 5.3: paired samples

Step 4: calculate the test statistic from the sample:
$$t = \frac{4.55}{7.22/\sqrt{20}} = 2.82$$
Step 5: using the p-value approach, in EXCEL,
p = 1 − T.DIST(2.82, 19, 1) = 0.005
Since p < 0.05, we reject the null hypothesis. There is evidence to
support the conclusion that the new-design tyres last on average
longer than the existing tyres.
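The arithmetic in Step 4 can be checked with a few lines of code. This sketch uses the summary statistics quoted above (mean difference 4.55, standard deviation of differences 7.22, n = 20) and takes the p-value 0.005 from the Excel call on the slide rather than recomputing it, since the Python standard library has no t-distribution.

```python
import math

# Summary statistics from Example 5.3 (paired samples).
d_bar, s_d, n = 4.55, 7.22, 20

se = s_d / math.sqrt(n)   # standard error of the mean difference
t_stat = d_bar / se       # test statistic under H0: mu_D = 0

alpha = 0.05
p_value = 0.005           # as reported on the slide from Excel's T.DIST
reject = p_value < alpha
print(round(se, 3), round(t_stat, 2), reject)
```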


Example 5.3: independent samples

New-design tyres were installed on the rear (driving) wheels of 20 cars,
and existing design tyres were installed on the rear wheels of another
20 cars.
40 drivers were told to drive in their usual way until the tyres wore out.
The number of kilometers driven by each driver was recorded in
W5_01.xlsx.
Do the new-design tyres last on average longer than the existing tyres?


Example 5.3: independent samples: test for equal variances

Before testing for equal mean, we should first test for equal variance.
Let $\sigma_1^2$ = variance in the distance to wear-out for the new-design tyres
Let $\sigma_2^2$ = variance in the distance to wear-out for the existing tyres
Step 1:
$$H_0 : \sigma_1^2 = \sigma_2^2$$
$$H_A : \sigma_1^2 \neq \sigma_2^2$$
Step 2: set significance level to 5%, i.e., α = 0.05.
Step 3: determine test statistic and null distribution. The ratio of
sample variances has an F distribution with 19 and 19 degrees of
freedom under the null,
$$\frac{s_1^2}{s_2^2} \sim F(19, 19)$$


Example 5.3: independent samples: test for equal variances

Step 4: compute the value of the test statistic from the sample.
$$f = \frac{243.4}{226.8} = 1.07$$
Step 5: In EXCEL:
$F_{0.025}$ = F.INV(0.025, 19, 19) = 0.396
$F_{0.975}$ = F.INV(0.975, 19, 19) = 2.526
Since the value from the sample lies within the two percentiles, we do
not reject the null of equal variance. In other words, we will assume
equal variance for testing the differences in mean.
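The two-sided decision rule for the variance-ratio test can be sketched as below. The function name `variance_ratio_test` is hypothetical; the sample variances and the two critical values are the ones quoted in this example, so the code only reproduces the comparison and does not recompute the F percentiles.

```python
def variance_ratio_test(s1_sq, s2_sq, f_lower, f_upper):
    """Return the F statistic and whether H0 (equal variances) is rejected."""
    f = s1_sq / s2_sq
    reject = f < f_lower or f > f_upper   # two-sided critical value rule
    return f, reject

# Sample variances and Excel critical values from Example 5.3.
f_stat, reject_equal_var = variance_ratio_test(
    243.4, 226.8, f_lower=0.396, f_upper=2.526
)
print(round(f_stat, 2), reject_equal_var)
```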


Example 5.3: independent samples: test for equal mean

Step 1: formulate hypotheses,
H0 : µ1 − µ2 = 0
HA : µ1 − µ2 > 0
Step 2: set significance level to 5%, i.e., α = 0.05.
Step 3: since we are assuming equal variances, we will use the
test statistic for equal variances
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t(38),$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Example 5.3: independent samples: test for equal mean

Step 4: calculate the test statistic
$$s_p^2 = \frac{(20 - 1)\,243.41 + (20 - 1)\,226.8}{20 + 20 - 2} = 235.1$$
$$SE = \sqrt{s_p^2 \left(\frac{1}{20} + \frac{1}{20}\right)} = 4.849$$
$$t = \frac{4.4}{4.849} = 0.91$$
Step 5: we use the upper percentile since the alternative hypothesis
states that the parameter of interest is greater than a given value.
From Excel, $t_{0.95,38}$ = 1.686. Since t = 0.91 < 1.686, we DO NOT
reject the null hypothesis. There is not enough evidence to support the
conclusion that the new-design tyres last on average longer than the
existing tyres.
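The Step 4 arithmetic and the Step 5 comparison can be reproduced as follows. The inputs (sample variances 243.41 and 226.8, difference in sample means 4.4, critical value 1.686) are taken from the example; the variable names are only illustrative.

```python
import math

# Summary statistics from Example 5.3 (independent samples).
n1, n2 = 20, 20
s1_sq, s2_sq = 243.41, 226.8
mean_diff = 4.4   # difference in sample means, as used on the slide

sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))                      # standard error
t_stat = mean_diff / se

t_crit = 1.686    # upper 5% point of t(38), from the slide's Excel value
reject = t_stat > t_crit
print(round(sp_sq, 1), round(se, 3), round(t_stat, 2), reject)
```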

Example 5.3: Paired vs independent samples: which is better?

Using paired samples we had enough evidence to conclude that the
new-design tyres last longer, but not under independent samples.
Why?
In this example, there are two sources of variation: (i) car drivers and
(ii) tyre designs.
Independent samples lead to more variability in our outcome as
different drivers (for the new and existing tyres) may drive in different
ways.
In paired samples, the variation in drivers is eliminated: the same drivers
and cars were used in both the new and existing tyre samples.


Example 5.3: Paired vs independent samples: which is better?

Comparing the test statistic from paired and independent samples:
independent samples: $t = \frac{4.4}{4.85} = 0.91$
paired samples: $t = \frac{4.55}{1.615} = 2.82$
In this example, the numerators are similar, BUT the denominators are
quite different.
Will the paired samples always produce a more significant test statistic
than the independent samples experiment?
The answer is not necessarily.
It depends on whether the variation due to drivers is large.


Example 5.3: Paired vs independent samples: ignoring the pairs

What if we wrongly assumed that the data in paired samples came
from independent samples?
We have

x̄1 = 73.6     x̄2 = 69.05
s1 = 15.58    s2 = 17.79
n1 = 20       n2 = 20

We do not reject the null of equal variance.


Example 5.3: Paired vs independent samples: ignoring the pairs

The value of the test statistic for equal mean is
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(19)\,15.58^2 + (19)\,17.79^2}{38} = 279.61$$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{73.6 - 69.05}{\sqrt{279.61 \left(\frac{1}{20} + \frac{1}{20}\right)}} = 0.86$$
Since t = 0.86 < 1.686, we fail to reject H0.

If we wrongly treat the paired samples as independent, we would fail
to reject the null hypothesis.
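To see why the two treatments of the same data disagree, it may help to put the two standard errors side by side. This sketch uses the summary statistics quoted in Example 5.3 (s1 = 15.58, s2 = 17.79, standard deviation of the differences 7.22, n = 20); the variable names are illustrative.

```python
import math

n = 20
x1_bar, x2_bar = 73.6, 69.05
s1, s2 = 15.58, 17.79   # sample standard deviations of the two columns
s_d = 7.22              # sample standard deviation of the paired differences

mean_diff = x1_bar - x2_bar

# Treating the columns as independent samples (pooled variance).
sp_sq = ((n - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (2 * n - 2)
se_indep = math.sqrt(sp_sq * (1 / n + 1 / n))
t_indep = mean_diff / se_indep

# Treating the columns as paired samples.
se_paired = s_d / math.sqrt(n)
t_paired = mean_diff / se_paired

print(round(se_indep, 3), round(t_indep, 2))
print(round(se_paired, 3), round(t_paired, 2))
```

The numerator is identical in both cases; only the standard error changes, because pairing removes the driver-to-driver variation from the denominator.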


Differences in Population Proportions Sample Size

Estimation of Differences: Population Proportions

Let π1 and π2 represent the population proportions, p1 = x1/n1 and
p2 = x2/n2 the sample proportions.
We know that p1 and p2 are unbiased estimators of π1 and π2, i.e.,
E(p1) = π1 and E(p2) = π2,
and that
$$\frac{p_1(1 - p_1)}{n_1} \quad \text{and} \quad \frac{p_2(1 - p_2)}{n_2}$$
are the usual estimators for Var(p1) and Var(p2) (approximately unbiased when n1 and n2 are large).
The point estimator of the difference between population proportions
π1 − π2 is the difference between sample proportions p1 − p2.


Properties of the estimator

The sample difference in proportion is an unbiased estimator for the
population difference in proportion:
$$E(p_1 - p_2) = \pi_1 - \pi_2$$
If the two samples are independent, then
$$\operatorname{Var}(p_1 - p_2) = \operatorname{Var}(p_1) + \operatorname{Var}(p_2)$$
and $\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$ is an estimator for Var(p1 − p2) (approximately unbiased when n1 and n2 are large).
Standard error
$$SE(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$


Sampling distribution

If both n1 and n2 are large, the estimator approximately follows a normal distribution:
$$p_1 - p_2 \sim N\left(\pi_1 - \pi_2, \operatorname{Var}(p_1 - p_2)\right)$$
or
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} \sim N(0, 1)$$
A 100(1 − α)% confidence interval estimator of π1 − π2 is
$$(p_1 - p_2) \pm z_{\alpha/2} \times \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
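A minimal sketch of the large-sample interval for a difference in proportions is shown below. The function name `prop_diff_ci` and the counts are hypothetical; the normal critical value comes from the Python standard library.

```python
import math
from statistics import NormalDist

def prop_diff_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample confidence interval for pi_1 - pi_2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical counts: 60 of 100 in sample 1, 45 of 100 in sample 2.
lower, upper = prop_diff_ci(60, 100, 45, 100)
print(round(lower, 3), round(upper, 3))
```

The interval uses the unpooled standard error because no null hypothesis of equal proportions is being imposed.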


Test for Difference in Population Proportions

Step 1: formulate the hypotheses
the null:
H0 : π1 − π2 = 0
the alternative is either π1 is larger than π2,
HA : π1 − π2 > 0
or π1 is less than π2,
HA : π1 − π2 < 0
or π1 is different from π2,
HA : π1 − π2 ≠ 0
Step 2: specify a significance level. 1%, 5% and 10% are the
commonly used significance levels, as before.


Test for Difference in Population Proportions

Step 3: determine the null distribution (sampling distribution under
the null) for the point estimator, p1 − p2.
Step 3a: Under the null, π1 = π2 = π. In this case, E(p1 − p2) = 0,
and the variance of p1 − p2 becomes
$$\operatorname{Var}(p_1 - p_2) = \pi(1 - \pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right),$$
where π can be estimated using the pooled proportion
$$p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 p_1}{n_1 + n_2} + \frac{n_2 p_2}{n_1 + n_2}.$$
Hence
$$SE(p_1 - p_2) = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$


Test for Difference in Population Proportions

Step 3b: obtain the test statistic by standardizing the estimator under
the null:
$$Z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
If n1 and n2 are large, the test statistic has a standard normal
distribution.
Step 4: compute the value of the test statistic from the sample.
Step 5: make a decision using either the p-value approach or the
critical value approach.


Example 5.4

In a public opinion survey, 65 out of a sample of 100 high-income
voters (incomes of at least $100,000) and 35 out of a sample of 75
low-income voters (incomes less than $100,000) supported the
introduction of the flood levy, which will go toward assisting residents
affected by flooding during extreme weather events.
Can we conclude at the 5% level of significance that a higher
proportion of high-income voters support the flood levy?


Example 5.4

Step 1: let π1 denote the proportion of supporters in high-income
voters, and π2 denote the proportion of supporters in low-income
voters,
H0 : π1 − π2 = 0
HA : π1 − π2 > 0
Step 2: α = 0.05.
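The remaining steps of the worked solution are not reproduced in this text. The following sketch applies Steps 3 to 5 of the general procedure to the counts stated in the example (65 of 100 and 35 of 75), using the pooled-proportion standard error and a one-sided normal test; it is an illustration of the method, not the lecture's own working.

```python
import math
from statistics import NormalDist

x1, n1 = 65, 100   # high-income supporters and sample size
x2, n2 = 35, 75    # low-income supporters and sample size

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                              # pooled proportion under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))   # standard error under H0
z = (p1 - p2) / se                                          # test statistic

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper 5% point of N(0, 1)
p_value = 1 - NormalDist().cdf(z)          # one-sided p-value for HA: pi_1 > pi_2
reject = z > z_crit
print(round(z, 2), reject)
```

Under these assumptions the statistic exceeds the one-sided critical value, so the critical value approach and the p-value approach both point toward rejecting the null at the 5% level.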


No ratings yet
Reading-Point Estimates of Population Mean
5 pages
Chap 8
No ratings yet
Chap 8
46 pages
08 Learning About Mean Difference
No ratings yet
08 Learning About Mean Difference
12 pages
Chapter 10. Two-Sample Tests
No ratings yet
Chapter 10. Two-Sample Tests
51 pages
BSAFC4 - PPT - ch10 (Two Sample Test) v2 (1) - Compressed
No ratings yet
BSAFC4 - PPT - ch10 (Two Sample Test) v2 (1) - Compressed
96 pages