
Hypothesis Testing: Two Sample Tests
Sessions 13 & 14
BUSINESS STATISTICS
Comparing Two Populations
Previously we looked at techniques to estimate and test parameters for one population:
• Population mean µ
• Population proportion p
• Population variance σ²

We will still consider these parameters when we are looking at two populations; however, our interest will now be the difference between two means.
Difference between Two Means
In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another.

Population 1: parameters µ1, σ1²; a sample of size n1 with statistics X̄1, S1².
(Likewise, we consider the parameters and statistics for Population 2.)
Two-Sample Tests

• Population means, independent samples (e.g., Group 1 vs. Group 2)
• Population means, related samples (e.g., same group before vs. after treatment)
• Population proportions (e.g., Proportion 1 vs. Proportion 2)
• Population variances (e.g., Variance 1 vs. Variance 2)
Difference Between Two Means
Population means, independent samples.

Goal: Test a hypothesis or form a confidence interval for the difference between two population means, μ1 – μ2.

The point estimate for the difference is X̄1 – X̄2.

Two cases:
• σ1 and σ2 unknown, assumed equal
• σ1 and σ2 unknown, not assumed equal
Difference Between Two Means: Independent Samples
• Different data sources: unrelated, independent
• The sample selected from one population has no effect on the sample selected from the other population

• σ1 and σ2 unknown, assumed equal: use Sp to estimate the unknown σ, and use a pooled-variance t test.
• σ1 and σ2 unknown, not assumed equal: use S1 and S2 to estimate the unknown σ1 and σ2, and use a separate-variance t test.
Hypothesis Tests for Two Population Means
Two population means, independent samples

• Lower-tail test: H0: μ1 ≥ μ2 vs. H1: μ1 < μ2 (i.e., H0: μ1 – μ2 ≥ 0 vs. H1: μ1 – μ2 < 0)
• Upper-tail test: H0: μ1 ≤ μ2 vs. H1: μ1 > μ2 (i.e., H0: μ1 – μ2 ≤ 0 vs. H1: μ1 – μ2 > 0)
• Two-tail test: H0: μ1 = μ2 vs. H1: μ1 ≠ μ2 (i.e., H0: μ1 – μ2 = 0 vs. H1: μ1 – μ2 ≠ 0)
Hypothesis Tests for μ1 – μ2
Two population means, independent samples

• Lower-tail test (H0: μ1 – μ2 ≥ 0 vs. H1: μ1 – μ2 < 0): reject H0 if tSTAT < –tα
• Upper-tail test (H0: μ1 – μ2 ≤ 0 vs. H1: μ1 – μ2 > 0): reject H0 if tSTAT > tα
• Two-tail test (H0: μ1 – μ2 = 0 vs. H1: μ1 – μ2 ≠ 0): reject H0 if tSTAT < –tα/2 or tSTAT > tα/2
Difference between Two Means
Because we are comparing two population means, we use the statistic X̄1 – X̄2, which is an unbiased and consistent estimator of µ1 – µ2.

Sampling Distribution of X̄1 – X̄2
1. X̄1 – X̄2 is normally distributed if the original populations are normal, or approximately normally distributed if the populations are non-normal and the sample sizes are large (n1, n2 > 30).
2. The expected value of X̄1 – X̄2 is µ1 – µ2.
3. The variance of X̄1 – X̄2 is σ1²/n1 + σ2²/n2 [since Var(X̄1 – X̄2) = Var(X̄1) + Var(X̄2)], and the standard error is √(σ1²/n1 + σ2²/n2).
Making Inferences About μ1 – μ2
Since X̄1 – X̄2 is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large, then:

Z = [(X̄1 – X̄2) – (µ1 – µ2)] / √(σ1²/n1 + σ2²/n2)

is a standard normal (or approximately normal) random variable. We could use this to build the test statistic and the confidence interval estimator for µ1 – µ2.
Making Inferences About μ1 – μ2
…except that, in practice, the z statistic is rarely used, since the population variances are unknown.

Instead we use a t statistic. We consider two cases for the unknown population variances: when we believe they are equal, and conversely when they are not equal.

More about this later…
Test Statistic for μ1 – μ2 (equal variances)

Calculate Sp², the pooled variance estimator, as:

Sp² = [(n1 – 1)S1² + (n2 – 1)S2²] / [(n1 – 1) + (n2 – 1)]

…and use it here:

tSTAT = [(X̄1 – X̄2) – (µ1 – µ2)] / √[Sp²(1/n1 + 1/n2)]

degrees of freedom: (n1 – 1) + (n2 – 1) = n1 + n2 – 2
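As a quick check on these formulas, here is a minimal sketch (assuming Python with NumPy, which this deck does not itself use; the function name is illustrative) that computes the pooled variance and the pooled-variance t statistic from summary statistics:

```python
import numpy as np

def pooled_variance_t(xbar1, s1, n1, xbar2, s2, n2, diff_null=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = diff_null."""
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of X1bar - X2bar
    t_stat = ((xbar1 - xbar2) - diff_null) / se
    df = n1 + n2 - 2                         # degrees of freedom
    return t_stat, df

# Summary data from the IT vs. INFRA example later in the deck
print(pooled_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25))  # ~ (2.040, 44)
```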
CI Estimator for μ1 – μ2 (equal variances)

The confidence interval estimator for μ1 – μ2 when the population variances are equal is given by:

(X̄1 – X̄2) ± tα/2 √[Sp²(1/n1 + 1/n2)]

where Sp² is the pooled variance estimator and the degrees of freedom are n1 + n2 – 2.
Test Statistic for μ1 – μ2 (unequal variances)

The test statistic for μ1 – μ2 when the population variances are unequal is given by:

tSTAT = [(X̄1 – X̄2) – (µ1 – µ2)] / √(S1²/n1 + S2²/n2)

degrees of freedom:

ν = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 – 1) + (S2²/n2)²/(n2 – 1)]

Likewise, the confidence interval estimator is:

(X̄1 – X̄2) ± tα/2 √(S1²/n1 + S2²/n2)
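A companion sketch (again assuming Python with NumPy; the function name is illustrative) for the separate-variance statistic and its Satterthwaite degrees of freedom:

```python
import numpy as np

def separate_variance_t(xbar1, s1, n1, xbar2, s2, n2, diff_null=0.0):
    """Separate-variance (Welch) t statistic with Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # per-sample variance contributions
    t_stat = ((xbar1 - xbar2) - diff_null) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, df

# Same IT vs. INFRA summary data as the later worked example
print(separate_variance_t(3.27, 1.30, 21, 2.53, 1.16, 25))  # ~ (2.019, 40.6)
```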


Which Test to Use? THUMB RULE
Which test statistic do we use: the equal-variance or the unequal-variance version?
Whenever there is insufficient evidence that the variances are unequal, it is preferable to perform the equal-variances t-test.
This is so because, for any two given samples:

Number of degrees of freedom for the equal-variances case ≥ number of degrees of freedom for the unequal-variances case.

Larger numbers of degrees of freedom have the same effect as having larger sample sizes.

The statistically appropriate process is to compare the population variances.

Testing the Population Variances

H0: σ1²/σ2² = 1
H1: σ1²/σ2² ≠ 1

Test statistic: F = s1²/s2², which is F-distributed with degrees of freedom ν1 = n1 – 1 and ν2 = n2 – 1.

The required condition is the same as that for the t-test of µ1 – µ2: both populations must be normally distributed.
F Distribution…
The F density function is given by:

f(F) = {Γ[(ν1 + ν2)/2] / [Γ(ν1/2) Γ(ν2/2)]} (ν1/ν2)^(ν1/2) F^(ν1/2 – 1) [1 + (ν1/ν2)F]^(–(ν1 + ν2)/2),  F > 0

Two parameters define this distribution and, as we have already seen elsewhere, these are again degrees of freedom: ν1 is the "numerator" degrees of freedom and ν2 is the "denominator" degrees of freedom.
F Distribution…
The mean and variance of an F random variable are given by:

E(F) = ν2 / (ν2 – 2)  for ν2 > 2

and

Var(F) = 2ν2²(ν1 + ν2 – 2) / [ν1(ν2 – 2)²(ν2 – 4)]  for ν2 > 4

The F distribution is similar to the χ² distribution in that it starts at zero (is non-negative) and is not symmetrical.
Determining Values of F…
For example, what is the value of F for 5% of the area under the right-hand tail of the curve, with a numerator degrees of freedom of 3 and a denominator degrees of freedom of 7?

Solution: use the F table for the required tail area A. There are different tables for different values of A, so make sure you start with the correct table! Reading the table at numerator d.f. = 3 and denominator d.f. = 7:

F.05,3,7 = 4.35

In the F tables, the denominator degrees of freedom index the ROW and the numerator degrees of freedom index the COLUMN.
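If software is handy, the same table value can be checked with SciPy's F distribution (a sketch, assuming Python with SciPy is available):

```python
from scipy.stats import f

# Upper-tail critical value: 5% of the area to the right,
# numerator df = 3, denominator df = 7
print(f.ppf(0.95, dfn=3, dfd=7))   # ~ 4.35
```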
Determining Values of F…
For areas under the left-hand side of the curve, we can leverage the following relationship:

F(1–A), ν1, ν2 = 1 / FA, ν2, ν1

Pay close attention to the order of the terms!
Application of the F Distribution in a Test of Hypothesis
Example for the F statistic
From the given data we calculated the following statistics; the sample size for both samples was 50.

s1² = 37.49 and s2² = 43.34. Thus, test statistic: F = 37.49 / 43.34 = 0.86

Rejection region: F > Fα/2, ν1, ν2 = F.025,49,49 ≈ F.025,50,50 = 1.75

or

F < F(1–α/2), ν1, ν2 = F.975,49,49 = 1 / F.025,49,49 ≈ 1 / F.025,50,50 = 1 / 1.75 = 0.5714

Since 0.5714 < 0.86 < 1.75, we conclude that the assumption of equal variances of the two populations holds, as we have insufficient evidence to reject H0.
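A minimal sketch of this equal-variance check (assuming Python with SciPy; the variable names are illustrative only):

```python
from scipy.stats import f

alpha = 0.05
s1_sq, s2_sq = 37.49, 43.34                    # sample variances
n1, n2 = 50, 50

F_stat = s1_sq / s2_sq                         # ~ 0.86
lower = f.ppf(alpha / 2, n1 - 1, n2 - 1)       # ~ 0.57
upper = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)   # ~ 1.76

reject = (F_stat < lower) or (F_stat > upper)
print(F_stat, lower, upper, reject)            # False: do not reject H0
```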
Pooled-Variance t Test Example
You are a financial analyst for a brokerage firm. Is there a difference in returns between MFs of 2 different sectors? You collect the following data:

                 IT      INFRA
Number           21      25
Sample mean      3.27    2.53
Sample std dev   1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in mean returns (α = 0.05)?
Pooled-Variance t Test Example: Calculating the Test Statistic (continued)
H0: μ1 – μ2 = 0  i.e. (μ1 = μ2)
H1: μ1 – μ2 ≠ 0  i.e. (μ1 ≠ μ2)

The test statistic is:

tSTAT = [(X̄1 – X̄2) – (μ1 – μ2)] / √[Sp²(1/n1 + 1/n2)]
      = [(3.27 – 2.53) – 0] / √[1.5021 × (1/21 + 1/25)] = 2.040

where

Sp² = [(n1 – 1)S1² + (n2 – 1)S2²] / [(n1 – 1) + (n2 – 1)]
    = [(21 – 1)(1.30)² + (25 – 1)(1.16)²] / [(21 – 1) + (25 – 1)] = 1.5021
Pooled-Variance t Test Example: Hypothesis Test Solution
H0: μ1 – μ2 = 0  i.e. (μ1 = μ2)
H1: μ1 – μ2 ≠ 0  i.e. (μ1 ≠ μ2)

α = 0.05
df = 21 + 25 – 2 = 44
Critical values: t = ±2.0154 (α/2 = 0.025 in each tail)

Test statistic:
tSTAT = (3.27 – 2.53) / √[1.5021 × (1/21 + 1/25)] = 2.040

Decision: Reject H0 at α = 0.05, since tSTAT = 2.040 > 2.0154.
Conclusion: There is evidence of a difference in means.
Pooled-Variance t Test Example: Confidence Interval for µ1 – µ2

Since we rejected H0, can we be 95% confident that µIT > µINFRA?

95% confidence interval for µIT – µINFRA:

(X̄1 – X̄2) ± tα/2 √[Sp²(1/n1 + 1/n2)] = 0.74 ± 2.0154 × 0.3628 = (0.009, 1.471)

Since 0 lies below the entire interval, we can be 95% confident that µIT > µINFRA.
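For a software cross-check of this example, SciPy can run the pooled-variance test directly from the summary statistics (a sketch, assuming Python with SciPy):

```python
from scipy.stats import ttest_ind_from_stats

# IT vs. INFRA summary data; equal_var=True gives the pooled-variance t test
res = ttest_ind_from_stats(mean1=3.27, std1=1.30, nobs1=21,
                           mean2=2.53, std2=1.16, nobs2=25,
                           equal_var=True)
print(res.statistic, res.pvalue)   # t ~ 2.040, p ~ 0.047 < 0.05: reject H0
```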
Separate-Variance t Test Example
You are a financial analyst for a brokerage firm. Is there a difference in returns of MFs of 2 different sectors? You collect the following data:

                 IT      INFRA
Number           21      25
Sample mean      3.27    2.53
Sample std dev   1.30    1.16

Assuming both populations are approximately normal with unequal variances, is there a difference in mean return (α = 0.05)?
Separate-Variance t Test Example: Calculating the Test Statistic (continued)
H0: μ1 – μ2 = 0  i.e. (μ1 = μ2)
H1: μ1 – μ2 ≠ 0  i.e. (μ1 ≠ μ2)

The test statistic is:

tSTAT = [(X̄1 – X̄2) – (μ1 – μ2)] / √(S1²/n1 + S2²/n2)
      = [(3.27 – 2.53) – 0] / √[(1.30)²/21 + (1.16)²/25] = 2.019

Degrees of freedom:

ν = (S1²/n1 + S2²/n2)² / [(S1²/n1)²/(n1 – 1) + (S2²/n2)²/(n2 – 1)]
  = [(1.30)²/21 + (1.16)²/25]² / {[(1.30)²/21]²/20 + [(1.16)²/25]²/24} = 40.57

Use degrees of freedom = 40.
Separate-Variance t Test Example: Hypothesis Test Solution
H0: μ1 – μ2 = 0  i.e. (μ1 = μ2)
H1: μ1 – μ2 ≠ 0  i.e. (μ1 ≠ μ2)

α = 0.05
df = 40
Critical values: t = ±2.021 (α/2 = 0.025 in each tail)

Test statistic: tSTAT = 2.019

Decision: Fail to reject H0 at α = 0.05, since 2.019 < 2.021.
Conclusion: There is insufficient evidence of a difference in means.
F Test: An Example
You are a financial analyst for a brokerage firm. You want to compare returns between the MFs of different sectors. You collect the following data:

            IT      INFRA
Number      21      25
Mean        3.27    2.53
Std dev     1.30    1.16

Is there a difference in the variances between the 2 MFs at the α = 0.05 level?
F Test: Example Solution
• Form the hypothesis test:
  H0: σ1² = σ2² (there is no difference between variances)
  H1: σ1² ≠ σ2² (there is a difference between variances)

• Find the F critical values for α = 0.05:
  Numerator d.f. = n1 – 1 = 21 – 1 = 20
  Denominator d.f. = n2 – 1 = 25 – 1 = 24
  Fα/2 = F.025,20,24 = 2.33
  F(1–α/2),20,24 = 1 / F.025,24,20 = 1/2.41 = 0.415
F Test: Example Solution (continued)
H0: σ1² = σ2²
H1: σ1² ≠ σ2²

The test statistic is:

FSTAT = S1² / S2² = (1.30)² / (1.16)² = 1.256

With α/2 = .025, the upper critical value is F0.025,20,24 = 2.33 and the lower critical value is 0.415.

• FSTAT = 1.256 is not in the rejection region (0.415 < 1.256 < 2.33), so we do not reject H0.
• Conclusion: There is not sufficient evidence of a difference in variances at α = .05.
Paired Test and Test of Proportions for Two Samples
Part II
Related Populations: The Paired Difference Test
Tests means of 2 related populations:
• Paired or matched samples
• Repeated measures (before/after)

Use the difference between paired values:
Di = X1i – X2i

• Eliminates variation among subjects
• Assumptions:
  • Both populations are normally distributed
  • Or, if not normal, use large samples
Related Populations: The Paired Difference Test (continued)
The ith paired difference is Di, where Di = X1i – X2i

The point estimate for the paired difference population mean μD is:

D̄ = (Σ Di) / n   (summing over i = 1 to n)

The sample standard deviation is:

SD = √[ Σ (Di – D̄)² / (n – 1) ]

n is the number of pairs in the paired sample.
The Paired Difference Test: Finding tSTAT

• The test statistic for μD is:

tSTAT = (D̄ – μD) / (SD / √n)

where tSTAT has n – 1 d.f.
The Paired Difference Test: Possible Hypotheses
Paired samples

• Lower-tail test: H0: μD ≥ 0 vs. H1: μD < 0; reject H0 if tSTAT < –tα
• Upper-tail test: H0: μD ≤ 0 vs. H1: μD > 0; reject H0 if tSTAT > tα
• Two-tail test: H0: μD = 0 vs. H1: μD ≠ 0; reject H0 if tSTAT < –tα/2 or tSTAT > tα/2

where tSTAT has n – 1 d.f.
The Paired Difference Confidence Interval

Paired samples: the confidence interval for μD is

D̄ ± tα/2 (SD / √n)

where SD = √[ Σ (Di – D̄)² / (n – 1) ]
Paired Difference Test: Example
• Assume you send your salespeople to a "customer service" training workshop. Has the training made a difference in the number of complaints? Or, put simply: was the training module effective?

Number of complaints, with difference Di = (2) – (1):

Salesperson   Before (1)   After (2)   Difference, Di
C.B.          6            4           –2
T.F.          20           6           –14
M.H.          3            2           –1
R.K.          0            0           0
M.O.          4            0           –4
                           Total:      –21

D̄ = (Σ Di) / n = –21 / 5 = –4.2

SD = √[ Σ (Di – D̄)² / (n – 1) ] = 5.67
Paired Difference Test: Solution
• Has the training made a difference in the number of complaints (at the 0.01 level)?

H0: μD = 0
H1: μD ≠ 0

α = .01, D̄ = –4.2
d.f. = n – 1 = 4
Critical values: t0.005 = ±4.604

Test statistic:
tSTAT = (D̄ – μD) / (SD / √n) = (–4.2 – 0) / (5.67 / √5) = –1.66

Decision: Do not reject H0 (tSTAT = –1.66 is not in the rejection region).
Conclusion: There is not a significant change in the number of complaints.
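A minimal sketch of this paired test on the complaint counts (assuming Python with SciPy; scipy.stats.ttest_rel runs the paired t-test directly):

```python
import numpy as np
from scipy.stats import ttest_rel

before = np.array([6, 20, 3, 0, 4])   # complaints before training
after  = np.array([4,  6, 2, 0, 0])   # complaints after training

# Paired t-test on Di = after - before, matching the slide's (2) - (1) definition
res = ttest_rel(after, before)
print(res.statistic, res.pvalue)       # t ~ -1.66, p > 0.01: do not reject H0

# The same statistic computed by hand
d = after - before
print(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))   # ~ -1.66
```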
Two Population Proportions
Goal: test a hypothesis or form a confidence interval for the difference between two population proportions, π1 – π2

Assumptions:
n1π1 ≥ 5, n1(1 – π1) ≥ 5
n2π2 ≥ 5, n2(1 – π2) ≥ 5

The point estimate for the difference is p1 – p2.
Test Statistic for π1 – π2
There are two cases to consider: a pooled estimate of the proportion (used for the hypothesis test below) and separate variances of the proportions (used for the confidence interval, shown later).

Under the null hypothesis we assume π1 = π2, so we pool the two sample estimates. The pooled estimate for the overall proportion is:

p̄ = (X1 + X2) / (n1 + n2)

where X1 and X2 are the number of items of interest in samples 1 and 2.
Two Population Proportions (continued)

The test statistic for π1 – π2 is a Z statistic:

ZSTAT = [(p1 – p2) – (π1 – π2)] / √[ p̄(1 – p̄)(1/n1 + 1/n2) ]

where p̄ = (X1 + X2) / (n1 + n2), p1 = X1/n1, p2 = X2/n2
Hypothesis Tests for Two Population Proportions
Population proportions

• Lower-tail test: H0: π1 ≥ π2 vs. H1: π1 < π2 (i.e., H0: π1 – π2 ≥ 0 vs. H1: π1 – π2 < 0)
• Upper-tail test: H0: π1 ≤ π2 vs. H1: π1 > π2 (i.e., H0: π1 – π2 ≤ 0 vs. H1: π1 – π2 > 0)
• Two-tail test: H0: π1 = π2 vs. H1: π1 ≠ π2 (i.e., H0: π1 – π2 = 0 vs. H1: π1 – π2 ≠ 0)
Hypothesis Tests for Two Population Proportions (continued)
Population proportions

• Lower-tail test (H1: π1 – π2 < 0): reject H0 if ZSTAT < –Zα
• Upper-tail test (H1: π1 – π2 > 0): reject H0 if ZSTAT > Zα
• Two-tail test (H1: π1 – π2 ≠ 0): reject H0 if ZSTAT < –Zα/2 or ZSTAT > Zα/2
Hypothesis Test Example: Two Population Proportions
Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?

• In a random sample, 36 of 72 men and 35 of 50 women indicated they would vote Yes.
• Test at the .05 level of significance.
Hypothesis Test Example: Two Population Proportions (continued)

• The hypothesis test is:
  H0: π1 – π2 = 0 (the two proportions are equal)
  H1: π1 – π2 ≠ 0 (there is a significant difference between proportions)

• The sample proportions are:
  Men: p1 = 36/72 = 0.50
  Women: p2 = 35/50 = 0.70

• The pooled estimate for the overall proportion is:
  p̄ = (X1 + X2) / (n1 + n2) = (36 + 35) / (72 + 50) = 71/122 = .582
Hypothesis Test Example: Two Population Proportions (continued)

The test statistic for π1 – π2 is:

ZSTAT = [(p1 – p2) – (π1 – π2)] / √[ p̄(1 – p̄)(1/n1 + 1/n2) ]
      = [(.50 – .70) – 0] / √[ .582 × (1 – .582) × (1/72 + 1/50) ] = –2.20

For α = .05, the critical values are ±1.96 (this is a Z statistic, so no degrees of freedom are involved).

Decision: Reject H0, since ZSTAT = –2.20 < –1.96.
Conclusion: There is significant evidence of a difference between the proportions of men and women who will vote Yes.
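A minimal sketch of this calculation (assuming Python with NumPy/SciPy; variable names are illustrative):

```python
import numpy as np
from scipy.stats import norm

x1, n1 = 36, 72          # men voting Yes
x2, n2 = 35, 50          # women voting Yes
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))      # two-tailed p-value
print(z, p_value)                  # z ~ -2.20, p ~ 0.028 < 0.05: reject H0
```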
Confidence Interval for Two Population Proportions: why only a z-interval (and no t-interval)?
(Briefly: the variance of a sample proportion is determined by the proportion itself, p(1 – p)/n, rather than by a separately estimated σ, so the large-sample normal (z) approximation is used instead of a t distribution.)

The confidence interval for π1 – π2 is:

(p1 – p2) ± Zα/2 √[ p1(1 – p1)/n1 + p2(1 – p2)/n2 ]
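As a quick illustration with the same voting data (a sketch, assuming Python with NumPy/SciPy):

```python
import numpy as np
from scipy.stats import norm

p1, n1 = 36 / 72, 72     # men
p2, n2 = 35 / 50, 50     # women
z = norm.ppf(0.975)      # ~ 1.96 for a 95% interval

se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = (p1 - p2) - z * se, (p1 - p2) + z * se
print(lo, hi)            # ~ (-0.37, -0.03): 0 is outside, consistent with rejecting H0
```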
