
Chapter10 - Statistical Inference For 2 Samples

This document discusses statistical inference for comparing two normal distributions when their variances are known or unknown. It covers point estimation and hypothesis testing of the difference in means. Examples are provided to illustrate confidence intervals and hypothesis tests on differences in means.

Uploaded by

hallulel

CHAPTER 10. STATISTICAL INFERENCE FOR TWO SAMPLES

Contents
1. Inference on the Difference in Means of Two Normal Distributions, Variances Known
2. Inference on the Difference in Means of Two Normal Distributions, Variances Unknown
3. Inference on Two Population Proportions

INFERENCE ON THE DIFFERENCE IN MEANS OF TWO NORMAL DISTRIBUTIONS, VARIANCES KNOWN

Assumptions for Two-Sample Inference, Variances Known
(1) $X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample from population 1.
(2) $X_{21}, X_{22}, \ldots, X_{2n_2}$ is a random sample from population 2.
(3) The two populations represented by $X_1$ and $X_2$ are independent.
(4) Both populations are normal.

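A point estimator of $\mu_1 - \mu_2$
▪ A logical point estimator of $\mu_1 - \mu_2$ is the difference in sample means $\bar{X}_1 - \bar{X}_2$.
$$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$$
$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
▪ The quantity
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
has a N(0, 1) distribution if the two populations are normal, or is approximately N(0, 1) if the conditions of the central limit theorem apply.

Confidence Interval on the Difference in Means, Variances Known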
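As a sketch of the standard result in the notation of this section: because the standardized difference in sample means is N(0, 1), a $100(1-\alpha)\%$ two-sided confidence interval on $\mu_1 - \mu_2$ takes the usual textbook form below, where $z_{\alpha/2}$ denotes the upper $\alpha/2$ percentage point of the standard normal distribution.

```latex
\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\;\le\; \mu_1 - \mu_2 \;\le\;
\bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
```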

One-Sided Confidence Bounds
A $100(1-\alpha)\%$ one-sided confidence bound on $\mu_1 - \mu_2$ may also be obtained.

Example
Two samples concerning retention rates for first-year students at private and public institutions were obtained from the Department of Education's database to see if there was a significant difference between the two types of colleges.
Private colleges: $n_1 = 71$, $\bar{x}_1 = 78.17$, $\sigma_1^2 = 91.17$
Public universities: $n_2 = 32$, $\bar{x}_2 = 84$, $\sigma_2^2 = 97.64$
What does a 95% confidence interval tell us about retention rates?
Answer. $-5.83 \pm 4.08$
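The stated answer can be checked with a short Python sketch. It assumes that 91.17 and 97.64 are the population variances, which is what the stated margin of 4.08 implies; the function name `z_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar1, var1, n1, xbar2, var2, n2, conf=0.95):
    """Two-sided CI for mu1 - mu2 when both population variances are known.

    Returns (point estimate, margin of error).
    """
    # z_{alpha/2}: upper alpha/2 point of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = xbar1 - xbar2
    margin = z * sqrt(var1 / n1 + var2 / n2)
    return diff, margin


# Retention-rate example; 91.17 and 97.64 are treated as variances (assumption).
diff, margin = z_interval(78.17, 91.17, 71, 84.0, 97.64, 32)
print(f"{diff:.2f} +/- {margin:.2f}")
```

Under that assumption the whole interval lies below zero, which suggests that the mean retention rate is lower at private colleges than at public universities.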

Tests on the Difference in Means, Variances Known
Consider hypothesis testing on the difference in the means $\mu_1 - \mu_2$ of two normal populations.

Example
A product developer is interested in reducing the drying time of a primer paint. Two formulations of the paint are tested; formulation 1 is the standard chemistry, and formulation 2 has a new drying ingredient that should reduce the drying time. From experience, it is known that the standard deviation of drying time is 8 minutes, and this inherent variability should be unaffected by the addition of the new ingredient. Ten specimens are painted with formulation 1, and another 10 specimens are painted with formulation 2; the 20 specimens are painted in random order. The two sample average drying times are $\bar{x}_1 = 121$ minutes and $\bar{x}_2 = 112$ minutes, respectively. What conclusions can the product developer draw about the effectiveness of the new ingredient, using $\alpha = 0.05$?
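A minimal Python sketch of the test this example calls for. It assumes an upper-tailed alternative $H_1: \mu_1 - \mu_2 > 0$, since the new ingredient is supposed to shorten the drying time; the function name `z_test_upper` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Upper-tailed z-test of H0: mu1 - mu2 = delta0 with known sigmas.

    Returns (test statistic, P-value).
    """
    z0 = (xbar1 - xbar2 - delta0) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    p_value = 1 - NormalDist().cdf(z0)  # area to the right of z0
    return z0, p_value


# Drying-time example: sigma = 8 minutes for both formulations, n = 10 each.
z0, p_value = z_test_upper(121, 112, 8, 8, 10, 10)
z_crit = NormalDist().inv_cdf(0.95)  # z_{0.05}
print(round(z0, 2), round(p_value, 4), z0 > z_crit)
```

Taking the critical value from `NormalDist().inv_cdf` avoids a table lookup; the decision is the same as with the tabulated value.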

Example (cont.)
$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 > 0$
Test statistic: $z_0 = \dfrac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = 2.52$
Reject $H_0$ if $z_0 > z_{0.05} = 1.645$
Conclusion: Reject $H_0$ at $\alpha = 0.05$
P-value $= 1 - \Phi(2.52) = 0.0059$

Exercises
1. Consider the hypothesis test of two normal populations: $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \ne \mu_2$ with known standard deviations $\sigma_1 = 10$ and $\sigma_2 = 5$. Suppose that the sample sizes are $n_1 = 10$ and $n_2 = 15$ and that $\bar{x}_1 = 4.7$ and $\bar{x}_2 = 7.8$. Use $\alpha = 0.05$.
a. Test the hypothesis and find the P-value.
b. Explain how the test could be conducted with a confidence interval.

Exercises
2. Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The fill volume can be assumed to be normal with standard deviation $\sigma_1 = 0.020$ and $\sigma_2 = 0.025$ ounces. A member of the quality engineering staff suspects that both machines fill to the same mean net volume, whether or not this volume is 16.0 ounces. A random sample of 10 bottles is taken from the output of each machine.
Machine 1: 16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99
Machine 2: 16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00
a. Do you think the engineer is correct? Use α = 0.05. What is the P-value for this test?
b. Calculate a 95% confidence interval on the difference in means.

3. A polymer is manufactured in a batch chemical process. Viscosity measurements are normally made on each batch, and long experience with the process has indicated that the variability in the process is fairly stable with σ = 20. Fifteen batch viscosity measurements are given as follows: 724, 718, 776, 760, 745, 759, 795, 756, 742, 740, 761, 749, 739, 747, 742. A process change that involves switching the type of catalyst used in the process is made. Following the process change, eight batch viscosity measurements are taken: 735, 775, 729, 755, 783, 760, 738, 780. Assume that process variability is unaffected by the catalyst change. If the difference in mean batch viscosity is 10 or less, the manufacturer would like to detect it with a high probability.
a. Formulate and test an appropriate hypothesis using α = 0.10. What are your conclusions? Find the P-value.
b. Find a 90% confidence interval on the difference in mean batch viscosity resulting from the process change.

INFERENCE ON THE DIFFERENCE IN MEANS OF TWO NORMAL DISTRIBUTIONS, VARIANCES UNKNOWN

Two-Sample Inference, Variances Unknown, Large Samples
▪ The assumptions of normal population distributions and known values of $\sigma_1^2$ and $\sigma_2^2$ are fortunately unnecessary when both sample sizes are sufficiently large.
▪ If the sample sizes $n_1$ and $n_2$ exceed 40, the quantity
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
has approximately a N(0, 1) distribution (by the central limit theorem).

Confidence Interval, Variances Unknown
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, Small Samples
Suppose that we have two independent normal populations with unknown means $\mu_1$ and $\mu_2$, and unknown but equal variances.
The pooled estimator of $\sigma^2$, denoted by $S_p^2$, is defined by
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
The statistic
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
has a t distribution with $n_1 + n_2 - 2$ degrees of freedom.
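The T statistic leads to the usual interval $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{1/n_1 + 1/n_2}$. A minimal Python sketch of that computation follows, fed with the summary statistics of the cement example in this section. The function name `pooled_t_interval` is illustrative, and the critical value 2.069 is an assumed table lookup for $t_{0.025,\,23}$ rather than something the code computes.

```python
from math import sqrt


def pooled_t_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal but unknown population variances.

    t_crit must be the t percentage point with n1 + n2 - 2 degrees of freedom.
    Returns (pooled variance, lower limit, upper limit).
    """
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return sp2, diff - margin, diff + margin


# Cement example: 10 standard samples, 15 lead-doped samples, 95% confidence.
T_CRIT = 2.069  # t_{0.025, 23} from a t table (assumed lookup)
sp2, lower, upper = pooled_t_interval(90.0, 5.0, 10, 87.0, 4.0, 15, T_CRIT)
print(f"sp^2 = {sp2:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

The resulting interval contains zero, so at the 95% level these data do not show a difference in mean calcium content between the two cements.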

Example. An article in the journal Hazardous Waste and Hazardous Materials (1989, Vol. 6) reported the results of an analysis of the weight of calcium in standard cement and cement doped with lead. Reduced levels of calcium would indicate that the hydration mechanism in the cement is blocked and would allow water to attack various locations in the cement structure. Ten samples of standard cement had an average weight percent calcium of $\bar{x}_1 = 90.0$ with a sample standard deviation of $s_1 = 5.0$, and 15 samples of the lead-doped cement had an average weight percent calcium of $\bar{x}_2 = 87.0$ with a sample standard deviation of $s_2 = 4.0$. We assume that weight percent calcium is normally distributed and find a 95% confidence interval on the difference in means, $\mu_1 - \mu_2$, for the two types of cement. Furthermore, we assume that both normal populations have the same standard deviation.

Confidence Interval, Variances Unknown
$\sigma_1^2 \ne \sigma_2^2$, Small Samples
▪ The statistic
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
is distributed approximately as t with degrees of freedom given by
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
▪ If $df$ is not an integer, round down to the nearest integer.

Example. Arsenic concentration in public drinking water supplies is a potential health risk. An article reported drinking water arsenic concentrations in parts per billion (ppb) for Phoenix and Arizona. Construct a 95% confidence interval for the difference in means. We assume that the two normal populations do not have the same standard deviation.

t test for $\mu_1 - \mu_2$, Variances Unknown
$\sigma_1^2 = \sigma_2^2$, Small Samples
Example. A manufacturer claims that the calling range (in miles) of its 900 MHz cordless phone is greater than that of its leading competitor. You perform a study using 14 phones from the manufacturer and 16 similar phones from its competitor.
Manufacturer: $\bar{x}_1 = 1275$, $s_1 = 45$, $n_1 = 14$
Competition: $\bar{x}_2 = 1250$, $s_2 = 30$, $n_2 = 16$
At $\alpha = 0.05$, is there enough evidence to support the manufacturer's claim? Assume that the populations are normally distributed and the population variances are equal.
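One way to work this example is a pooled t-test with the upper-tailed alternative $H_1: \mu_1 - \mu_2 > 0$. The Python sketch below makes that assumption; the function name `pooled_t_statistic` is illustrative, and the critical value 1.701 is an assumed table lookup for $t_{0.05,\,28}$.

```python
from math import sqrt


def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = delta0."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (xbar1 - xbar2 - delta0) / (sp * sqrt(1 / n1 + 1 / n2))


# Cordless-phone example: manufacturer (sample 1) vs competitor (sample 2).
t0 = pooled_t_statistic(1275, 45, 14, 1250, 30, 16)
T_CRIT = 1.701  # t_{0.05, 28} from a t table (assumed lookup)
reject = t0 > T_CRIT
print(f"t0 = {t0:.3f}, reject H0: {reject}")
```

Because the statistic exceeds the critical value, the sketch rejects $H_0$, which supports the manufacturer's claim at $\alpha = 0.05$.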

t test for $\mu_1 - \mu_2$, Variances Unknown
$\sigma_1^2 \ne \sigma_2^2$, Small Samples
• Step 1: Construct the two hypotheses $H_0: \mu_1 - \mu_2 = \Delta_0$ vs $H_1: \mu_1 - \mu_2 \ne \Delta_0$
• Step 2: Find the test statistic:
$$t_0 = \frac{\bar{x}_1 - \bar{x}_2 - \Delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
• Step 3: Identify the acceptance region, using the t distribution with
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
• Step 4: Make a decision:
If the test statistic is in the critical region, then reject H0.
If the test statistic is in the acceptance region, then fail to reject H0.

Example. Consumer Reports tested several types of snow tires to determine how well each performed under winter conditions. When travelling on ice at 15 mph, 10 Firestone Winterfire tires had a mean stopping distance of 51 feet with a standard deviation of 8 feet. The mean stopping distance for 12 Michelin XM+S Alpine tires was 55 feet with a standard deviation of 3 feet. Can you conclude that there is a difference between the stopping distances of the two types of tires? Use $\alpha = 0.01$. Assume the populations are normally distributed and the population variances are NOT equal.
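The four steps can be sketched in Python using the snow-tire summary statistics from this section. The function name `welch_t` is illustrative, and the critical value 3.106 is an assumed table lookup for $t_{0.005,\,11}$ (two-sided test at $\alpha = 0.01$).

```python
from math import floor, sqrt


def welch_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Unequal-variance t statistic and its degrees of freedom (rounded down)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t0 = (xbar1 - xbar2 - delta0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t0, floor(df)


# Snow-tire example: 10 Firestone tires (sample 1), 12 Michelin tires (sample 2).
t0, df = welch_t(51, 8, 10, 55, 3, 12)
T_CRIT = 3.106  # t_{0.005, 11} from a t table (assumed lookup)
reject = abs(t0) > T_CRIT  # two-sided critical region
print(f"t0 = {t0:.3f}, df = {df}, reject H0: {reject}")
```

Here the statistic falls inside the acceptance region, so the sketch fails to reject $H_0$: the data do not show a difference in mean stopping distance at $\alpha = 0.01$.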

Exercises
1. Consider the hypothesis test $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$. Suppose that sample sizes are $n_1 = 15$ and $n_2 = 15$, that $\bar{x}_1 = 4.7$ and $\bar{x}_2 = 7.8$, and that $s_1^2 = 4$ and $s_2^2 = 6.25$. Assume that $\sigma_1^2 = \sigma_2^2$ and that the data are drawn from normal distributions. Use α = 0.05.
a. Test the hypothesis.
b. Explain how the test could be conducted with a confidence interval.

2. Two catalysts may be used in a batch chemical process. Twelve batches were prepared using catalyst 1, resulting in an average yield of 86 and a sample standard deviation of 3. Fifteen batches were prepared using catalyst 2, and they resulted in an average yield of 89 with a standard deviation of 2. Assume that yield measurements are approximately normally distributed with the same standard deviation.
a. Is there evidence to support a claim that catalyst 2 produces a higher mean yield than catalyst 1? Use α = 0.01.
b. Find a 99% confidence interval on the difference in mean yields that can be used to test the claim in part (a).

Exercises
3. Two suppliers manufacture a plastic gear used in a laser printer. The impact strength of these gears measured in foot-pounds is an important characteristic. A random sample of 10 gears from supplier 1 results in $\bar{x}_1 = 290$ and $s_1 = 12$, and another random sample of 16 gears from the second supplier results in $\bar{x}_2 = 321$ and $s_2 = 22$.
a. Is there evidence to support the claim that supplier 2 provides gears with higher mean impact strength? Use α = 0.05, and assume that both populations are normally distributed but the variances are not equal.
b. Do the data support the claim that the mean impact strength of gears from supplier 2 is at least 25 foot-pounds higher than that of supplier 1? Make the same assumptions as in part (a).
c. Construct a confidence interval estimate for the difference in mean impact strength, and explain how this interval could be used to answer the question posed regarding supplier-to-supplier differences.

4. An article in IEEE International Symposium on Electromagnetic Compatibility [“EM Effects of Different Mobile Handsets on Rats’ Brain” (2002, Vol. 2, pp. 667–670)] quantified the absorption of electromagnetic energy and the resulting thermal effect from cellular phones. The experimental results were obtained from in vivo experiments conducted on rats. The arterial blood pressure values (mmHg) for the control group (8 rats) during the experiment are $\bar{x}_1 = 90$ and $s_1 = 5$, and for the test group (9 rats) are $\bar{x}_2 = 115$ and $s_2 = 10$.
a. Is there evidence to support the claim that the test group has higher mean blood pressure? Use α = 0.05, and assume that both populations are normally distributed but the variances are not equal.
b. Calculate a confidence interval to answer the question in part (a).
c. Do the data support the claim that the mean blood pressure from the test group is at least 15 mmHg higher than the control group? Make the same assumptions as in part (a).

INFERENCE ON TWO POPULATION PROPORTIONS

Introduction
We now consider the case with two binomial parameters of interest, say, $p_1$ and $p_2$, and we wish to draw inferences about these proportions. We present large-sample hypothesis testing and confidence interval procedures based on the normal approximation to the binomial.

Assumptions
Suppose that two independent random samples of sizes $n_1$ and $n_2$ are taken from two populations, and let $X_1$ and $X_2$ represent the number of observations that belong to the class of interest in samples 1 and 2, respectively. Furthermore, suppose that the normal approximation to the binomial is applied to each population, so the estimators of the population proportions $\hat{P}_1 = \dfrac{X_1}{n_1}$ and $\hat{P}_2 = \dfrac{X_2}{n_2}$ have approximate normal distributions ($n_i p_i > 5$ and $n_i(1 - p_i) > 5$).

Approximate Confidence Interval on the Difference in Population Proportions
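Under these assumptions the usual large-sample interval is $\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$. The Python sketch below applies that standard formula to the counts of the nursing-home example in this section (12 of 34 and 17 of 24); the function name `two_proportion_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """Approximate (large-sample) CI for p1 - p2 using unpooled proportions."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - margin, p1 - p2 + margin


# Nursing-home example: inner-city homes (sample 1), countryside homes (sample 2).
lower, upper = two_proportion_interval(12, 34, 17, 24)
print(f"({lower:.3f}, {upper:.3f})")
```

Unlike the test statistic for $H_0: p_1 = p_2$, the interval uses each sample's own proportion in the standard error rather than a pooled estimate.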

Example
A researcher found that 12 out of 34 inner-city nursing homes had a flu vaccination rate of less than 80%, while 17 out of 24 countryside nursing homes had a flu vaccination rate of less than 80%. Find the 95% confidence interval for the difference of the proportions.

Large-Sample Tests on the Difference in Population Proportions
• Step 1: Construct the two hypotheses
$H_0: p_1 - p_2 = 0$
$H_1: p_1 - p_2 \ne 0$
• Step 2: Find the test statistic:
$$z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}}$$
Remark. $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
• Step 3: Identify the acceptance region, using $Z \sim N(0, 1)$.
• Step 4: Make a decision:
If the test statistic is in the critical region, then reject H0.
If the test statistic is in the acceptance region, then fail to reject H0.
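The four steps can be sketched in Python. The counts used here are those of the hospital example in this section (77 of 100 and 120 of 200), and the function name `two_proportion_z_test` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided large-sample test of H0: p1 = p2 using the pooled proportion.

    Returns (test statistic, P-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    z0 = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, p_value


# Hospital example: deaths within 6 months at hospital A and hospital B.
z0, p_value = two_proportion_z_test(77, 100, 120, 200)
alpha = 0.05
print(f"z0 = {z0:.3f}, P-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

Comparing the P-value with $\alpha$ is equivalent to checking whether $z_0$ falls in the critical region $|z_0| > z_{\alpha/2}$.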

Example
We would like to compare the death rates from liver transplants at two hospitals in similar areas.
Hospital A: 77/100 died within 6 months
Hospital B: 120/200 died within 6 months
Are the death rates for the two hospitals statistically different? Test at $\alpha = 0.05$.

Exercises
Consider the following computer output.
a. Is this a one-sided or a two-sided test?
b. Fill in the missing values.
c. Can the null hypothesis be rejected?
d. Construct an approximate 90% CI for the difference in the two proportions.

Exercises
An article in Knee Surgery, Sports Traumatology, Arthroscopy (2005, Vol. 13(4), pp. 273–279) considered arthroscopic meniscal repair with an absorbable screw. Results showed that for tears greater than 25 millimeters, 14 of 18 (78%) repairs were successful, but for shorter tears, 22 of 30 (73%) repairs were successful. Is there evidence that the success rate is greater for longer tears? Use α = 0.05. What is the P-value?

A random sample of 500 adult residents of Maricopa County indicated that 385 were in favor of increasing the highway speed limit to 75 mph, and another sample of 400 adult residents of Pima County indicated that 267 were in favor of the increased speed limit.
a. Do these data indicate that there is a difference in the support for increasing the speed limit for the residents of the two counties? Use α = 0.05. What is the P-value for this test?
b. Construct a 95% confidence interval on the difference in the two proportions. Provide a practical interpretation of this interval.

Exercises
Two different types of injection-molding machines are used to form plastic parts.
A part is considered defective if it has excessive shrinkage or is discolored. Two
random samples, each of size 300, are selected, and 15 defective parts are found
in the sample from machine 1, and 8 defective parts are found in the sample from
machine 2.
a. Is it reasonable to conclude that both machines produce the same fraction of
defective parts, using α = 0.05? Find the P-value for this test.
b. Construct a 95% confidence interval on the difference in the two fractions
defective.
