Two Sample Updated Test
Gauranga C. Samanta
Assistant Professor
Department of Mathematics
BITS Pilani K K Birla Goa Campus, Goa
Null hypothesis: $H_0: p = p_0$
Test statistic value: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
Alternative Hypothesis: Rejection Region
$H_1: p > p_0$: $z \ge z_{\alpha}$ (upper-tailed)
$H_1: p < p_0$: $z \le -z_{\alpha}$ (lower-tailed)
$H_1: p \ne p_0$: either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ (two-tailed)
These test procedures are valid provided that $np_0 \ge 10$ and $n(1 - p_0) \ge 10$.
Example
Natural cork in wine bottles is subject to deterioration, and as a
result wine in such bottles may experience contamination. The
article “Effects of Bottle Closure Type on Consumer Perceptions of
Wine Quality” (Amer. J. of Enology and Viticulture, 2007:
182-191) reported that, in a tasting of commercial chardonnays, 16
of 91 bottles were considered spoiled to some extent by
cork-associated characteristics. Does this data provide strong
evidence for concluding that more than 15% of all such bottles are
contaminated in this way? Let’s carry out a test of hypotheses
using a significance level of 0.10.
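The cork example can be worked through with the one-proportion z statistic, $z = (\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$, using an upper-tailed alternative $H_1: p > 0.15$. The following is a minimal sketch using only the Python standard library; the function name `one_proportion_z` is a hypothetical helper introduced here, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Return (z, upper-tailed P-value) for H0: p = p0 against H1: p > p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


# Cork example: 16 spoiled bottles out of 91, H0: p = 0.15, alpha = 0.10
x, n, p0, alpha = 16, 91, 0.15, 0.10

# Large-sample validity check: n*p0 >= 10 and n*(1 - p0) >= 10
valid = n * p0 >= 10 and n * (1 - p0) >= 10

z, p_value = one_proportion_z(x, n, p0)
z_crit = NormalDist().inv_cdf(1 - alpha)  # z_alpha for an upper-tailed test
reject = z >= z_crit

print(f"valid = {valid}, z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

Here z is about 0.69, which falls below $z_{0.10} \approx 1.28$, so under these assumptions the data do not give strong evidence that more than 15% of bottles are contaminated.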
Example
A builder claims that heat pumps are installed in 70% of all homes
being constructed today in the city of Goa. Would you agree with
this claim if a random survey of new homes in this city showed
that 8 out of 15 had heat pumps installed? Use a 0.10 level of
significance.
$H_0: p = 0.7$
$H_1: p \ne 0.7$
$\alpha = 0.10$
Test statistic: binomial variable $X$ with $p = 0.7$ and $n = 15$.
Computations: $x = 8$ and $np_0 = 15(0.7) = 10.5$. Therefore
the computed P-value is
$P = 2P(X \le 8 \text{ when } p = 0.7) = 2\sum_{x=0}^{8} b(x; 15, 0.7) = 0.2622 > 0.10$
Decision: Do not reject H0 . Conclude that there is
insufficient reason to doubt the builder’s claim.
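Because $n = 15$ is too small for the normal approximation, the P-value in the heat pump example comes from the exact binomial distribution. A short sketch of that calculation follows; `binom_pmf` is a hypothetical helper name, and doubling the lower tail follows the computation shown in the example.

```python
from math import comb


def binom_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p0, x_obs, alpha = 15, 0.7, 8, 0.10

# x_obs = 8 lies below n*p0 = 10.5, so the lower tail P(X <= 8) is doubled
lower_tail = sum(binom_pmf(x, n, p0) for x in range(x_obs + 1))
p_value = 2 * lower_tail
reject = p_value <= alpha

print(f"P(X <= 8) = {lower_tail:.4f}, P-value = {p_value:.4f}, reject H0: {reject}")
```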
Example
A commonly prescribed drug for relieving nervous tension is
believed to be only 60% effective. Experimental results with a new
drug administered to a random sample of 100 adults who were
suffering from nervous tension show that 70 received relief. Is this
sufficient evidence to conclude that the new drug is superior to the
one commonly prescribed? Use a 0.05 level of significance.
$H_0: p = 0.6$
$H_1: p > 0.6$
$\alpha = 0.05$
Critical region: $z > 1.645$
Computations: $x = 70$, $n = 100$, $\hat{p} = 70/100 = 0.7$, and
$z = \dfrac{0.7 - 0.6}{\sqrt{(0.6)(0.4)/100}} = 2.04$,
$P = P(Z > 2.04) = 0.0207 < 0.05$
Decision: Reject H0 and conclude that the new drug is
superior.
Basic Assumptions:
$X_1, X_2, \ldots, X_m$ is a random sample from a distribution with
mean $\mu_1$ and variance $\sigma_1^2$.
$Y_1, Y_2, \ldots, Y_n$ is a random sample from a distribution with
mean $\mu_2$ and variance $\sigma_2^2$.
The X and Y samples are independent of one another.
Theorem
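The expected value of $\bar{X} - \bar{Y}$ is $\mu_1 - \mu_2$, so $\bar{X} - \bar{Y}$ is an unbiased
estimator of $\mu_1 - \mu_2$. The sd of $\bar{X} - \bar{Y}$ is
$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}$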
Null Hypothesis: $H_0: \mu_1 - \mu_2 = d_0$
Test statistic value: $z = \dfrac{\bar{x} - \bar{y} - d_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$
Example
Analysis of a random sample consisting of m = 20 specimens of
cold-rolled steel to determine yield strengths resulted in a sample
average strength of $\bar{x} = 29.8$ ksi. A second random sample of
$n = 25$ two-sided galvanized steel specimens gave a sample average
strength of $\bar{y} = 34.7$ ksi. Assuming that the two yield-strength
distributions are normal with $\sigma_1 = 4.0$ and $\sigma_2 = 5.0$, does the
data indicate that the corresponding true average yield strengths
$\mu_1$ and $\mu_2$ are different? Let’s carry out a test at significance level
$\alpha = 0.01$.
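For the steel example the population standard deviations are taken as known (4.0 and 5.0), so the statistic is $z = (\bar{x} - \bar{y} - d_0)/\sqrt{\sigma_1^2/m + \sigma_2^2/n}$ with $d_0 = 0$ and a two-tailed rejection region. A minimal sketch, with `two_sample_z` as a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, sigma1, sigma2, m, n, d0=0.0):
    """z statistic for H0: mu1 - mu2 = d0 when sigma1 and sigma2 are known."""
    se = sqrt(sigma1**2 / m + sigma2**2 / n)  # sd of Xbar - Ybar
    return (xbar - ybar - d0) / se


alpha = 0.01
z = two_sample_z(xbar=29.8, ybar=34.7, sigma1=4.0, sigma2=5.0, m=20, n=25)

z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2} for a two-tailed test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed P-value
reject = abs(z) >= z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, P-value = {p_value:.5f}, reject H0: {reject}")
```

The statistic is about -3.65, well beyond $\pm z_{0.005} \approx \pm 2.58$, so $H_0: \mu_1 - \mu_2 = 0$ is rejected at level 0.01.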
Use of the test statistic value $z = \dfrac{\bar{x} - \bar{y} - d_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$
Example
What impact does fast-food consumption have on various dietary
and health characteristics? The article “Effects of Fast-Food
Consumption on Energy Intake and Diet Quality Among Children
in a National Household Study” (Pediatrics, 2004: 112-118)
reported the accompanying summary data on daily calorie intake
both for a sample of teens who said they did not typically eat fast
food and another sample of teens who said they did usually eat
fast food:
$m = 663$, $\bar{x} = 2258$, $s_1 = 1519$ (did not typically eat fast food) and $n = 413$, $\bar{y} = 2637$, $s_2 = 1138$ (did typically eat fast food).
Does this data provide strong evidence for concluding that true
average calorie intake for teens who typically eat fast food exceeds
by more than 200 calories per day the true average intake for those
who don’t typically eat fast food? Let’s investigate by carrying out
a test of hypotheses at a significance level of approximately 0.05.
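As a worked sketch of the large-sample statistic for this example, take $\mu_1$ as the true mean intake for teens who do not typically eat fast food and $\mu_2$ for those who do, and treat 2258 and 2637 as the two sample means. This labelling and the lower-tailed alternative are assumptions made here; with them, "exceeds by more than 200" becomes $\mu_1 - \mu_2 < -200$.

```latex
\begin{aligned}
H_0 &: \mu_1 - \mu_2 = -200, \qquad H_1 : \mu_1 - \mu_2 < -200 \\
z &= \frac{2258 - 2637 - (-200)}{\sqrt{\dfrac{1519^2}{663} + \dfrac{1138^2}{413}}}
   \approx \frac{-179}{81.34} \approx -2.20
\end{aligned}
```

Since $-2.20 \le -z_{0.05} = -1.645$, $H_0$ is rejected at level 0.05; the P-value is about $\Phi(-2.20) \approx 0.014$.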
Example
The pressure ($Y$) is determined by measurements as a function of
($x$). A sample of size 25 was taken and the resulting data gave the
following:
$\sum x_i = 1225$, $\sum y_i = 3673$, $\sum x_i y_i = 242393$,
$\sum x_i^2 = 80825$, $\sum y_i^2 = 726939$. Test the hypothesis that the slope is 3
at the 5% level of significance. Also find the estimated regression line
(rounded to four decimal places).
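The pressure example can be solved directly from the five summary sums. The sketch below assumes the usual simple linear regression model with normal errors and a two-sided alternative; `slope_test_from_sums` is a hypothetical helper, and the critical value $t_{0.025,23} = 2.069$ is taken from a standard t table rather than computed.

```python
from math import sqrt


def slope_test_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2, beta10):
    """Least-squares line and t statistic for H0: slope = beta10, from summary sums."""
    sxx = sum_x2 - sum_x**2 / n
    sxy = sum_xy - sum_x * sum_y / n
    syy = sum_y2 - sum_y**2 / n
    b1 = sxy / sxx                       # estimated slope
    b0 = sum_y / n - b1 * sum_x / n      # estimated intercept
    sse = syy - b1 * sxy                 # residual sum of squares
    s = sqrt(sse / (n - 2))              # residual standard deviation
    t = (b1 - beta10) / (s / sqrt(sxx))  # t statistic with n - 2 df
    return b0, b1, t


b0, b1, t = slope_test_from_sums(
    n=25, sum_x=1225, sum_y=3673, sum_xy=242393,
    sum_x2=80825, sum_y2=726939, beta10=3.0,
)

t_crit = 2.069  # t_{0.025, 23} from a t table (assumed, not computed here)
reject = abs(t) >= t_crit

print(f"y-hat = {b0:.4f} + {b1:.4f} x, t = {t:.2f}, reject H0: {reject}")
```

Under these assumptions the fitted line is roughly $\hat{y} = -0.1177 + 3.0008x$ and $t \approx 0.22$, so the hypothesis that the slope equals 3 is not rejected.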
Example
Find the maximum likelihood estimator for $\theta$ in the probability
density function
$f(x) = \begin{cases} 3\theta^3 x^{-4}, & \text{if } x \ge \theta > 0 \\ 0, & \text{otherwise,} \end{cases}$
and check whether the estimator is biased or unbiased.
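One way to approach this example, assuming a random sample $X_1, \ldots, X_n$ of size $n$ from the density (the sample size is not stated in the problem), is sketched below. The likelihood is positive only when $\theta$ does not exceed the smallest observation, and it increases in $\theta$ on that range.

```latex
\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} 3\theta^{3} x_i^{-4}
           = 3^{n}\,\theta^{3n} \prod_{i=1}^{n} x_i^{-4},
           \qquad 0 < \theta \le \min_i x_i \\
\hat{\theta} &= X_{(1)} = \min(X_1, \ldots, X_n)
           \quad \text{(since } L \text{ is increasing in } \theta\text{)} \\
P\big(X_{(1)} > t\big) &= \left(\frac{\theta}{t}\right)^{3n} \ \text{for } t \ge \theta
           \;\Longrightarrow\;
           E\big[X_{(1)}\big] = \theta + \int_{\theta}^{\infty} \left(\frac{\theta}{t}\right)^{3n} dt
           = \frac{3n}{3n - 1}\,\theta
\end{aligned}
```

Because $E[\hat{\theta}] = \frac{3n}{3n-1}\theta \ne \theta$, the estimator is biased (it overestimates $\theta$ on average); rescaling to $\frac{3n-1}{3n} X_{(1)}$ would remove the bias.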
Example
The data regarding the production of wheat in tons ($X$) and the
price of flour per kilo ($Y$) in the decade of the 1980s in Spain
were
X:  30  28  32  25  25  25  22  24  35  40
Y:  25  30  27  40  42  40  50  45  30  25
Find the regression line using the method of least squares and
compute a 95% confidence interval for the slope of the regression
line.
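A sketch of the wheat example computed from the raw data follows. It assumes the standard simple linear regression model, under which the 95% interval for the slope is $\hat{\beta}_1 \pm t_{0.025,\,n-2}\, s/\sqrt{S_{xx}}$; the value $t_{0.025,8} = 2.306$ is read from a t table rather than computed.

```python
from math import sqrt

x = [30, 28, 32, 25, 25, 25, 22, 24, 35, 40]  # wheat production
y = [25, 30, 27, 40, 42, 40, 50, 45, 30, 25]  # flour price

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Corrected sums of squares and cross-products
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)

b1 = sxy / sxx                 # least-squares slope
b0 = ybar - b1 * xbar          # least-squares intercept
sse = syy - b1 * sxy           # residual sum of squares
s = sqrt(sse / (n - 2))        # residual standard deviation

t_crit = 2.306                 # t_{0.025, 8} from a t table (assumed)
half_width = t_crit * s / sqrt(sxx)
ci = (b1 - half_width, b1 + half_width)

print(f"y-hat = {b0:.4f} + ({b1:.4f}) x")
print(f"95% CI for slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The interval lies entirely below zero, which is consistent with price falling as production rises in this data set.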
Example
The temperature $Y$ is determined by measurements as a function
of depth $x$. A sample of size 20 was taken and the resulting data
gave:
$\sum_{i=1}^{20} x_i = 1050$, $\sum_{i=1}^{20} y_i = 5184$, $\sum_{i=1}^{20} x_i y_i = 355755$,
$\sum_{i=1}^{20} x_i^2 = 71750$, $\sum_{i=1}^{20} y_i^2 = 1764182$.
(a) Find the estimated regression line.
(b) Test the hypothesis that the slope is 5 at the 5% level of significance.
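A worked sketch of both parts, assuming the usual simple linear regression model with normal errors and a two-sided alternative for the slope (the direction of the alternative is not stated in the problem):

```latex
\begin{aligned}
S_{xx} &= 71750 - \frac{1050^2}{20} = 16625, \quad
S_{xy} = 355755 - \frac{(1050)(5184)}{20} = 83595, \quad
S_{yy} = 1764182 - \frac{5184^2}{20} = 420489.2 \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} \approx 5.02827, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 259.2 - (5.02827)(52.5) \approx -4.784 \\
s^2 &= \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n - 2} \approx \frac{150.91}{18} \approx 8.384, \qquad
t = \frac{\hat{\beta}_1 - 5}{s/\sqrt{S_{xx}}} \approx \frac{0.02827}{0.02246} \approx 1.26
\end{aligned}
```

So the estimated line is approximately $\hat{y} = -4.784 + 5.028x$, and since $|t| \approx 1.26 < t_{0.025,18} = 2.101$ (table value), the hypothesis that the slope equals 5 is not rejected at the 5% level.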
Example
To estimate the average time required for certain repairs, an
automobile manufacturer engaged 40 mechanics, a random sample,
and measured the time taken by each of them in the performance
of this task. If it took them on an average 24.05 minutes with a
standard deviation of 2.68 minutes, what can the manufacturer
assert with 95% confidence about the maximum error?
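With $n = 40$ the sample is large enough to use the normal critical value with $s$ in place of $\sigma$ (an assumption of this sketch), so the maximum error of the estimate is:

```latex
E = z_{0.025}\,\frac{s}{\sqrt{n}} = 1.96 \cdot \frac{2.68}{\sqrt{40}} \approx 0.83 \text{ minutes}
```

That is, the manufacturer can assert with 95% confidence that the sample mean of 24.05 minutes is off from the true average by at most about 0.83 minutes.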
Example
An account on server A is more expensive than an account on
server B. However, server A is faster. To see whether it’s
optimal to go with the faster but more expensive server, a manager
needs to know how much faster it is. A certain computer algorithm
is executed 20 times on server A and 30 times on server B with the
following results
             Server A   Server B
Sample mean    6.7        7.5
Sample sd      0.6        1.2
Example
Internet connections are often slowed by delays at nodes. Let us
determine if the delay time increases during heavy-volume times.
Five hundred packets are sent through the same network between
5pm and 6pm (sample $X$), and three hundred packets are sent
between 10pm and 11pm (sample $Y$). The early sample has a
mean delay time of 0.8 sec with a standard deviation of 0.1 sec
whereas the second sample has a mean delay time of 0.5 sec with a
standard deviation of 0.08 sec. Construct a 99.5% confidence
interval for the difference between the mean delay times.
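With samples of 500 and 300 packets, a large-sample interval $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{s_1^2/m + s_2^2/n}$ is a natural choice. The sketch below assumes that form, with the sample standard deviations standing in for the unknown population values; `diff_means_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(xbar, ybar, s1, s2, m, n, conf):
    """Large-sample confidence interval for mu_X - mu_Y."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    se = sqrt(s1**2 / m + s2**2 / n)         # estimated sd of Xbar - Ybar
    diff = xbar - ybar
    return diff - z * se, diff + z * se


# 5pm-6pm sample X (m = 500) versus 10pm-11pm sample Y (n = 300)
lo, hi = diff_means_ci(xbar=0.8, ybar=0.5, s1=0.1, s2=0.08, m=500, n=300, conf=0.995)

print(f"99.5% CI for the difference in mean delay: ({lo:.3f}, {hi:.3f}) sec")
```

The resulting interval, roughly 0.28 to 0.32 sec, excludes zero, which suggests the delay is longer in the earlier, heavier-volume hour.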
Example
A manager evaluates the effectiveness of a major hardware upgrade by
running a certain process 50 times before the upgrade and 50 times
after it. Based on these data, the average running time is 8.5
minutes before the upgrade, 6.2 minutes after it. Historically, the
standard deviation has been 1.8 minutes, and presumably it has not
changed. Construct a 90% confidence interval showing how much
the mean running time reduced due to the hardware upgrade.
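Here the common standard deviation $\sigma = 1.8$ is treated as known for both samples, so a 90% interval for the reduction $\mu_X - \mu_Y$ (before minus after) can be sketched as:

```latex
(\bar{x} - \bar{y}) \pm z_{0.05}\,\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}
= (8.5 - 6.2) \pm 1.645 \cdot 1.8 \sqrt{\frac{1}{50} + \frac{1}{50}}
\approx 2.3 \pm 0.59
```

This gives approximately (1.71, 2.89) minutes for the reduction in mean running time.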
Example
The number of concurrent users for some internet service provider
has always averaged 5000 with a standard deviation of 800. After
an equipment upgrade, the average number of users at 100
randomly selected moments of time is 5200. Does it indicate, at a
5% level of significance, that the mean number of concurrent users
has increased? Assume that the standard deviation of the number
of concurrent users has not changed.
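This is a one-sample, upper-tailed z test with known $\sigma = 800$, testing $H_0: \mu = 5000$ against $H_1: \mu > 5000$. The one-sample statistic is not written out elsewhere in these notes, so the form below is the standard one, assumed here:

```latex
z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{5200 - 5000}{800/\sqrt{100}} = 2.5
```

Since $2.5 \ge z_{0.05} = 1.645$ (P-value $\approx 0.006$), $H_0$ is rejected: the data indicate an increase in the mean number of concurrent users at the 5% level.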
Example
A quality inspector finds 10 defective parts in a sample of 500
parts received from manufacturer A. Out of 400 parts from
manufacturer B, she finds 12 defective ones. A computer-making
company uses these parts in their computers and claims that the
quality of parts produced by A and B is the same. At the 5% level
of significance, do we have enough evidence to disprove this claim?
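The comparison of manufacturers A and B calls for a two-sample test of proportions, which these notes do not derive. The sketch below assumes the standard pooled z test for $H_0: p_A = p_B$ against a two-sided alternative; `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-tailed P-value for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.05
z, p_value = two_proportion_z(x1=10, n1=500, x2=12, n2=400)
reject = p_value <= alpha

print(f"z = {z:.3f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

With $z \approx -0.97$ and a P-value of about 0.33, this sketch finds no significant evidence at the 5% level against the claim that the two manufacturers have the same quality.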