Statistical Inference - Part1.4

Uploaded by

martins
Copyright
© © All Rights Reserved

STATISTICAL INFERENCE

Statistics is the practice or science of collecting and analyzing numerical data in
large quantities, especially for inferring proportions in a whole from those in a
representative sample. Statistics is divided into two branches: descriptive statistics
and inferential statistics.
Descriptive statistics
Many of the statistical averages and numbers we quote are in effect descriptive
statistics. Descriptive statistics summarizes a large set of observations and
gives us some idea about the data set. Measures of central tendency such as the
mean, median, and mode come under this category, as do measures of spread such
as the standard deviation and descriptions of the shape of the data, for example
whether it follows a normal distribution.
Inferential Statistics
Inferential statistics, as the name suggests, involves drawing the right conclusions
from the statistical analysis that has been performed using descriptive statistics.
In the end, the inferences make studies important, and this aspect is dealt with in
inferential statistics.
Most predictions of the future and generalizations about a population by studying
a smaller sample come under the purview of inferential statistics. The two main
areas of inferential statistics are estimation and hypothesis testing.

Estimation refers to numerous procedures used to calculate the value of some
property of a population from observations of a sample drawn from the
population.
1. Point Estimation
When a single value is used as an estimate, the estimate is called a point estimate
of the population parameter. In other words, an estimate of a population
parameter given by a single number is called point estimation.

For example,
(i) Suppose 55 is the mean mark obtained by a sample of 5 students randomly drawn
from a class of 100 students, and it is taken to be the mean mark of the entire class.
This single value of 55 is a point estimate.
(ii) Suppose 50 kg is the average weight of a sample of 10 students randomly drawn
from a class of 100 students, and it is taken to be the average weight of the entire
class. This single value of 50 is a point estimate.

Note
The sample mean (x̄) is the sample statistic used as an estimate of the population
parameter, the mean (μ).
Instead of taking the estimated value of the population parameter to be a
single value, we might consider an interval for estimating the value of the
population parameter. This concept is known as interval estimation and is
explained below.

2. Interval Estimation
There are situations where point estimation is not desirable and we are
interested in finding limits within which the parameter is expected to lie.
This is called interval estimation.
For example,
If T is a good estimator of θ with standard error s then, making use of the general
properties of the standard deviation, the uncertainty in T as an estimator of θ can
be expressed by statements like “We are about 95% certain that the unknown θ
will lie somewhere between T − 2s and T + 2s” or “We are almost sure that θ will lie
in the interval (T − 3s, T + 3s)”. Such intervals are called confidence intervals and
are explained below.
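The "about 95%" and "almost sure" figures can be checked numerically. The sketch below assumes the estimator T is approximately normally distributed (an assumption, since the text only says T is a good estimator); the function name `coverage` is made up for illustration.

```python
from statistics import NormalDist

def coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard errors of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(f"within {k} standard errors: {coverage(k):.4f}")
```

Under that assumption the ±2s interval covers roughly 95% of cases and the ±3s interval nearly all of them, which is where the two statements come from.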

Confidence interval
After obtaining the value of the statistic t from a given sample, can we
make some reasonable probability statements about the unknown population
parameter θ? This question is answered by the technique of the
confidence interval. Let us choose a small value of α, known as the level
of significance (usually 1% or 5%), and determine two constants, say c1 and c2, such
that P(c1 < θ < c2 | t) = 1 − α.
The quantities c1 and c2 so determined are known as the confidence limits, and
the interval [c1, c2] within which the unknown value of the population parameter
is expected to lie is known as the confidence interval. (1 − α) is called the confidence
coefficient.
Confidence Interval for the population mean for Large Samples (when σ is
known)
If we take repeated independent random samples of size n from a population with
an unknown mean but known standard deviation σ, then the probability that the true
population mean μ will fall in the following interval is (1 − α), i.e.,

\[ P\left( \bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha \]

So, the confidence interval for the population mean (μ), when the standard deviation (σ)
is known, is given by

\[ \bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]

For the computation of confidence intervals and for tests of significance, the
critical values Zα at the different levels of significance are given in the following
table:

Normal Probability Table

The calculation of confidence interval is illustrated below.
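The critical values used in the examples of this document (1.96, 2.58 and 1.645) can be recomputed from the standard normal distribution. This is a minimal sketch using Python's standard library; the helper name `z_critical` and its `two_tailed` flag are illustrative assumptions, not part of the original notes.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_tailed: bool = True) -> float:
    """Upper critical value of the standard normal for significance level alpha."""
    tail = alpha / 2 if two_tailed else alpha  # area left in the upper tail
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}  two-tailed={z_critical(alpha):.3f}  "
          f"one-tailed={z_critical(alpha, two_tailed=False):.3f}")
```

For a left-tailed test the critical value is the negative of the one-tailed value.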

Example 1
A machine produces a component of a product with a standard deviation of 1.6
cm in length. A random sample of 64 components was selected from the output
and this sample has a mean length of 90 cm. The customer will reject the part if
it is either less than 88 cm or more than 92 cm. Does the 95% confidence interval
for the true mean length of all the components produced ensure acceptance by the
customer?
Solution:
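Here μ is the mean length of the components in the population.
Given: n = 64, x̄ = 90 cm, σ = 1.6 cm, and Zα/2 = 1.96 at the 5% level of significance.
The formula for the confidence interval is

\[ \bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \qquad \frac{\sigma}{\sqrt{n}} = \frac{1.6}{\sqrt{64}} = 0.2 \]

Therefore, 90 − (1.96 × 0.2) ≤ μ ≤ 90 + (1.96 × 0.2)
(89.61 ≤ μ ≤ 90.39)
This implies that, with 95% confidence, the true value of the population mean length
of the components lies in the interval (89.61, 90.39). Since this interval lies entirely
within the customer's limits of 88 cm and 92 cm, we conclude that the 95% confidence
interval ensures acceptance of the component by the customer.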

Example 2
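A sample of 100 measurements of the breaking strength of cotton thread gave a
mean of 7.4 and a standard deviation of 1.2 gms. Find 95% confidence limits for
the mean breaking strength of the cotton thread.
Solution:
Given: n = 100, x̄ = 7.4, s = 1.2. Since the sample is large, σ is estimated by s, and
Zα/2 = 1.96 at the 5% level of significance.

\[ \bar{x} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 7.4 \pm 1.96 \times \frac{1.2}{\sqrt{100}} = 7.4 \pm 0.235 \]

This implies that, with 95% confidence, the true value of the population mean
breaking strength of the cotton thread lies in the interval (7.165, 7.635).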

Example 3
The mean life time of a sample of 169 light bulbs manufactured by a company is
found to be 1350 hours with a standard deviation of 100 hours. Establish 90%
confidence limits within which the mean life time of light bulbs is expected to lie.
Solution:
Given: n = 169, x̄ = 1350 hours, s = 100 hours. Since the level of significance is
(100 − 90)% = 10%, α is 0.1, and the significant value at 10% is Zα/2 = 1.645.

\[ \frac{s}{\sqrt{n}} = \frac{100}{\sqrt{169}} = 7.69 \]

Hence the 90% confidence limits for the population mean are

\[ \bar{x} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}} = 1350 \pm 1.645 \times 7.69 = 1350 \pm 12.65 \]

Hence the mean life time of light bulbs is expected to lie in the interval
(1337.35, 1362.65).
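The three confidence-interval examples follow the same recipe, x̄ ± Zα/2 · σ/√n, so they can be checked with one small function. This is a sketch only; the name `mean_confidence_interval` and its parameters are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(xbar, sd, n, confidence=0.95):
    """Large-sample interval xbar ± z * sd / sqrt(n) for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin

# Example 1: components, sigma known
print(mean_confidence_interval(90, 1.6, 64))
# Example 2: cotton thread, sigma estimated by s
print(mean_confidence_interval(7.4, 1.2, 100))
# Example 3: light bulbs, 90% confidence
print(mean_confidence_interval(1350, 100, 169, confidence=0.90))
```

The printed limits agree with the worked answers up to rounding, since the function uses the unrounded critical value rather than the table value.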

Hypothesis Testing
One of the important areas of statistical analysis is the testing of a hypothesis.
Often, in real-life situations, we are required to take decisions about the
population based on sample information. Hypothesis testing is also referred to as
“Statistical Decision Making”. It employs statistical techniques to arrive at
decisions in situations where there is an element of uncertainty, based on
a sample whose size is fixed in advance. The procedure by which statistics helps us
arrive at the criterion for such a decision is known as testing of hypothesis, which was
initiated by J. Neyman and E.S. Pearson.
For example: we may like to decide, based on sample data, whether a new vaccine
is effective in curing colds, whether a new training methodology is better than the
existing one, whether a new fertilizer is more productive than the earlier one,
and so on.
Statistical Hypothesis
Statistical hypothesis is some assumption or statement, which may or may not be
true, about a population.
There are two types of statistical hypothesis
(i) Null hypothesis (ii) Alternative hypothesis

Null Hypothesis
According to Prof. R.A. Fisher, “Null hypothesis is the hypothesis which is tested
for possible rejection under the assumption that it is true”, and it is denoted by H0.
For example: If we want to test whether the population mean has a specified value μ0,
then the null hypothesis H0 is set as follows: H0: μ = μ0.
The null hypothesis can equally take the form of an inequality such as less than or
equal to (≤) or greater than or equal to (≥), for example H0: μ ≤ μ0 or H0: μ ≥ μ0.
Alternative Hypothesis
Any hypothesis which is complementary to the null hypothesis is called the
alternative hypothesis and is usually denoted by H1.
For example: If we want to test the null hypothesis that the population has a
specified mean μ0, i.e., H0: μ = μ0, then the alternative hypothesis could be any one
among the following:
i. H1: μ ≠ μ0 (μ > μ0 or μ < μ0)
ii. H1: μ > μ0
applicable if words such as increase, greater than, improve, higher, exceed, surpass,
outperform, etc. are part of the claim
iii. H1: μ < μ0
applicable if words such as decrease, less than, decline, reduce, fall, deteriorate, etc.
are part of the claim
For example:
If we want to test the null hypothesis that the population has a mean of at least
μ0, H0: μ ≥ μ0, then the alternative hypothesis will be H1: μ < μ0.

If we want to test the null hypothesis that the population has a mean of at most
μ0, H0: μ ≤ μ0, then the alternative hypothesis will be H1: μ > μ0.

The alternative hypothesis H1: μ ≠ μ0 is known as a two-tailed alternative.
A two-tailed test is one where the hypothesis about the population parameter is
rejected for values of the sample statistic falling into either tail of the sampling
distribution. When the hypothesis about the population parameter is rejected only
for values of the sample statistic falling into one of the tails of the sampling
distribution, it is known as a one-tailed test. Here H1: μ > μ0 and H1: μ < μ0 are
known as one-tailed alternatives.

Right-tailed test: H1: μ > μ0 is said to be a right-tailed test, where the rejection region
or critical region lies entirely in the right tail of the normal curve.
Left-tailed test: H1: μ < μ0 is said to be a left-tailed test, where the critical region lies
entirely in the left tail of the normal curve.

Types of Errors in Hypothesis testing


A decision regarding a null hypothesis may or may not be correct. There are two
types of errors:
Type I error: The error of rejecting H0 when it is true.
Type II error: The error of accepting H0 when it is false.

Critical region or Rejection region

A region of the sample space, corresponding to a test statistic, which leads to the
rejection of H0 is called the critical region or region of rejection.

Level of significance
The probability of a type I error is known as the level of significance and it is denoted
by α. The levels of significance usually employed in testing of hypothesis are
5% and 1%. The level of significance is always fixed in advance, before collecting
the sample information.
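That the level of significance is the probability of a type I error can be illustrated by simulation. The sketch below draws many samples from a population for which H0 is true and counts how often a two-tailed Z-test rejects it. The population (mean 0, standard deviation 1), sample size, number of trials and seed are all arbitrary choices made for this illustration, not values from the notes.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type_one_error_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of samples drawn under a true H0 (mu = 0, sigma = 1) that a
    two-tailed Z-test wrongly rejects."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) / (1 / sqrt(n))  # (xbar - mu0) / (sigma / sqrt(n)) with mu0 = 0, sigma = 1
        if abs(z) >= z_crit:
            rejections += 1  # H0 is true here, so every rejection is a type I error
    return rejections / trials

print(type_one_error_rate())
```

The observed rejection rate should fall close to α, with some simulation noise.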

Critical values or significant values


The value of test statistic which separates the critical (or rejection) region and the
acceptance region is called the critical value or significant value. It depends upon
(i) The level of significance
(ii) The alternative hypothesis whether it is two-tailed or single tailed.

Test of significance for single mean


Let xi (i = 1, 2, 3, ..., n) be a random sample of size n from a normal population with
mean μ and variance σ². Then the sample mean x̄ is distributed normally with mean μ
and variance σ²/n.
Thus, for large samples, the standard normal variate corresponding to x̄ is:

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1) \]

Under the null hypothesis that the sample has been drawn from a population with
mean μ and variance σ², i.e., there is no significant difference between the sample
mean (x̄) and the population mean (μ), the test statistic (for large samples) is:

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Remark:
If the population standard deviation σ is unknown, then we use its estimate
provided by the sample variance, i.e., σ² is estimated by s², so that σ is replaced by s.

Example 4
An auto company decided to introduce a new six-cylinder car whose mean petrol
consumption is claimed to be lower than that of the existing auto engine. It was
found that the mean petrol consumption for 50 cars was 10 km per litre with
a standard deviation of 3.5 km per litre. Test at the 5% level of significance whether
the claim that the new car's petrol consumption is 9.5 km per litre on average is
acceptable.

Solution:
Sample size n = 50, Sample mean x̄ = 10 km per litre, Sample standard deviation s = 3.5
km per litre
Population mean μ = 9.5 km per litre
Since the population SD is unknown, we consider σ = s
The sample is large, so we apply the Z-test

Null Hypothesis: There is no significant difference between the sample average
and the company’s claim, i.e., H0: μ = 9.5
Alternative Hypothesis: There is a significant difference between the sample
average and the company’s claim, i.e., H1: μ ≠ 9.5 (two-tailed test)
The level of significance α = 5% = 0.05
Applying the test statistic

\[ Z = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{10 - 9.5}{3.5 / \sqrt{50}} = \frac{0.5}{0.495} = 1.01 \]

Thus, the calculated value is Z = 1.01, and the significant value or table value is Zα/2 =
1.96
Comparing the calculated and table values, here Z < Zα/2, i.e., 1.01 < 1.96.
Inference: Since the calculated value is less than the table value, i.e., Z < Zα/2 at the 5%
level of significance, the null hypothesis H0 is accepted. Hence, we conclude that
the company’s claim that the new car's petrol consumption is 9.5 km per litre is
acceptable.

Example 5
A manufacturer of ball pens claims that a certain pen he manufactures has a mean
writing life of 400 pages with a standard deviation of 20 pages. A purchasing
agent selects a sample of 100 pens and puts them to test. The mean writing life
for the sample was 390 pages. Should the purchasing agent reject the
manufacturer's claim at a 1% level?
Solution:
Sample size n = 100, Sample mean x̄ = 390 pages, Population mean μ = 400
pages
Population SD σ = 20 pages
The sample is large, so we apply the Z-test.
Null Hypothesis: There is no significant difference between the sample mean
and the population mean of the writing life of the pen he manufactures, i.e., H0: μ =
400
Alternative Hypothesis: There is a significant difference between the sample
mean and the population mean of the writing life of the pen he manufactures, i.e., H1:
μ ≠ 400 (two-tailed test)
The level of significance α = 1% = 0.01
Applying the test statistic

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{390 - 400}{20 / \sqrt{100}} = \frac{-10}{2} = -5 \]

Thus the calculated value is |Z| = 5 and the significant value or table value is Zα/2 =
2.58
Comparing the calculated and table values, we find |Z| > Zα/2, i.e., 5 > 2.58
Inference: Since the calculated value is greater than the table value, i.e., |Z| > Zα/2 at the
1% level of significance, the null hypothesis is rejected. Therefore, we
conclude that μ ≠ 400 and the manufacturer’s claim is rejected at the 1% level of
significance.

Example 6
The mean weekly sales of soap bars in departmental stores were 146.3 bars per
store. After an advertising campaign, the mean weekly sales in 400 stores for a
typical week increased to 153.7 and showed a standard deviation of 17.2. Was
the advertising campaign successful?
Solution:
Sample size n = 400 stores
Sample mean x̄ = 153.7 bars
Sample SD s = 17.2 bars
Population mean μ = 146.3 bars
Since the population SD is unknown, we can take σ = s, the sample SD
Null Hypothesis: The advertising campaign was not successful, i.e., H0: μ = 146.3
(There is no significant difference between the mean weekly sales of soap bars in
department stores before and after the advertising campaign)
Alternative Hypothesis: H1: μ > 146.3 (right-tailed test). The advertising campaign
was successful
Level of significance α = 0.05
Test statistic

\[ Z = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{153.7 - 146.3}{17.2 / \sqrt{400}} = \frac{7.4}{0.86} = 8.605 \]

∴ Z = 8.605
Comparing the calculated value Z = 8.605 and the significant value or table value
Zα = 1.645, we get 8.605 > 1.645
Inference: Since the calculated value is much greater than the table value, i.e., Z >
Zα, it is highly significant at the 5% level of significance. Hence, we reject the null
hypothesis H0 and conclude that the advertising campaign was
successful in promoting sales.
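Examples 4 to 6 all apply the same large-sample Z-test, differing only in the tail of the alternative hypothesis. A minimal sketch of that decision rule follows; the function name `z_test_mean` and the `tail` labels ("two", "right", "left") are naming choices made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, sd, n, alpha=0.05, tail="two"):
    """Return (z, critical value, reject H0?) for a large-sample test of one mean."""
    z = (xbar - mu0) / (sd / sqrt(n))
    if tail == "two":
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) >= crit
    elif tail == "right":
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z >= crit
    elif tail == "left":
        crit = NormalDist().inv_cdf(alpha)  # negative critical value
        reject = z <= crit
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, crit, reject

print(z_test_mean(10, 9.5, 3.5, 50))                      # Example 4: H0 not rejected
print(z_test_mean(390, 400, 20, 100, alpha=0.01))         # Example 5: H0 rejected
print(z_test_mean(153.7, 146.3, 17.2, 400, tail="right")) # Example 6: H0 rejected
```

Computing the critical value inside the function, rather than reading it from the table, keeps the comparison consistent for any α.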

Test of Hypotheses for Equality of Means of Two Populations

Example 7
The performance of students of X Standard in a national-level talent search
examination was studied. The scores secured by randomly selected students from
two districts, viz., D1 and D2 of a State were analyzed. The number of students
randomly selected from D1 and D2 are respectively 500 and 800. The average
scores secured by the students selected from D1 and D2 are respectively 58 and
57. Can the samples be regarded as drawn from identical populations having a
common standard deviation 2? Test at a 5% level of significance.
Solution:
Step 1: Let μX and μY be respectively the mean scores secured in the national-
level talent search examination by all the students from the
districts D1 and D2 considered for the study. It is given that the populations of the
scores of the students of these districts have the common standard deviation σ =
2. The null and alternative hypotheses are
Null hypothesis: H0: µX = µY
i.e., average scores secured by the students from the study districts are not
significantly different.
Alternative hypothesis: H1: µX ≠ µY
i.e., average scores secured by the students from the study districts are
significantly different. It is a two-sided alternative.
Step 2: Data
The given sample information is:
Size of Sample-1 (m) = 500
Size of Sample-2 (n) = 800. Hence, both the samples are large.
Mean of Sample-1 (x̄) = 58
Mean of Sample-2 (ȳ) = 57
Step 3: Level of significance
α = 5%
Step 4: Test statistic
The test statistic under the null hypothesis H0 is

\[ Z = \frac{\bar{X} - \bar{Y}}{\sigma \sqrt{\frac{1}{m} + \frac{1}{n}}} \]

Since both m and n are large, the sampling distribution of Z under H0 is the N(0,
1) distribution.
Step 5: Calculation of Test Statistic
The value of Z is calculated for the given sample information from

\[ z_0 = \frac{\bar{x} - \bar{y}}{\sigma \sqrt{\frac{1}{m} + \frac{1}{n}}} = \frac{58 - 57}{2 \sqrt{\frac{1}{500} + \frac{1}{800}}} = \frac{1}{0.114} = 8.77 \]

Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.05 is ze =
z0.025 = 1.96.
Step 7: Decision
Since H1 is a two-sided alternative, elements of the critical region are defined by
the rejection rule |z0| ≥ ze = z0.025. For the given sample information, |z0| = 8.77
> ze = 1.96. It indicates that the given sample contains sufficient evidence to
reject H0. Thus, H0 is rejected. Therefore, the average
performance of the students in the districts D1 and D2 in the national-level talent
search examination is significantly different. Thus, the given samples cannot be
regarded as drawn from identical populations.
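The calculation in Example 7 can be reproduced in a few lines. This is a sketch for the case treated there, two large samples with a known common standard deviation; the function name `two_sample_z` is an illustrative choice.

```python
from math import sqrt

def two_sample_z(xbar, ybar, sigma, m, n):
    """Z statistic for equality of two means with known common standard deviation."""
    return (xbar - ybar) / (sigma * sqrt(1 / m + 1 / n))

z0 = two_sample_z(58, 57, 2, 500, 800)
z_critical = 1.96  # two-tailed table value at the 5% level, as quoted in the text
print(round(z0, 2), abs(z0) >= z_critical)
```

Even a one-mark difference in means is highly significant here because both samples are large and the common standard deviation is small.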
Example 8
A Model Examination was conducted for XII Standard students in the subject of
Statistics. A District Educational Officer wanted to analyze the gender-wise
performance of the students using the marks secured by randomly selected boys
and girls. Sample measures were calculated and the details are presented below:

Test, at the 5% level of significance, whether the performance of the students differs
significantly with respect to their gender.
Solution:
Step 1: Let μX and μY denote respectively the average marks secured by boys and
girls in the Model Examination conducted to the XII Standard students in the
subject of Statistics. Then, the null and the alternative hypotheses are
Null hypothesis: H0: µ X = µY

i.e., there is no significant difference in the performance of the students with
respect to their gender.
Alternative hypothesis: H1: µX ≠ µY
i.e., the performance of the students differs significantly with respect to their
gender. It is a two-sided alternative hypothesis.
Step 2: Data
The given sample information are:

Since m ≥ 30 and n ≥ 30, both the samples are large.


Step 3: Level of significance
α= 5%
Step 4: Test statistic
The test statistic under H0 is

The sampling distribution of Z under H0 is the N(0,1) distribution.


Step 5: Calculation of the Test Statistic
The value of Z is calculated for the given sample information from

Step 6: Critical value


Since H1 is a two-sided alternative, the critical value at the 5% level of significance
is ze = z0.025 = 1.96.
Step 7: Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. But |z0| = 1.75 is less
than the critical value ze = 1.96. Hence, it may be inferred that the given sample
information does not provide sufficient evidence to reject H0. Therefore, it may
be decided that there is not sufficient evidence in the given sample to conclude
that the performance of boys and girls in the Model Examination conducted in the
subject of Statistics differs significantly.

Student’s t Distribution and its Application


1. Properties of the Student’s t-distribution
1. The t-distribution is a symmetrical distribution with mean zero.
2. The graph of the t-distribution is similar to that of the normal distribution except
for the following two differences:
(i) The normal distribution curve is higher in the middle than the t-distribution curve.
(ii) The t-distribution has a greater spread sideways than the normal distribution
curve. It means that there is more area in the tails of the t-distribution.

3. The t-distribution curve is asymptotic to the X-axis, that is, it extends to infinity on
either side.
4. The shape of the t-distribution curve varies with the degrees of freedom. The larger
the number of degrees of freedom, the closer its shape is to the standard normal
distribution (fig. 2.1).
5. The sampling distribution of t does not depend on the population parameters. It
depends only on the degrees of freedom (n–1).
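Properties 2 and 4 can be checked numerically by comparing the two density functions. The sketch below uses the standard formula for the t density, which is not given in these notes and is supplied here as background; the function names are illustrative.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom (log-gamma avoids overflow)."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

for x in (0, 3):
    print(f"x={x}: normal={normal_pdf(x):.4f}  t(5)={t_pdf(x, 5):.4f}  "
          f"t(1000)={t_pdf(x, 1000):.4f}")
```

At x = 0 the normal curve is higher than the t curve with 5 degrees of freedom, at x = 3 the t curve is higher (heavier tails), and with 1000 degrees of freedom the two are nearly indistinguishable.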

4. Test of Hypotheses for Normal Population Mean (Population Variance is
Unknown)
Procedure:
Step 1: Let µ and σ2 be respectively the mean and variance of the population
under study, where σ2 is unknown. If µ0 is an admissible value of µ, then frame
the null hypothesis as
H0: µ = µ0 and choose the suitable alternative hypothesis from
(i) H1: µ ≠ µ0 (ii) H1: µ > µ0 (iii) H1: µ < µ0
Step 2: Describe the sample/data and its descriptive measures. Let (X1, X2, …, Xn)
be a random sample of n observations drawn from the population, where n is
small (n < 30).
Step 3: Specify the level of significance, α.

Step 4: Consider the test statistic

\[ T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \]

under H0, where X̄ and S are the
sample mean and sample standard deviation respectively. The approximate
sampling distribution of the test statistic under H0 is the t-distribution with (n–1)
degrees of freedom.
Step 5: Calculate the value of t for the given sample (x1, x2, ..., xn) as

\[ t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
Step 6: Choose the critical value, te, corresponding to α and H1 from the
following table

Step 7: Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Example 1
The average monthly sales, based on past experience, of a particular brand of tooth
paste in departmental stores is ₹ 200. An advertisement campaign was run by
the company, and then a sample of 26 departmental stores was taken at random.
It was found that the average sales of the brand of tooth paste was ₹ 216
with a standard deviation of ₹ 8. Has the campaign helped in promoting the
sales of this brand of tooth paste?

Solution:
Step 1: Hypotheses
Null Hypothesis H0: µ = 200
i.e., the average monthly sales of the particular brand of tooth paste is not
significantly different from ₹ 200.
Alternative Hypothesis H1: µ > 200
i.e., the average monthly sales of the particular brand of tooth paste is
significantly greater than ₹ 200. It is a one-sided (right) alternative hypothesis.
Step 2: Data
The given sample information is:
Size of the sample (n) = 26. Hence, it is a small sample.
Sample mean (x̄) = 216, Standard deviation of the sample (s) = 8.
Step 3: Level of significance
α = 5%
Step 4: Test statistic
The test statistic under H0 is

\[ T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \]

Since n is small, the sampling distribution of T is the t-distribution with (n–1)
degrees of freedom.
Step 5: Calculation of test statistic
The value of T for the given sample information is calculated from

\[ t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{216 - 200}{8 / \sqrt{26}} = \frac{16}{1.569} = 10.20 \]

Step 6: Critical value
Since H1 is a one-sided (right) alternative hypothesis, the critical value at α = 0.05
is
te = tn-1, α = t25, 0.05 = 1.708
Step 7: Decision
Since it is a right-tailed test, elements of the critical region are defined by the rejection
rule t0 > te = tn-1, α = t25, 0.05 = 1.708. For the given sample information t0 = 10.20 >
te = 1.708. It indicates that the given sample contains sufficient evidence to reject H0.
Hence, the campaign has helped in promoting the sales of the particular
brand of tooth paste.
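When only summary figures are given, as in the tooth paste example, the t statistic follows directly from n, x̄ and s. The sketch below takes the critical value from the t table quoted in the text rather than computing it, because Python's standard library has no t distribution; the function name `t_statistic` is an illustrative choice.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic (xbar - mu0) / (s / sqrt(n)) and its degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n)), n - 1

t0, df = t_statistic(216, 200, 8, 26)
t_critical = 1.708  # t(25, 0.05), the table value quoted in the text
print(round(t0, 2), df, t0 > t_critical)
```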

Example 2

A sample of 10 students from a school was selected. Their scores in a particular
subject are 72, 82, 96, 85, 84, 75, 76, 93, 94 and 93. Can we support the claim
that the class average score is 90?
Solution:
Step 1: Hypotheses
Null Hypothesis H0: µ = 90
i.e., the class average score is not significantly different from 90.
Alternative Hypothesis H1 : µ ≠ 90
i.e., the class mean score is significantly different from 90.
It is a two-sided alternative hypothesis.
Step 2: Data
The given sample information is:
Size of the sample (n) = 10. Hence, it is a small sample.
Step 3: Level of significance
α = 5%
Step 4: Test statistic
The test statistic under H0 is

\[ T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \]

Since n is small, the sampling distribution of T is the t-distribution with (n–1)
degrees of freedom.
Step 5: Calculation of test statistic
The value of T for the given sample information is calculated from

\[ t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

as under:
Sample mean

\[ \bar{x} = \frac{1}{n} \sum x_i = \frac{850}{10} = 85 \]

Sample standard deviation

\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{690}{9}} = 8.756 \]

Hence

\[ t_0 = \frac{85 - 90}{8.756 / \sqrt{10}} = \frac{-5}{2.769} = -1.806 \]

Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.05 is te =
tn-1, α/2 = t9, 0.025 = 2.262
Step 7: Decision
Since it is a two-tailed test, elements of the critical region are defined by the rejection
rule |t0| > te = tn-1, α/2 = t9, 0.025 = 2.262. For the given sample information |t0| = 1.806 <
te = 2.262.
It indicates that the given sample does not provide sufficient evidence to reject
H0. Hence, the claim that the class average score is 90 can be supported.
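With raw scores, the sample mean and standard deviation have to be computed first. The sketch below does this with Python's `statistics` module, whose `stdev` uses the n − 1 divisor; the critical value is the table value quoted in the text, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

scores = [72, 82, 96, 85, 84, 75, 76, 93, 94, 93]
mu0 = 90                       # claimed class average

xbar = mean(scores)            # sample mean
s = stdev(scores)              # sample standard deviation with n - 1 divisor
t0 = (xbar - mu0) / (s / sqrt(len(scores)))

t_critical = 2.262             # t(9, 0.025), the table value quoted in the text
print(xbar, round(s, 3), round(t0, 3), abs(t0) > t_critical)
```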

5. Test of Hypotheses for Equality of Means of Two Normal Populations


(Independent Random Samples)
Procedure:

Step 1: Let μX and μY be respectively the means of population-1 and population-
2 under study. The variances of the population-1 and population-2 are assumed
to be equal and unknown given by σ2.
Frame the null hypothesis as H0: µX = µY and choose the suitable alternative
hypothesis from (i) H1: μX ≠ μY (ii) H1 : μX > μY (iii) H1: μX < μY
Step 2: Describe the sample/data. Let (X1, X2 , …, Xm) be a random sample of m
observations drawn from Population-1 and (Y1, Y2 , …, Yn) be a random sample
of n observations drawn from Population-2, where m and n are small (i.e., m < 30
and n < 30). Here, these two samples are assumed to be independent.
Step 3: Set up level of significance (α)
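Step 4: Consider the test statistic

\[ T = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \]

where Sp is the “pooled” standard deviation (combined standard deviation) given by

\[ S_p = \sqrt{\frac{(m - 1) s_X^2 + (n - 1) s_Y^2}{m + n - 2}} \]

and sX and sY are the standard deviations of the two samples.
The approximate sampling distribution of the test statistic under H0
is the t-distribution with m+n–2 degrees of freedom, i.e., t ~ tm+n–2.
Step 5: Calculate the value of T for the given samples (x1, x2, ..., xm) and
(y1, y2, ..., yn) as

\[ t_0 = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \]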
Step 6: Choose the critical value, te, corresponding to α and H1 from the
following table

Step 7: Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Example 3
The following table gives the scores (out of 15) of two batches of students in an
examination.

Test at the 1% level of significance whether the average performance of the students
in Batch I and Batch II is equal.
Solution:
Step 1: Hypotheses: Let µX and µY denote respectively the average performance
of students in Batch I and Batch II. Then the null and alternative hypotheses are:
Null Hypothesis H0: µX = µY
i.e., the average performance of the students in Batch I and Batch II is equal.
Alternative Hypothesis H1: µX ≠ µY
i.e., the average performance of the students in Batch I and Batch II is not equal.
Step 2: Data
The given sample information are:
Sample size for Batch I: m =10
Sample size for Batch II: n = 8
Step 3: Level of significance
α= 1%
Step 4: Test statistic
The test statistic under H0 is

\[ T = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \]

The sampling distribution of T under H0 is the t-distribution with m+n–2 degrees
of freedom, i.e., t ~ tm+n–2
Step 5: Calculation of test statistic
To find sample mean and sample standard deviation:

To find sample means:


Let (x1 , x2 ,..., x10) and (y1, y2 ,..., y8) denote the scores of students in Batch I and
Batch II respectively.

To find combined sample standard deviation:

Pooled standard deviation is:

The value of T is calculated for the given information as

Step 6: Critical value


Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.01 is te =
tm+n-2, α/2 = t16, 0.005 = 2.921
Step 7: Decision
Since it is a two-tailed test, elements of the critical region are defined by the rejection
rule |t0| > te = tm+n-2, α/2 = t16, 0.005 = 2.921. For the given sample information |t0| =
1.3957 < te = 2.921.
It indicates that the given sample contains insufficient evidence to reject H0. Hence,
the mean performance of the students in these batches may be regarded as equal.

Example 4
Two types of batteries are tested for their length of life (in hours). The following
data is the summary descriptive statistics.

Is there any significant difference between the average life of the two batteries at
5% level of significance?
Solution:
Step 1: Hypotheses
Null Hypothesis H0: μX = μY
i.e., there is no significant difference in average life of two types of
batteries A and B.

Alternative Hypothesis H1: μX ≠ μY
i.e., there is a significant difference in the average life of the two types of
batteries A and B. It is a two-sided alternative hypothesis.
Step 2: Data
The given sample information is:
m = number of batteries under type A = 14
n = number of batteries under type B = 13
x̄ = Average life (in hours) of type A battery = 94
ȳ = Average life (in hours) of type B battery = 86
sX = standard deviation of type A battery = 16
sY = standard deviation of type B battery = 20
Step 3: Level of significance
α = 5%
Step 4: Test statistic
The test statistic under H0 is

\[ T = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \]

The sampling distribution of T under H0 is the t-distribution with m+n–2 degrees
of freedom, i.e., t ~ tm+n–2
Step 5: Calculation of test statistic
Under the null hypothesis H0:

\[ t_0 = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \]

where sp is the pooled standard deviation given by

\[ s_p = \sqrt{\frac{(m - 1) s_X^2 + (n - 1) s_Y^2}{m + n - 2}} = \sqrt{\frac{13 \times 16^2 + 12 \times 20^2}{25}} = \sqrt{325.12} = 18.03 \]

The value of T is calculated for the given information as

\[ t_0 = \frac{94 - 86}{18.03 \sqrt{\frac{1}{14} + \frac{1}{13}}} = \frac{8}{6.945} = 1.15 \]

Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.05 is te =
tm+n-2, α/2 = t25, 0.025 = 2.060.
Step 7: Decision
Since it is a two-tailed test, elements of the critical region are defined by the rejection
rule |t0| > te = tm+n-2, α/2 = t25, 0.025 = 2.060. For the given sample information |t0| =
1.15 < te = 2.060. It indicates that the given sample contains insufficient evidence
to reject H0. Hence, there is no significant difference between the average life of
the two types of batteries.
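The battery example can be checked from its summary statistics alone. This sketch assumes the pooled standard deviation weights each sample variance by its degrees of freedom, (m − 1) and (n − 1), which is the weighting that reproduces the value |t0| = 1.15 reported in the text; the function name `pooled_t` is illustrative.

```python
from math import sqrt

def pooled_t(xbar, ybar, s_x, s_y, m, n):
    """Two-sample t statistic with pooled standard deviation (equal variances assumed)."""
    sp = sqrt(((m - 1) * s_x**2 + (n - 1) * s_y**2) / (m + n - 2))
    t0 = (xbar - ybar) / (sp * sqrt(1 / m + 1 / n))
    return t0, sp, m + n - 2

t0, sp, df = pooled_t(94, 86, 16, 20, 14, 13)
t_critical = 2.060  # t(25, 0.025), the table value quoted in the text
print(round(t0, 2), round(sp, 2), df, abs(t0) > t_critical)
```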

Test of Hypotheses for Population Proportion


Procedure:
Step 1: Let P denote the proportion of the population possessing the qualitative
characteristic (attribute) under study. If p0 is an admissible value of P, then frame
the null hypothesis as H0:P = p0 and choose the suitable alternative hypothesis
from
(i) H1: P ≠ p0 (ii) H1: P > p0 (iii) H1: P < p0
Step 2: Let p be proportion of the sample observations possessing the attribute,
where n is large, np > 5 and n(1 – p) > 5.
Step 3: Specify the level of significance, α.

Step 4: Consider the test statistic under H0

\[ Z = \frac{p - P}{\sqrt{PQ / n}} \]

Here, Q = 1 – P.
The approximate sampling distribution of the test statistic under H0 is
the N(0,1) distribution.
Step 5: Calculate the value of Z under H0 for the given data as

\[ z_0 = \frac{p - p_0}{\sqrt{p_0 q_0 / n}}, \qquad q_0 = 1 - p_0 \]
Step 6: Choose the critical value, ze, corresponding to α and H1 from the
following table

Step 7: Make decision on H0 choosing the suitable rejection rule from the
following table corresponding to H1.

Example 1
A survey was conducted among the citizens of a city to study their preference
towards consumption of tea and coffee. Among 1000 randomly selected persons,
it is found that 560 are tea-drinkers and the remaining are coffee-drinkers. Can
we conclude at 1% level of significance from this information that both tea and
coffee are equally preferred among the citizens in the city?
Solution:
Step 1: Let P denote the proportion of people in the city who preferred to
consume tea.
Then, the null and the alternative hypotheses are
Null hypothesis: H0: P = 0.5
i.e., tea and coffee are equally preferred in the city.
Alternative hypothesis: H1: P ≠ 0.5
i.e., tea and coffee are not equally preferred. It is a two-sided
alternative hypothesis.
Step 2: Data
The given sample information is:
Sample size (n) = 1000. Hence, it is a large sample.
No. of tea-drinkers = 560
Sample proportion (p) = 560/1000 = 0.56
Step 3: Level of significance
α = 1%
Step 4: Test statistic
Since n is large, np = 560 > 5 and n(1 – p) = 440 > 5, the test statistic under the
null hypothesis is

$$Z = \frac{p - p_0}{\sqrt{p_0 q_0 / n}}, \qquad q_0 = 1 - p_0.$$

Its sampling distribution under H0 is the N(0,1) distribution.
Step 5: Calculation of Test Statistic
The value of Z for the sample information is calculated from

$$z_0 = \frac{0.56 - 0.50}{\sqrt{(0.5)(0.5)/1000}} = \frac{0.06}{0.01581} \approx 3.79$$

Thus, z0 = 3.79
Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at 1% level of
significance is zα/2 = z0.005 = 2.58.
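
The tabulated critical value can be cross-checked numerically. The sketch below assumes only Python's standard library; the helper name `z_critical` is introduced here for illustration and is not part of the original notes. It inverts the N(0,1) distribution function at 1 – α/2 for a two-sided test, or at 1 – α for a one-sided test.

```python
from statistics import NormalDist


def z_critical(alpha, two_sided=True):
    """Upper critical value of N(0, 1) at level alpha (hypothetical helper)."""
    tail = alpha / 2 if two_sided else alpha  # area in the upper tail
    return NormalDist().inv_cdf(1 - tail)


print(round(z_critical(0.01), 2))  # 2.58, two-sided test at the 1% level
print(round(z_critical(0.05), 2))  # 1.96, two-sided test at the 5% level
```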

Step 7: Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. Since |z0| = 3.79 > ze =
2.58, reject H0 at the 1% level of significance. Therefore, there is significant
evidence to conclude that tea and coffee are not equally preferred.

Test of Hypotheses for Equality of Proportions of two Populations


Procedure:
Step 1: Let PX and PY denote respectively the proportions of Population-1 and
Population-2 possessing the qualitative characteristic (attribute) under study.
Frame the null hypothesis as H0: PX=PY and choose the suitable alternative
hypothesis from
(i) H1: PX≠ PY (ii) H1: PX>PY (iii) H1: PX<PY
Step 2: Let pX and pY denote respectively the proportions of the samples of
sizes m and n drawn from Population-1 and Population-2 possessing the
attribute, where m and n are large (i.e., m ≥ 30 and n ≥ 30). Also, mpX >
5, m (1- pX) > 5, npY > 5 and n (1 - pY ) > 5.
Here, these two samples are assumed to be independent.
Step 3: Specify the level of significance, α.

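Step 4: Consider the test statistic under H0

$$Z = \frac{p_X - p_Y}{\sqrt{\hat{p}\,\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$$

Here, $\hat{p} = \frac{m p_X + n p_Y}{m + n}$ and $\hat{q} = 1 - \hat{p}$. The approximate
sampling distribution of the test statistic under H0 is the N(0,1) distribution.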

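Step 5: Calculate the value of Z for the given data as

$$z_0 = \frac{p_X - p_Y}{\sqrt{\hat{p}\,\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}, \qquad \hat{p} = \frac{m p_X + n p_Y}{m + n}, \quad \hat{q} = 1 - \hat{p}.$$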


Step 6: Choose the critical value, ze, corresponding to α and H1 from the
following table

Alternative hypothesis (H1)    PX ≠ PY    PX > PY    PX < PY
Critical value (ze)            zα/2       zα         –zα

Step 7: Decide on H0 by choosing the suitable rejection rule from the following
table corresponding to H1.

Alternative hypothesis (H1)    PX ≠ PY        PX > PY    PX < PY
Rejection rule                 |z0| ≥ zα/2    z0 > zα    z0 < –zα
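
The two-sample procedure can be sketched in the same way. This is an illustrative Python sketch under stated assumptions: the function name `two_proportion_z_test` and its parameters are hypothetical, and the standard error is assumed to use the pooled proportion (m pX + n pY)/(m + n), which is the usual choice when H0: PX = PY is taken to be true.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x_count, m, y_count, n, alpha=0.05, alternative="two-sided"):
    """Large-sample z-test of H0: PX = PY (hypothetical helper for illustration)."""
    p_x = x_count / m  # sample proportion from Population-1
    p_y = y_count / n  # sample proportion from Population-2

    # Pooled proportion (assumed form): (m*pX + n*pY) / (m + n)
    p_hat = (x_count + y_count) / (m + n)
    q_hat = 1 - p_hat

    # Step 5: value of the test statistic under H0
    z0 = (p_x - p_y) / sqrt(p_hat * q_hat * (1 / m + 1 / n))

    # Steps 6 and 7: critical value and rejection rule for the chosen H1
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "two-sided":    # H1: PX != PY
        ze = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z0) >= ze
    elif alternative == "greater":    # H1: PX > PY
        ze = std_normal.inv_cdf(1 - alpha)
        reject = z0 > ze
    elif alternative == "less":       # H1: PX < PY
        ze = -std_normal.inv_cdf(1 - alpha)
        reject = z0 < ze
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z0, ze, reject


# Illustrative counts only: 45 of 100 in sample 1 and 30 of 100 in sample 2.
z0, ze, reject = two_proportion_z_test(45, 100, 30, 100, alpha=0.05)
print(round(z0, 2), round(ze, 2), reject)
```

The sketch does not check the large-sample conditions of Step 2; those would need to be verified separately before relying on the N(0,1) approximation.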

Example 2
A study was conducted to investigate the interest of people living in cities towards
self-employment. Among 500 randomly selected persons from City-1, 400
persons were found to be self-employed. From City-2, 800 persons were selected
randomly and among them 600 persons were self-employed. Do the data indicate
that the two cities are significantly different concerning the prevalence of
self-employment among the persons? Choose the level of significance as α = 0.05.
Solution:
Step 1: Let PX and PY be respectively the proportions of self-employed people in
City-1 and City-2. Then, the null and alternative hypotheses are
Null hypothesis: H0: PX = PY
i.e., there is no significant difference between the proportions of self-
employed people in City-1 and City-2.
Alternative hypothesis: H1: PX ≠ PY
i.e., difference between the proportions of self-employed people in City-1 and
City-2 is significant. It is a two-sided alternative hypothesis.
Step 2: Data
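The given sample information is:
Sample size from City-1 (m) = 500
Sample size from City-2 (n) = 800
Sample proportion for City-1 (pX) = 400/500 = 0.80
Sample proportion for City-2 (pY) = 600/800 = 0.75
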
Here, m ≥ 30, n ≥ 30, mpX = 400 > 5, m(1− pX) = 100 > 5, npY = 600 > 5
and n(1− pY) = 200 > 5.
Step 3: Level of significance
α = 5%
Step 4: Test statistic
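The test statistic under the null hypothesis is

$$Z = \frac{p_X - p_Y}{\sqrt{\hat{p}\,\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}, \qquad \hat{p} = \frac{m p_X + n p_Y}{m + n}, \quad \hat{q} = 1 - \hat{p}.$$

The sampling distribution of Z under H0 is the N(0,1) distribution.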


Step 5: Calculation of Test Statistic
The value of Z for the given sample information is calculated from

$$\hat{p} = \frac{400 + 600}{500 + 800} = 0.7692, \qquad \hat{q} = 1 - \hat{p} = 0.2308$$

$$z_0 = \frac{0.80 - 0.75}{\sqrt{(0.7692)(0.2308)\left(\frac{1}{500} + \frac{1}{800}\right)}} = \frac{0.05}{0.02402} \approx 2.08$$

Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at a 5% level of
significance is ze = 1.96.
Step 7: Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. For the given sample
information, |z0| = 2.08 > ze = 1.96. Hence, H0 is rejected. We can conclude that
the difference between the proportions of self-employed people in City-1 and
City-2 is significant.

Practice questions

1. A company claims that the average waiting time for customers in their store
is less than 5 minutes. To test this claim, a random sample of 100 customers
is selected, and their waiting times are recorded. The sample mean waiting
time is found to be 4.6 minutes with a standard deviation of 1.2 minutes.
Conduct a hypothesis test at the 5% significance level to determine if there
is enough evidence to support the company's claim.
2. An educational program claims that it increases students' test scores. To
test this claim, a random sample of 200 students who participated in the
program is selected, and their test scores are recorded. The sample mean
test score is found to be 85 with a standard deviation of 10. Conduct a
hypothesis test at the 1% significance level to determine if there is enough
evidence to support the claim that the program increases test scores.
3. A manufacturer claims that the average weight of their cereal boxes is 500
grams. To test this claim, a random sample of 150 cereal boxes is selected,
and their weights are measured. The sample mean weight is found to be
490 grams with a standard deviation of 20 grams. Conduct a hypothesis
test at the 5% significance level to determine if there is enough evidence to
support the manufacturer's claim.

4. A researcher is investigating whether a new teaching method reduces
students' anxiety levels. A random sample of 10 students is selected, and
their anxiety levels are measured before and after implementing the new
method. The researcher wants to determine if there is evidence that the new
method reduces anxiety levels. The sample mean reduction in anxiety
levels is found to be 4 points with a standard deviation of 2 points. Conduct
a hypothesis test at the 5% significance level to determine if there is enough
evidence to support the claim that the new method reduces anxiety levels.
