Statistical Inference - Part1.4
For example,
(i) If 55 is the mean mark obtained by a sample of 5 students randomly drawn
from a class of 100 students, and it is taken to be the mean mark of the entire
class, then this single value of 55 is a point estimate.
(ii) If 50 kg is the average weight of a sample of 10 students randomly drawn
from a class of 100 students, and it is taken to be the average weight of the
entire class, then this single value of 50 kg is a point estimate.
Note
The sample mean (x̄) is the sample statistic used as an estimate of the population
parameter, the mean (μ).
Instead of taking the estimated value of the population parameter to be a
single value, we might consider an interval for estimating the value of the
population parameter. This concept is known as interval estimation and is
explained below.
2. Interval Estimation
There are situations where point estimation is not desirable and we are
interested in finding limits within which the parameter would be expected to lie.
Finding such limits is called interval estimation.
For example,
If T is a good estimator of θ with standard error s then, making use of the general
property of standard deviations, the uncertainty in T as an estimator of θ can
be expressed by statements like “We are about 95% certain that the unknown θ
will lie somewhere between T − 2s and T + 2s” or “We are almost sure that θ will
lie in the interval (T − 3s, T + 3s)”. Such intervals are called confidence
intervals and are explained below.
Confidence interval
After obtaining the value of the statistic ‘t’ from a given sample, can we
make some reasonable probability statements about the unknown population
parameter ‘θ’? This question is answered by the technique of the
Confidence Interval. Let us choose a small value of α, known as the level
of significance (usually 1% or 5%), and determine two constants, say c1 and c2, such
that P(c1 < θ < c2 | t) = 1 − α.
The quantities c1 and c2 so determined are known as the Confidence Limits, and
the interval [c1, c2] within which the unknown value of the population parameter
is expected to lie is known as the Confidence Interval. (1 − α) is called the
confidence coefficient.
Confidence Interval for the population mean for Large Samples (when σ is
known)
If we take repeated independent random samples of size n from a population with
an unknown mean but known standard deviation, then the probability that the true
population mean μ will fall in the following interval is (1 − α), i.e.,
P(x̄ − Zα/2 · σ/√n ≤ μ ≤ x̄ + Zα/2 · σ/√n) = 1 − α
So, the confidence interval for the population mean (μ), when the standard
deviation (σ) is known, is given by
x̄ ± Zα/2 · σ/√n
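For the computation of confidence intervals and for tests of significance, the
critical values Zα at different levels of significance are given in the following
table:
Normal Probability Table
Level of significance (α)      1%        5%        10%
Two-tailed test |Zα/2|         2.58      1.96      1.645
Right-tailed test Zα           2.33      1.645     1.28
Left-tailed test −Zα           −2.33     −1.645    −1.28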
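The critical values used throughout these notes (1.645, 1.96, 2.58) can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name critical_values is chosen here for illustration and is not part of the notes.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

def critical_values(alpha):
    """Return (two-tailed, right-tailed, left-tailed) critical values of Z."""
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    right_tailed = std_normal.inv_cdf(1 - alpha)    # Z_alpha
    left_tailed = -right_tailed                     # -Z_alpha
    return two_tailed, right_tailed, left_tailed

for alpha in (0.01, 0.05, 0.10):
    two, right, left = critical_values(alpha)
    print(f"alpha = {alpha:.2f}: two-tailed {two:.3f}, right {right:.3f}, left {left:.3f}")
```

The printed values are unrounded versions of the usual table entries, so 2.576 corresponds to the tabulated 2.58.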
Example 1
A machine produces a component of a product with a standard deviation of 1.6
cm in length. A random sample of 64 components was selected from the output
and this sample has a mean length of 90 cm. The customer will reject the part if
it is either less than 88 cm or more than 92 cm. Does the 95% confidence interval
for the true mean length of all the components produced ensure acceptance by the
customer?
Solution:
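Here μ is the mean length of the components in the population.
Given: n = 64, x̄ = 90 cm, σ = 1.6 cm, and Zα/2 = 1.96 for 95% confidence.
The formula for the confidence interval is
x̄ ± Zα/2 · σ/√n = 90 ± 1.96 × (1.6/√64) = 90 ± 0.392
i.e., (89.608, 90.392). Since this interval lies entirely between 88 cm and 92 cm,
the 95% confidence interval ensures acceptance by the customer.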
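The arithmetic of Example 1 can be reproduced in a few lines. This is a sketch under the assumption that the interval is x̄ ± Zα/2 · σ/√n with Zα/2 = 1.96; the variable names are illustrative.

```python
import math

n = 64          # sample size
x_bar = 90.0    # sample mean length (cm)
sigma = 1.6     # known population standard deviation (cm)
z = 1.96        # two-tailed critical value for 95% confidence

margin = z * sigma / math.sqrt(n)            # 1.96 * 1.6 / 8
lower, upper = x_bar - margin, x_bar + margin

# The customer accepts lengths between 88 cm and 92 cm
within_limits = 88.0 <= lower and upper <= 92.0
print(f"95% CI: ({lower:.3f}, {upper:.3f}); inside 88-92 cm: {within_limits}")
```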
Example 2
A sample of 100 measurements of the breaking strength of cotton thread gave a
mean of 7.4 and a standard deviation of 1.2 gms. Find 95% confidence limits for
the mean breaking strength of the cotton thread.
Solution:
Given: n = 100, x̄ = 7.4, s = 1.2. Since the sample is large, σ is estimated by s,
and Zα/2 = 1.96 for 95% confidence.
x̄ ± Zα/2 · s/√n = 7.4 ± 1.96 × (1.2/√100) = 7.4 ± 0.235
This implies that the true value of the population mean breaking strength of
the cotton thread is expected, with 95% confidence, to fall in the interval
(7.165, 7.635).
Example 3
The mean life time of a sample of 169 light bulbs manufactured by a company is
found to be 1350 hours with a standard deviation of 100 hours. Establish 90%
confidence limits within which the mean life time of light bulbs is expected to lie.
Solution:
Given: n = 169, x̄ = 1350 hours, s = 100 hours. Since the level of significance is
(100 − 90)% = 10%, α is 0.1; hence the significant value at 10% is Zα/2 = 1.645.
x̄ ± Zα/2 · s/√n = 1350 ± 1.645 × (100/√169) = 1350 ± 12.65
Hence the mean life time of light bulbs is expected to lie in the interval
(1337.35, 1362.65).
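Examples 2 and 3 follow the same pattern, so the calculation can be wrapped in one helper. The sketch below assumes a large sample, so that the sample standard deviation stands in for σ; the function name large_sample_ci is hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_ci(x_bar, sd, n, confidence):
    """Confidence interval x_bar +/- Z_{alpha/2} * sd / sqrt(n) for a large sample."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    margin = z * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin

ci_thread = large_sample_ci(7.4, 1.2, 100, 0.95)   # Example 2
ci_bulbs = large_sample_ci(1350, 100, 169, 0.90)   # Example 3
print(f"Thread: ({ci_thread[0]:.3f}, {ci_thread[1]:.3f})")
print(f"Bulbs:  ({ci_bulbs[0]:.2f}, {ci_bulbs[1]:.2f})")
```

Because the helper uses the unrounded critical value rather than the tabulated 1.96 or 1.645, the limits agree with the worked answers only after rounding.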
Hypothesis Testing
One of the important areas of statistical analysis is the testing of hypotheses.
Often, in real-life situations, we are required to take decisions about the
population based on sample information. Hypothesis testing is also referred to as
“Statistical Decision Making”. It employs statistical techniques to arrive at
decisions in situations where there is an element of uncertainty, based on
a sample whose size is fixed in advance. The procedure by which statistics helps us
arrive at the criterion for such decisions is known as testing of hypothesis, which was
initiated by J. Neyman and E.S. Pearson.
For Example: We may like to decide based on sample data whether a new vaccine
is effective in curing colds, whether a new training methodology is better than the
existing one, whether the new fertilizer is more productive than the earlier one,
and so on.
Statistical Hypothesis
A statistical hypothesis is an assumption or statement, which may or may not be
true, about a population.
There are two types of statistical hypotheses:
(i) Null hypothesis (ii) Alternative hypothesis
Null Hypothesis
According to Prof. R.A. Fisher, “Null hypothesis is the hypothesis which is tested
for possible rejection under the assumption that it is true”, and it is denoted by H0 .
For example: If we want to test whether the population mean has a specified value μ0,
then the null hypothesis H0 is set as follows: H0: μ = μ0.
The null hypothesis can equally take the form of an inequality like less than or
equal to (≤) or greater than or equal to (≥). For example, H0: μ ≤ μ0 or H0: μ ≥ μ0.
Alternative Hypothesis
Any hypothesis which is complementary to the null hypothesis is called the
alternative hypothesis and is usually denoted by H1.
For example: If we want to test the null hypothesis that the population has a
specified mean μ0, i.e., H0: μ = μ0, then the alternative hypothesis could be any one
among the following:
i. H1: μ ≠ μ0 (μ > μ0 or μ < μ0)
ii. H1: μ > μ0
applicable if words such as increase, greater than, improve, higher, exceed, surpass,
outperform, etc. are part of the claim
iii. H1: μ < μ0
applicable if words such as decrease, less than, decline, reduce, fall, deteriorate, etc.
are part of the claim
For example:
If we want to test the null hypothesis that the population has a mean of at least
μ0, H0: μ ≥ μ0, then the alternative hypothesis will be H1: μ < μ0.
If we want to test the null hypothesis that the population has a mean of at most
μ0, H0: μ ≤ μ0, then the alternative hypothesis will be H1: μ > μ0.
Right tailed test: H1: μ > μ0 is said to be a right tailed test, where the rejection region
or critical region lies entirely on the right tail of the normal curve.
Left tailed test: H1: μ < μ0 is said to be a left tailed test, where the critical region lies
entirely on the left tail of the normal curve.
A region of the sample space, corresponding to a test statistic, which leads to the
rejection of H0 is called the critical region or region of rejection.
Level of significance
The probability of type I error is known as the level of significance and it is denoted
by α. The levels of significance usually employed in testing of hypothesis are
5% and 1%. The level of significance is always fixed in advance, before collecting
the sample information.
Under the null hypothesis that the sample has been drawn from a population with
mean μ and variance σ², i.e., there is no significant difference between the sample
mean (x̄) and the population mean (μ), the test statistic (for large samples) is:
Z = (x̄ − μ) / (σ/√n)
Remark:
If the population standard deviation σ is unknown, then we use its estimate
provided by the sample variance, i.e., σ² is estimated by s², so that σ is replaced by s.
Example 4
An auto company decided to introduce a new six-cylinder car whose mean petrol
consumption is claimed to be lower than that of the existing auto engine. It was
found that the mean petrol consumption for the 50 cars was 10 km per litre with
a standard deviation of 3.5 km per litre. Test at the 5% level of significance whether
the claim that the new car's petrol consumption is 9.5 km per litre on average is
acceptable.
Solution:
Sample size n = 50, Sample mean x̄ = 10 km per litre, Sample standard deviation s = 3.5
km per litre
Population mean μ = 9.5 km per litre
Since the population SD is unknown, we consider σ = s.
The sample is large, so we apply the Z-test.
Null Hypothesis H0: μ = 9.5
Alternative Hypothesis H1: μ ≠ 9.5 (two-tailed test)
Test statistic
Z = (x̄ − μ) / (s/√n) = (10 − 9.5) / (3.5/√50) = 0.5/0.495 = 1.01
Thus, the calculated value is Z = 1.01, and the significant value or table value is Zα/2 =
1.96.
Comparing the calculated and table values, here Z < Zα/2, i.e., 1.01 < 1.96.
Inference: Since the calculated value is less than the table value, i.e., Z < Zα/2 at the 5%
level of significance, the null hypothesis H0 is accepted. Hence, we conclude that
the company’s claim that the new car's petrol consumption is 9.5 km per litre is
acceptable.
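The large-sample Z statistic used in Example 4 can be computed as sketched below. The helper one_sample_z is a hypothetical name, and the two-tailed critical value 1.96 is taken from the worked solution.

```python
import math

def one_sample_z(x_bar, mu0, sd, n):
    """Large-sample test statistic Z = (x_bar - mu0) / (sd / sqrt(n))."""
    return (x_bar - mu0) / (sd / math.sqrt(n))

# Figures from Example 4; s replaces the unknown sigma
z0 = one_sample_z(x_bar=10.0, mu0=9.5, sd=3.5, n=50)
z_crit = 1.96                      # two-tailed critical value at the 5% level
reject_h0 = abs(z0) > z_crit

print(f"Z = {z0:.2f}, reject H0: {reject_h0}")
```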
Example 5
A manufacturer of ball pens claims that a certain pen he manufactures has a mean
writing life of 400 pages with a standard deviation of 20 pages. A purchasing
agent selects a sample of 100 pens and puts them to test. The mean writing life
for the sample was 390 pages. Should the purchasing agent reject the
manufacturer's claim at a 1% level?
Solution:
Sample size n = 100, Sample mean x̄ = 390 pages, Population mean μ = 400
pages
Population SD σ = 20 pages
The sample is large, so we apply the Z-test.
Null Hypothesis: There is no significant difference between the sample mean
and the population mean of the writing life of the pen he manufactures, i.e., H0: μ =
400
Alternative Hypothesis: There is a significant difference between the sample
mean and the population mean of the writing life of the pen he manufactures, i.e., H1:
μ ≠ 400 (two-tailed test)
The level of significance α = 1% = 0.01
Applying the test statistic
Z = (x̄ − μ) / (σ/√n) = (390 − 400) / (20/√100) = −10/2 = −5
Thus the calculated value is |Z| = 5 and the significant value or table value is Zα/2 =
2.58.
Comparing the calculated and table values, we find |Z| > Zα/2, i.e., 5 > 2.58.
Inference: Since the calculated value is greater than the table value, i.e., |Z| > Zα/2 at the
1% level of significance, the null hypothesis is rejected. Therefore, we
conclude that μ ≠ 400 and the manufacturer’s claim is rejected at the 1% level of
significance.
Example 6
The mean weekly sales of soap bars in departmental stores were 146.3 bars per
store. After an advertising campaign, the mean weekly sales in 400 stores for a
typical week increased to 153.7 and showed a standard deviation of 17.2. Was
the advertising campaign successful?
Solution:
Sample size n = 400 stores
Sample mean x̄ = 153.7 bars
Sample SD s = 17.2 bars
Population mean μ = 146.3 bars
Since the population SD is unknown, we take σ = s.
Null Hypothesis: The advertising campaign was not successful, i.e., H0: μ = 146.3
(There is no significant difference between the mean weekly sales of soap bars in
departmental stores before and after the advertising campaign.)
Alternative Hypothesis H1: μ > 146.3 (right-tailed test). The advertising campaign
was successful.
Level of significance α = 0.05
Test statistic
Z = (x̄ − μ) / (s/√n) = (153.7 − 146.3) / (17.2/√400) = 7.4/0.86
∴ Z = 8.605
Comparing the calculated value Z = 8.605 and the significant value or table value
Zα = 1.645, we get 8.605 > 1.645.
Inference: Since the calculated value is much greater than the table value, i.e., Z >
Zα, it is highly significant at the 5% level of significance. Hence, we reject the null
hypothesis H0 and conclude that the advertising campaign was
successful in promoting sales.
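For a right-tailed test such as Example 6, only the upper tail is compared with the critical value. The sketch below recomputes the statistic and takes Zα from the standard normal distribution instead of the table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x_bar, s, mu0 = 400, 153.7, 17.2, 146.3   # figures from Example 6

z0 = (x_bar - mu0) / (s / math.sqrt(n))      # 7.4 / 0.86
z_alpha = NormalDist().inv_cdf(1 - 0.05)     # right-tailed critical value, about 1.645
reject_h0 = z0 > z_alpha                     # right-tailed rejection rule

print(f"Z = {z0:.3f}, critical value = {z_alpha:.3f}, reject H0: {reject_h0}")
```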
Example 7
The performance of students of X Standard in a national-level talent search
examination was studied. The scores secured by randomly selected students from
two districts, viz., D1 and D2 of a State were analyzed. The number of students
randomly selected from D1 and D2 are respectively 500 and 800. The average
scores secured by the students selected from D1 and D2 are respectively 58 and
57. Can the samples be regarded as drawn from identical populations having a
common standard deviation 2? Test at a 5% level of significance.
Solution:
Step 1: Let μX and μY be respectively the mean scores secured in the national-level
talent search examination by all the students from the
districts D1 and D2 considered for the study. It is given that the populations of the
scores of the students of these districts have the common standard deviation σ =
2. The null and alternative hypotheses are
Null hypothesis: H0: µX = µY
i.e., average scores secured by the students from the study districts are not
significantly different.
Alternative hypothesis: H1: µX ≠ µY
i.e., average scores secured by the students from the study districts are
significantly different. It is a two-sided alternative.
Step 2: Data
The given sample information are:
Size of the Sample-1 (m) = 500
Size of the Sample-2 (n) = 800. Hence, both the samples are large.
Mean of Sample-1 (x̄) = 58
Mean of Sample-2 (ȳ) = 57
Step 3: Level of significance
α= 5%
Step 4: Test statistic
The test statistic under the null hypothesis H0 is
Z = (X̄ − Ȳ) / (σ √(1/m + 1/n))
Since both m and n are large, the sampling distribution of Z under H0 is the N(0,
1) distribution.
Step 5: Calculation of Test Statistic
The value of Z is calculated for the given sample information from
z0 = (x̄ − ȳ) / (σ √(1/m + 1/n)) = (58 − 57) / (2 √(1/500 + 1/800)) = 1/0.1140 = 8.77
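The two-sample statistic of Example 7 can be evaluated as sketched below. The sketch assumes the form Z = (x̄ − ȳ) / (σ √(1/m + 1/n)) for two large samples with a known common standard deviation, and the two-tailed critical value 1.96 at the 5% level; the function name two_sample_z is hypothetical.

```python
import math

def two_sample_z(x_bar, y_bar, sigma, m, n):
    """Z statistic for two large samples sharing a known standard deviation."""
    return (x_bar - y_bar) / (sigma * math.sqrt(1 / m + 1 / n))

# Figures from Example 7 (districts D1 and D2)
z0 = two_sample_z(x_bar=58, y_bar=57, sigma=2, m=500, n=800)
z_crit = 1.96                       # two-tailed critical value at the 5% level
reject_h0 = abs(z0) >= z_crit

print(f"z0 = {z0:.2f}, reject H0: {reject_h0}")
```

With these figures |z0| is far larger than 1.96, so under the stated assumptions H0 would be rejected at the 5% level.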
i.e., there is no significant difference in the performance of the students with
respect to their gender.
Alternative hypothesis: H1: µX ≠ µY
i.e., the performance of the students differs significantly with respect to
gender. It is a two-sided alternative hypothesis.
Step 2: Data
The given sample information are:
4. Test of Hypotheses for Normal Population Mean (Population Variance is
Unknown)
Procedure:
Step 1: Let µ and σ2 be respectively the mean and variance of the population
under study, where σ2 is unknown. If µ0 is an admissible value of µ, then frame
the null hypothesis as
H0: µ = µ0 and choose the suitable alternative hypothesis from
(i) H1: µ ≠ µ0 (ii) H1: µ > µ0 (iii) H1: µ < µ0
Step 2: Describe the sample/data and its descriptive measures. Let (X1, X2, …, Xn)
be a random sample of n observations drawn from the population, where n is
small (n < 30).
Step 3: Specify the level of significance, α.
Step 4: Consider the test statistic T = (X̄ − µ0) / (S/√n), under H0, where X̄ and S are the
sample mean and sample standard deviation respectively. The
sampling distribution of the test statistic under H0 is the t-distribution with (n–1)
degrees of freedom.
Step 5: Calculate the value of t for the given sample (x1, x2, ..., xn) as
t0 = (x̄ − µ0) / (s/√n).
Step 6: Choose the critical value, te, corresponding to α and H1 from the
following table
Alternative hypothesis (H1)     µ ≠ µ0          µ > µ0         µ < µ0
Critical value (te)             tn–1, α/2       tn–1, α        −tn–1, α
Step 7: Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Alternative hypothesis (H1)     µ ≠ µ0          µ > µ0         µ < µ0
Rejection rule                  |t0| ≥ te       t0 > te        t0 < te
Example 1
The average monthly sales, based on past experience, of a particular brand of tooth
paste in departmental stores is ₹ 200. An advertisement campaign was run by
the company, and then a sample of 26 departmental stores was taken at random;
it was found that the average sales of the particular brand of tooth paste is ₹ 216
with a standard deviation of ₹ 8. Has the campaign helped in promoting the
sales of the particular brand of tooth paste?
Solution:
Step 1: Hypotheses
Null Hypothesis H0: µ = 200
i.e., the average monthly sales of a particular brand of tooth paste is not
significantly different from ₹ 200.
Alternative Hypothesis H1: µ > 200
i.e., the average monthly sales of a particular brand of tooth paste are
significantly greater than ₹ 200. It is a one-sided (right) alternative hypothesis.
Step 2: Data
The given sample information are:
Size of the sample (n) = 26. Hence, it is a small sample.
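Sample mean (x̄) = 216, Standard deviation of the sample = 8.
Step 3: Level of significance
α = 5%
Step 4: Test statistic
The test statistic under H0 is T = (X̄ − µ0) / (S/√n), whose sampling distribution
under H0 is the t-distribution with n − 1 = 25 degrees of freedom.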
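The remaining calculation for this example can be sketched from the summary figures alone. The sketch assumes the statistic t = (x̄ − µ0) / (s/√n) with the quoted standard deviation used as s, and takes the right-tailed critical value t25, 0.05 = 1.708 from a standard t-table; both are assumptions, since the worked steps are not shown in these notes.

```python
import math

n, x_bar, s, mu0 = 26, 216.0, 8.0, 200.0     # figures from the tooth paste example

t0 = (x_bar - mu0) / (s / math.sqrt(n))      # one-sample t statistic
t_crit = 1.708                               # assumed table value t_{25, 0.05}
reject_h0 = t0 > t_crit                      # right-tailed rejection rule

print(f"t0 = {t0:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

Under these assumptions the statistic far exceeds the critical value, which would support the claim that the campaign increased sales.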
Example 2
A sample of 10 students from a school was selected. Their scores in a particular
subject are 72, 82, 96, 85, 84, 75, 76, 93, 94 and 93. Can we support the claim
that the class average score is 90?
Solution:
Step 1: Hypotheses
Null Hypothesis H0: µ = 90
i.e., the class average score is not significantly different from 90.
Alternative Hypothesis H1 : µ ≠ 90
i.e., the class mean score is significantly different from 90.
It is a two-sided alternative hypothesis.
Step 2: Data
The given sample information are:
Size of the sample (n) = 10. Hence, it is a small sample.
Step 3: Level of significance
α= 5%
Step 4: Test statistic
Sample mean x̄ = (72 + 82 + 96 + 85 + 84 + 75 + 76 + 93 + 94 + 93)/10 = 850/10 = 85
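For the scores example, the sample mean, sample standard deviation and t statistic can be obtained directly from the raw data. This sketch assumes the sample standard deviation uses the (n − 1) divisor, which is what statistics.stdev computes, and takes the two-tailed critical value t9, 0.025 = 2.262 from a standard t-table.

```python
import math
import statistics

scores = [72, 82, 96, 85, 84, 75, 76, 93, 94, 93]
mu0 = 90                              # claimed class average

n = len(scores)
x_bar = statistics.mean(scores)       # sample mean
s = statistics.stdev(scores)          # sample SD with (n - 1) divisor
t0 = (x_bar - mu0) / (s / math.sqrt(n))

t_crit = 2.262                        # assumed table value t_{9, 0.025}
reject_h0 = abs(t0) >= t_crit         # two-tailed rejection rule

print(f"mean = {x_bar}, s = {s:.3f}, t0 = {t0:.3f}, reject H0: {reject_h0}")
```

Under these assumptions |t0| is below the critical value, so the data would not contradict the claim that the class average is 90.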
Step 1: Let μX and μY be respectively the means of population-1 and population-2
under study. The variances of population-1 and population-2 are assumed
to be equal and unknown, with common value σ².
Frame the null hypothesis as H0: µX = µY and choose the suitable alternative
hypothesis from (i) H1: μX ≠ μY (ii) H1 : μX > μY (iii) H1: μX < μY
Step 2: Describe the sample/data. Let (X1, X2 , …, Xm) be a random sample of m
observations drawn from Population-1 and (Y1, Y2 , …, Yn) be a random sample
of n observations drawn from Population-2, where m and n are small (i.e., m < 30
and n < 30). Here, these two samples are assumed to be independent.
Step 3: Set up level of significance (α)
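Step 4: Consider the test statistic
T = (X̄ − Ȳ) / (Sp √(1/m + 1/n)),
where Sp is the pooled standard deviation given by
Sp² = [(m − 1)SX² + (n − 1)SY²] / (m + n − 2),
and SX and SY are the standard deviations of the two samples. Under H0, the sampling
distribution of T is the t-distribution with (m + n − 2) degrees of freedom.
Step 5: Calculate the value of t for the given samples as t0.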
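Step 6: Choose the critical value, te, corresponding to α and H1 from the
following table
Alternative hypothesis (H1)     μX ≠ μY          μX > μY         μX < μY
Critical value (te)             tm+n−2, α/2      tm+n−2, α       −tm+n−2, α
Step 7: Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Alternative hypothesis (H1)     μX ≠ μY          μX > μY         μX < μY
Rejection rule                  |t0| ≥ te        t0 > te         t0 < te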
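The two-sample procedure above can be sketched as a small helper. It assumes the usual pooled-variance form of the statistic, with sample standard deviations computed using the (n − 1) divisor; the function name pooled_t and the numbers in the usage lines are hypothetical and serve only to exercise the function.

```python
import math

def pooled_t(x_bar, y_bar, s_x, s_y, m, n):
    """Return (t0, pooled SD, degrees of freedom) for two independent small samples."""
    df = m + n - 2
    sp2 = ((m - 1) * s_x ** 2 + (n - 1) * s_y ** 2) / df   # pooled variance
    sp = math.sqrt(sp2)
    t0 = (x_bar - y_bar) / (sp * math.sqrt(1 / m + 1 / n))
    return t0, sp, df

# Hypothetical summary statistics, not taken from any example in these notes
t0, sp, df = pooled_t(x_bar=12.0, y_bar=10.0, s_x=2.0, s_y=2.0, m=8, n=8)
print(f"t0 = {t0:.3f}, pooled SD = {sp:.3f}, degrees of freedom = {df}")
```

The value t0 is then compared with the critical value for (m + n − 2) degrees of freedom according to the chosen alternative hypothesis.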
Example 3
The following table gives the scores (out of 15) of two batches of students in an
examination.
Pooled standard deviation is:
Example 4
Two types of batteries are tested for their length of life (in hours). The following
data is the summary descriptive statistics.
Is there any significant difference between the average life of the two batteries at
5% level of significance?
Solution:
Step 1: Hypotheses
Null Hypothesis H0: μX = μY
i.e., there is no significant difference in average life of two types of
batteries A and B.
Alternative Hypothesis H1: μX ≠ μY
i.e., there is a significant difference in the average life of the two types of
batteries A and B. It is a two-sided alternative hypothesis.
Step 2: Data
The given sample information are:
m = number of batteries under type A = 14
n = number of batteries under type B = 13
x̄ = Average life (in hours) of type A battery = 94
Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.05 is te =
tm+n−2, α/2 = t25, 0.025 = 2.060.
Step 7: Decision
Since it is a two-tailed test, elements of the critical region are defined by the rejection
rule |t0| ≥ te = tm+n−2, α/2 = t25, 0.025 = 2.060. For the given sample information, |t0| =
1.15 < te = 2.060. It indicates that the given sample contains insufficient evidence
to reject H0. Hence, there is no significant difference between the average life of
the two types of batteries.
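Step 6: Choose the critical value, ze, corresponding to α and H1 from the
following table
Alternative hypothesis (H1)     P ≠ P0         P > P0        P < P0
Critical value (ze)             zα/2           zα            −zα
Step 7: Make decision on H0 choosing the suitable rejection rule from the
following table corresponding to H1.
Alternative hypothesis (H1)     P ≠ P0         P > P0        P < P0
Rejection rule                  |z0| ≥ ze      z0 > ze       z0 < ze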
Example 1
A survey was conducted among the citizens of a city to study their preference
towards consumption of tea and coffee. Among 1000 randomly selected persons,
it is found that 560 are tea-drinkers and the remaining are coffee-drinkers. Can
we conclude at 1% level of significance from this information that both tea and
coffee are equally preferred among the citizens in the city?
Solution:
Step 1: Let P denote the proportion of people in the city who preferred to
consume tea.
Then, the null and the alternative hypotheses are
Null hypothesis: H0: P = 0.5
i.e., tea and coffee are equally preferred in the city.
Alternative hypothesis: H1: P ≠ 0.5
i.e., tea and coffee are not equally preferred. It is a two-sided
alternative hypothesis.
Step 2: Data
The given sample information are:
Sample size (n) = 1000. Hence, it is a large sample.
No. of tea-drinkers = 560
Sample proportion (p) = 560/1000 = 0.56
Step 3: Level of significance
α= 1%
Step 4: Test statistic
Since n is large, np = 560 > 5 and n(1 – p) = 440 > 5, the test statistic under the
null hypothesis is Z = (p − P0) / √(P0(1 − P0)/n), where P0 = 0.5.
Its sampling distribution under H0 is the N(0,1) distribution.
Step 5: Calculation of Test Statistic
The value of Z can be calculated for the sample information from
z0 = (0.56 − 0.5) / √(0.5 × 0.5/1000) = 0.06/0.01581
Thus, z0 = 3.79
Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at 1% level of
significance is zα/2 = z0.005 = 2.58.
Step 7: Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. Since |z0| = 3.79 > ze =
2.58, reject H0 at the 1% level of significance. Therefore, there is significant
evidence to conclude that the preferences for tea and coffee are different.
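The single-proportion test in this example can be checked numerically. The sketch below assumes the statistic Z = (p − P0) / √(P0(1 − P0)/n) and the tabulated two-tailed critical value 2.58 at the 1% level; the variable names are illustrative.

```python
import math

n = 1000            # sample size
tea_drinkers = 560  # number of tea-drinkers in the sample
p0 = 0.5            # proportion under H0 (equal preference)

p = tea_drinkers / n                               # sample proportion
z0 = (p - p0) / math.sqrt(p0 * (1 - p0) / n)       # test statistic
z_crit = 2.58                                      # two-tailed critical value at 1%
reject_h0 = abs(z0) >= z_crit

print(f"p = {p:.2f}, z0 = {z0:.2f}, reject H0: {reject_h0}")
```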
Step 7: Decide on H0 choosing the suitable rejection rule from the following
table corresponding to H1.
Example 2
A study was conducted to investigate the interest of people living in cities towards
self-employment. Among 500 randomly selected persons from City-1, 400
persons were found to be self-employed. From City-2, 800 persons were selected
randomly and among them 600 persons were self-employed. Do the data indicate
that the two cities are significantly different with respect to the prevalence of
self-employment among the persons? Choose the level of significance as α = 0.05.
Solution:
Step 1: Let PX and PY be respectively the proportions of self-employed people in
City-1 and City-2. Then, the null and alternative hypotheses are
Null hypothesis: H0: PX = PY
i.e., there is no significant difference between the proportions of
self-employed people in City-1 and City-2.
Alternative hypothesis: H1: PX ≠ PY
i.e., the difference between the proportions of self-employed people in City-1 and
City-2 is significant. It is a two-sided alternative hypothesis.
Step 2: Data
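The given sample information are
Sample size from City-1 (m) = 500, Sample size from City-2 (n) = 800
Sample proportions: pX = 400/500 = 0.80, pY = 600/800 = 0.75
Here, m ≥ 30, n ≥ 30, mpX = 400 > 5, m(1 − pX) = 100 > 5, npY = 600 > 5
and n(1 − pY) = 200 > 5.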
Step 3: Level of significance
α= 5%
Step 4: Test statistic
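The test statistic under the null hypothesis is
Z = (pX − pY) / √(p̂ q̂ (1/m + 1/n)), where p̂ = (m pX + n pY)/(m + n) and q̂ = 1 − p̂.
Step 5: Calculation of Test Statistic
For the given sample information,
z0 = 2.0764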
Step 6: Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at a 5% level of
significance is ze = 1.96.
Step 7: Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| > ze. Thus, it is a two-tailed test. For the given sample
information, |z0| = 2.0764 > ze = 1.96. Hence, H0 is rejected. We can conclude that
the difference between the proportions of self-employed people in City-1 and
City-2 is significant.
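The two-proportion comparison can be sketched as follows. The code assumes the pooled form of the statistic, Z = (pX − pY) / √(p̂ q̂ (1/m + 1/n)) with p̂ = (successes in both samples)/(m + n); the function name two_proportion_z is hypothetical. Carried out without intermediate rounding, the calculation gives a value of about 2.08, slightly different from the 2.0764 quoted above, which presumably reflects rounding in the hand calculation; the decision is the same.

```python
import math

def two_proportion_z(x1, m, x2, n):
    """Pooled two-sample Z statistic for the difference of two proportions."""
    p_x, p_y = x1 / m, x2 / n
    p_hat = (x1 + x2) / (m + n)        # pooled proportion under H0
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat * (1 / m + 1 / n))
    return (p_x - p_y) / se

# City-1: 400 of 500 self-employed; City-2: 600 of 800 self-employed
z0 = two_proportion_z(400, 500, 600, 800)
z_crit = 1.96                          # two-tailed critical value at the 5% level
reject_h0 = abs(z0) > z_crit

print(f"z0 = {z0:.4f}, reject H0: {reject_h0}")
```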
Practice questions
1. A company claims that the average waiting time for customers in their store
is less than 5 minutes. To test this claim, a random sample of 100 customers
is selected, and their waiting times are recorded. The sample mean waiting
time is found to be 4.6 minutes with a standard deviation of 1.2 minutes.
Conduct a hypothesis test at the 5% significance level to determine if there
is enough evidence to support the company's claim.
2. An educational program claims that it increases students' test scores. To
test this claim, a random sample of 200 students who participated in the
program is selected, and their test scores are recorded. The sample mean
test score is found to be 85 with a standard deviation of 10. Conduct a
hypothesis test at the 1% significance level to determine if there is enough
evidence to support the claim that the program increases test scores.
3. A manufacturer claims that the average weight of their cereal boxes is 500
grams. To test this claim, a random sample of 150 cereal boxes is selected,
and their weights are measured. The sample mean weight is found to be
490 grams with a standard deviation of 20 grams. Conduct a hypothesis
test at the 5% significance level to determine if there is enough evidence to
support the manufacturer's claim.
4. A researcher is investigating whether a new teaching method reduces
students' anxiety levels. A random sample of 10 students is selected, and
their anxiety levels are measured before and after implementing the new
method. The researcher wants to determine if there is evidence that the new
method reduces anxiety levels. The sample mean reduction in anxiety
levels is found to be 4 points with a standard deviation of 2 points. Conduct
a hypothesis test at the 5% significance level to determine if there is enough
evidence to support the claim that the new method reduces anxiety levels.