QT-UNIT IV
TESTING OF HYPOTHESIS
SAMPLING THEORIES
Sampling is the practice of selecting a subgroup of individuals from a population in
order to study the whole population.
Let’s say we want to know the percentage of people who use iPhones in a city, for
example. One way to do this is to call up everyone in the city and ask them what type
of phone they use. The other way would be to get a smaller subgroup of individuals
and ask them the same question, and then use this information as an approximation of
the total population.
However, this process is not as simple as it sounds. Whenever you follow this
method, your sample size has to be appropriate: it should be neither too large nor too
small. Then, once you have decided on the size of your sample, you must use the right
type of sampling technique to collect a sample from the population. Ultimately, every
sampling type comes under two broad categories:
Probability sampling - Random selection techniques are used to select the sample.
Non-probability sampling - Non-random techniques, based on the researcher's
convenience or judgment, are used to select the sample.
Now, let’s discuss the types of sampling in data analytics. First, let us start with the
Probability Sampling techniques.
1. Simple Random Sampling
In simple random sampling, the researcher selects the participants purely by chance,
so every member of the population has an equal probability of being chosen. Tools
such as random number generators and random number tables, which are based
entirely on chance, are used to make the selection.
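As a minimal sketch of the idea, the snippet below draws a simple random sample with Python's standard `random` module. The population of 600 numbered employees, the sample size of 30 and the seed are all hypothetical values chosen only for illustration.

```python
import random

# Hypothetical population: employee numbers 1..600 (illustrative only).
population = list(range(1, 601))

# A seeded generator makes the draw reproducible; the seed value is arbitrary.
rng = random.Random(42)

# random.sample draws without replacement, so every member has the same
# chance of selection and nobody is picked twice.
sample = rng.sample(population, k=30)

print(sorted(sample))
```

Changing the seed gives a different, equally valid sample.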
2. Systematic Sampling
Example: The researcher assigns every member in the company database a number.
Instead of randomly generating numbers, a random starting point (say 5) is selected.
From that number onwards, the researcher selects every, say, 10th person on the list
(5, 15, 25, and so on) until the sample is obtained.
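The selection rule in the example above can be sketched in Python as follows. The list of 100 numbered employees and the helper name `systematic_sample` are assumptions made for illustration, not part of the original notes.

```python
import random

def systematic_sample(members, step, rng):
    """Pick a random start within the first `step` positions, then take every `step`-th member."""
    start = rng.randrange(step)  # 0-based index of the first selected member
    return members[start::step]

# Hypothetical numbered list of 100 employees.
employees = list(range(1, 101))

rng = random.Random(0)  # arbitrary seed, for reproducibility only
sample = systematic_sample(employees, step=10, rng=rng)
print(sample)

# With the starting point fixed at the 5th person, as in the example:
fixed = employees[4::10]
print(fixed)  # 5, 15, 25, ... up to 95
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined by the interval.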
3. Stratified Sampling
In stratified sampling, the population is subdivided into subgroups, called strata, based
on some characteristics (age, gender, income, etc.). After forming a subgroup, you can
then use random or systematic sampling to select a sample for each subgroup. This
method allows you to draw more precise conclusions because it ensures that every
subgroup is properly represented.
Example: A company has 500 male employees and 100 female employees, and the
researcher wants the sample to reflect this gender balance. So the population is
divided into two subgroups based on gender, and a sample is drawn from each.
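The example above can be sketched in Python. This sketch assumes proportional allocation (each stratum contributes in proportion to its size), which is one common choice and is not stated in the notes. The total sample size of 60 and the employee labels are hypothetical.

```python
import random

def stratified_sample(strata, total_size, rng):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_size * len(members) / population_size)
        # Simple random sampling inside each stratum.
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata matching the example: 500 male and 100 female employees.
strata = {
    "male": [f"M{i}" for i in range(500)],
    "female": [f"F{i}" for i in range(100)],
}

sample = stratified_sample(strata, total_size=60, rng=random.Random(1))
print({name: len(chosen) for name, chosen in sample.items()})
```

With these numbers the sample contains 50 male and 10 female employees, the same 5 : 1 ratio as the population.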
4. Cluster Sampling
In cluster sampling, the population is divided into subgroups (clusters), but each
subgroup has characteristics similar to those of the whole population. Instead of
selecting a sample from each subgroup, you randomly select entire subgroups. This
method is helpful when
dealing with large and diverse populations.
Example: A company has over a hundred offices in ten cities across the world, all of
which have roughly the same number of employees in similar job roles. The researcher
randomly selects 2 to 3 offices and uses them as the sample.
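A minimal Python sketch of this example follows. The 100 offices with 20 employees each are made-up data; the point is only that whole clusters are drawn at random and every member of a chosen cluster enters the sample.

```python
import random

# Hypothetical data: 100 offices (clusters), each with 20 employees.
offices = {
    f"office_{i}": [f"emp_{i}_{j}" for j in range(20)]
    for i in range(100)
}

rng = random.Random(7)  # arbitrary seed, for reproducibility only

# Randomly select whole offices, not individual employees.
chosen_offices = rng.sample(sorted(offices), k=3)

# Everyone in the chosen offices becomes part of the sample.
sample = [emp for office in chosen_offices for emp in offices[office]]

print(chosen_offices, len(sample))
```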
Next, we turn to the second category of sampling techniques, i.e., Non-Probability
Sampling Techniques.
1. Convenience Sampling
In this sampling method, the researcher simply selects the individuals who are most
easily accessible to them. This is an easy way to gather data, but there is no way to tell
if the sample is representative of the entire population. The only criterion involved is
that people are available and willing to participate.
Example: The researcher stands outside a company and asks the employees coming in
to answer questions or complete a survey.
2. Voluntary Response Sampling
Voluntary response sampling is similar to convenience sampling, in the sense that the
only criterion is that people are willing to participate. However, instead of the
researcher choosing the participants, the participants volunteer themselves.
Example: The researcher sends out a survey to every employee in a company and
gives them the option to take part in it.
3. Purposive Sampling
In purposive sampling, the researcher uses their expertise and judgment to select a
sample that they think is the best fit. It is often used when the population is very small
and the researcher only wants to gain knowledge about a specific phenomenon rather
than make statistical inferences.
Example: The researcher wants to know about the experiences of disabled employees
at a company. So the sample is purposefully selected from this population.
4. Snowball Sampling
In snowball sampling, the research participants recruit other participants for the study.
It is used when participants required for the research are hard to find. It is called
snowball sampling because like a snowball, it picks up more participants along the
way and gets larger and larger.
Example: The researcher wants to know about the experiences of homeless people in
a city. Since there is no detailed list of homeless people, a probability sample is not
possible. The only way to get the sample is to get in touch with one homeless person
who will then put you in touch with other homeless people in a particular area.
Statistical Inference:
Statistical inference refers to the process of selecting and using a sample
statistic to draw conclusions about the population parameter. Statistical inference
deals with two types of problems.
They are:-
1. Testing of Hypothesis
2. Estimation
Hypothesis:
Hypothesis is a statement subject to verification. More precisely, it is a
quantitative statement about a population, the validity of which remains to be
tested. In other words, hypothesis is an assumption made about a population
parameter.
Testing of Hypothesis:
Testing of hypothesis is a process of examining whether the hypothesis
formulated by the researcher is valid or not. The main objective of hypothesis
testing is to decide whether to accept or reject the hypothesis.
Procedure for Testing of Hypothesis:
The various steps in testing of hypothesis are the following :-
1. Set Up a Hypothesis:
The first step in testing of hypothesis is to set up a hypothesis about the
population parameter. Normally, the researcher has to fix two types of
hypothesis. They are null hypothesis and alternative hypothesis.
Null Hypothesis:-
Null hypothesis is the original hypothesis. It states that there is no
significant difference between the sample and population regarding a
particular matter under consideration. The word “null” means ‘invalid’ or ‘void’
or ‘amounting to nothing’. Null hypothesis is denoted by H0. For example,
suppose we want to test whether a medicine is effective in curing cancer. Hence,
the null hypothesis will be stated as follows:-
H0: The medicine is not effective in curing cancer (i.e., there is no
significant difference between the given medicine and other
medicines in curing cancer disease.)
Alternative Hypothesis:-
Any hypothesis other than null hypothesis is called alternative hypothesis.
When a null hypothesis is rejected, we accept the other hypothesis,
known as alternative hypothesis. Alternative hypothesis is denoted by
H1. In the above example, the alternative hypothesis may be stated as
follows:-
H1: The medicine is effective in curing cancer. (i.e., there is
significant difference between the given medicine and other medicines
in curing cancer disease.)
Standard Error:
Standard error (S.E.) is the standard deviation of the sampling distribution of a
statistic. Standard error plays a very important role in the large sample theory.
The following are the important uses of standard errors:-
1. Standard Error is used for testing a given hypothesis
2. S.E. gives an idea about the reliability of a sample, because the
reciprocal of S.E. is a measure of reliability of the sample.
3. S.E. can be used to determine the confidence limits within which
the population parameters are expected to lie.
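The third use can be illustrated with a short Python sketch. It assumes the usual large-sample formulas, which are not given in the notes: S.E. of the mean = s / √n, and 95% confidence limits = mean ± 1.96 × S.E. The data are simulated and purely hypothetical.

```python
import math
import random
import statistics

def standard_error_of_mean(values):
    """S.E. of the sample mean, estimated as s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def confidence_limits(values, z=1.96):
    """Large-sample confidence limits: mean - z * S.E. and mean + z * S.E."""
    mean = statistics.mean(values)
    se = standard_error_of_mean(values)
    return mean - z * se, mean + z * se

# Hypothetical large sample: 200 simulated measurements.
rng = random.Random(2024)
heights = [rng.gauss(170, 8) for _ in range(200)]

se = standard_error_of_mean(heights)
lower, upper = confidence_limits(heights)
print(f"S.E. = {se:.3f}, 95% limits = ({lower:.2f}, {upper:.2f})")
```

A smaller S.E. gives narrower limits, which is the sense in which the reciprocal of S.E. measures reliability.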
Test Statistic
The decision to accept or to reject a null hypothesis is made on the
basis of a statistic computed from the sample. Such a statistic is called the test
statistic. There are different types of test statistics. All these test statistics can be
classified into two groups. They are
a. Parametric Tests
b. Non-Parametric Tests
PARAMETRIC TESTS
The statistical tests based on the assumption that population or population
parameter is normally distributed are called parametric tests. The important
parametric tests are:-
1. z-test
2. t-test
3. F-test
Z-test:
Z-test is applied when the test statistic follows a normal distribution.
It was developed by Prof. R. A. Fisher. The following are the important uses of
z-test:-
1. To test the population mean when the sample is large or when
the population standard deviation is known.
2. To test the equality of two sample means when the samples are
large or when the population standard deviation is known.
3. To test the population proportion.
4. To test the equality of two sample proportions.
5. To test the population standard deviation when the sample is large.
6. To test the equality of two sample standard deviations when the samples
are large or when population standard deviations are known.
7. To test the equality of correlation coefficients.
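The first use in the list above (testing a population mean when the population standard deviation is known) can be sketched as follows. The formula z = (x̄ − μ0) / (σ / √n) is the standard one and is assumed here, since the notes do not state it. All numbers are hypothetical.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and the two-tailed p-value for H0: mean = mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value

# Hypothetical figures: sample of 100 with mean 52, hypothesised mean 50,
# known population standard deviation 10.
z, p = one_sample_z_test(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)

# At the 5% level the two-tailed critical value is 1.96.
reject_h0 = abs(z) > 1.96
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject_h0}")
```

Here z = 2.00, which exceeds 1.96, so H0 would be rejected at the 5% level of significance.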
While calculating F-ratio, the numerator is the greater variance and denominator is
the smaller variance. So,
$$F = \frac{\text{Greater Variance}}{\text{Smaller Variance}}$$
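A minimal sketch of the F-ratio calculation is given below. It assumes that the variances are sample variances with an n − 1 denominator, as computed by `statistics.variance`. The two samples are made-up figures.

```python
import statistics

def f_ratio(sample_a, sample_b):
    """F = greater variance / smaller variance, so F is never below 1."""
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)

# Hypothetical samples.
sample_a = [10, 12, 14, 16, 18]  # variance = 10
sample_b = [11, 12, 13, 14, 15]  # variance = 2.5

f = f_ratio(sample_a, sample_b)
print(f"F = {f:.2f}")
```

The computed F is then compared with the table value of F for the corresponding degrees of freedom.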
Uses of F-distribution:-
1. To test the equality of variances of two populations.
2. To test the equality of means of three or more populations.
3. To test the linearity of regression
Assumptions of F-distribution:-
1. The values in each group are normally distributed.
2. The variance within each group should be equal for all groups
($\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots$).
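In testing a hypothesis, there are four possibilities:
1. Accepting a null hypothesis when it is true.
2. Rejecting a null hypothesis when it is false.
3. Rejecting a null hypothesis when it is true.
4. Accepting a null hypothesis when it is false.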
Out of the above 4 possibilities, 1 and 2 are correct decisions, while 3 and 4 are
errors. The error involved in the 3rd possibility is called Type I error and
that in the 4th possibility is called Type II error.
Type I Error
The error committed by rejecting a null hypothesis when it is true, is
called Type I error.
The probability of committing Type I error is denoted by α (alpha).
Type II Error
The error committed by accepting a null hypothesis when it is false is called
Type II error.
The probability of committing Type II error is denoted by β (beta).
β = Prob. (Type II error)
β = Prob. (Accepting H0 when it is false)
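The meaning of α and β can be illustrated by simulation. The sketch below is only an illustration under assumed conditions: a two-tailed z-test at the 5% level, samples of size 30 from a normal population with σ = 1, H0: mean = 0, and a true mean of 0.5 when H0 is false. None of these values come from the notes.

```python
import math
import random

def z_test_rejects(sample, mu0, sigma, z_crit=1.96):
    """Two-tailed z-test: True if H0 (mean = mu0) is rejected at the 5% level."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

def rejection_rate(true_mean, trials, rng, n=30, mu0=0.0, sigma=1.0):
    """Fraction of simulated samples in which H0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0, sigma):
            rejections += 1
    return rejections / trials

rng = random.Random(123)  # arbitrary seed, for reproducibility only

# H0 is true (true mean = 0): every rejection is a Type I error.
alpha_hat = rejection_rate(true_mean=0.0, trials=2000, rng=rng)

# H0 is false (true mean = 0.5): every acceptance is a Type II error.
beta_hat = 1 - rejection_rate(true_mean=0.5, trials=2000, rng=rng)

print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

The estimated α should come out close to 0.05, the chosen level of significance, while β depends on how far the true mean is from the hypothesised one and on the sample size.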
Degree of freedom
Degree of freedom is defined as the number of independent observations
which is obtained by subtracting the number of constraints from the total number
of observations.
Degree of freedom = Total number of observations – Number of constraints.
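The probability that the test statistic falls in the acceptance region when the null
hypothesis is true = 1 - α.
If the calculated value of the test statistic falls in the acceptance region,
we accept the null hypothesis; otherwise, we reject it.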
NON-PARAMETRIC TESTS
Uses of χ2-test
The uses of chi-square test are:-
1. Useful for the test of goodness of fit:- χ2 - test can be used to test
whether there is goodness of fit between the observed frequencies and
expected frequencies.
2. Useful for the test of independence of attributes:- χ2 test can be used to
test whether two attributes are associated or not.
3. Useful for the test of homogeneity:- χ2-test is very useful to test
whether two attributes are homogeneous or not.
4. Useful for testing given population variance:- χ2-test can be used for
testing whether the given population variance is acceptable on the basis of
samples drawn from that population.
1. Set up the null hypothesis that there is goodness of fit between observed and
expected frequencies.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequencies and E = expected frequencies.
Qn:- A sample analysis of the examination results of 200 students was made. It was
found that 46 students had failed, 68 secured IIIrd class, 62 IInd class and
the rest were placed in the Ist class. Are these figures commensurate with
the general examination results, which are in the ratio of 2 : 3 : 3 : 2 for the
various categories respectively?
Sol: H0: The figures are commensurate with the general examination results.
H1: The figures are not commensurate with the general examination results.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Computation of χ2 value:
O      E                    O - E    (O - E)^2    (O - E)^2 / E
46     200 x 2/10 = 40          6         36         0.9000
68     200 x 3/10 = 60          8         64         1.0667
62     200 x 3/10 = 60          2          4         0.0667
24     200 x 2/10 = 40        -16        256         6.4000
Total                                                8.4334
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 8.4334$$
The table value at 5% level of significance
and 3 degrees of freedom = 7.815
(df = n – r – 1 = 4 – 0 – 1 = 3)
Since the calculated value (8.4334) is greater than the table value (7.815), we reject
the H0.
∴ We conclude that the analytical figures are not commensurate with the
general examination results. In other words, there is no goodness of fit between
the observed and expected frequencies.
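The calculation above can be checked with a short Python sketch. The helper name `chi_square_statistic` is illustrative; the observed figures, the 2 : 3 : 3 : 2 ratio and the table value 7.815 are taken from the worked example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [46, 68, 62, 24]  # failed, IIIrd class, IInd class, Ist class
ratio = [2, 3, 3, 2]         # general examination results

total = sum(observed)
# Expected frequencies: total split in the given ratio (40, 60, 60, 40).
expected = [total * r / sum(ratio) for r in ratio]

chi2 = chi_square_statistic(observed, expected)
critical = 7.815  # table value at 5% level, 3 degrees of freedom (from the text)

print(expected)
print(f"chi-square = {chi2:.4f}, reject H0: {chi2 > critical}")
```

The exact value is 8.4333; the figure 8.4334 in the table comes from adding the individually rounded terms. Either way it exceeds 7.815, so H0 is rejected.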
Qn: Test whether accidents occur uniformly over the days of the week on the
basis of the following information:-
Days of the week:   Sun   Mon   Tue   Wed   Thu   Fri   Sat
No. of accidents:    11    13    14    13    15    14    18
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Computation of χ2 value:
O      E      O - E    (O - E)^2    (O - E)^2 / E
11     14       -3          9          0.6429
13     14       -1          1          0.0714
14     14        0          0          0.0000
13     14       -1          1          0.0714
15     14        1          1          0.0714
14     14        0          0          0.0000
18     14        4         16          1.1429
Total                                  2.0000
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.0000$$
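The computation for this question can be sketched in Python as well. The observed counts come from the question. The expected frequency of 14 per day follows from assuming a uniform distribution (98 accidents over 7 days). The table value 12.592 (5% level, 7 – 1 = 6 degrees of freedom) is the standard χ2 table value and is an addition here, since the notes do not reach that step.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Accidents from Sunday to Saturday, as given in the question.
observed = [11, 13, 14, 13, 15, 14, 18]

# Under H0 (uniform occurrence) every day has the same expected frequency.
expected_per_day = sum(observed) / len(observed)
expected = [expected_per_day] * len(observed)

chi2 = chi_square_statistic(observed, expected)

# Assumption: standard table value at 5% level for 6 degrees of freedom.
critical = 12.592

print(f"E = {expected_per_day}, chi-square = {chi2:.4f}, reject H0: {chi2 > critical}")
```

Under these assumptions the calculated value (2.0000) is well below the table value, so the hypothesis that accidents occur uniformly over the week would not be rejected.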