Sampling Methods 6 and 7 April
Let us try to understand the terms 'census' and 'sample' with the help of an
illustration. Suppose you wish to study the 'impact of T.V. advertisements on
children in Delhi'. You then have to collect relevant information from the children
residing in Delhi who view T.V. These children are the population
(in statistical terminology) for your study. If you collect the data from all of them,
not leaving out a single child, it is known as the census method of data collection. This
means studying the whole population. If instead you select only some children
from among them for gathering the desired information,
because it is not feasible to gather the information from all the children, it is
known as the sample method of data collection. A sample is therefore a subset of a
statistical population whose characteristics are studied to gain information about the
whole population. When dealing with people, it can be defined as a set of
respondents (people) selected from a population for the purpose of a
survey. A population is a group of individual persons, objects, items or any other
units from which samples are taken for measurement.
A complete survey of a population is called a census. It involves covering all
respondents, items, or units of the population. For example, if we want to know the
wage structure of the textile industry in the country, then one approach is to collect
the data on the wages of each and every worker in the
textile industry. On the other hand, a sample is a representative subset of the
population. Thus in a sample survey we cover only a sample of the respondents, items
or units of the population we are interested in and then draw inferences about the
whole population.
The following are the advantages of census:
1) In a census each and every respondent of the population is considered and
various population parameters are compiled for information.
2) The information obtained on the basis of census data is more reliable and
accurate. It is the method adopted for collecting data on exceptional matters like
child labour, distribution by sex, educational level of the people, etc.
3) If we are conducting a survey for the first time we can have a census instead of
sample survey. The information based on this census method becomes a base for
future studies. Similarly, some of the studies of special importance like population
data are obtained only through census.
WHY SAMPLING?
One of the decisions to be made by a researcher in conducting a survey is whether
to go for a census or a sample survey. We obtain a sample rather than a complete
enumeration (a census) of the population for many reasons. The most important
considerations are: cost, size of the population, accuracy of data,
accessibility of population, timeliness, and destructive observations.
1) Cost: The cost of conducting surveys through the census method would be
prohibitive, and sampling helps in substantial cost reduction of surveys. Since
the financial resources available to conduct a survey are most often scarce, it is
imperative to go for a sample survey rather than a census.
2) Size of the Population: If the size of the population is very large, it is difficult,
if not impossible, to conduct a census. In such situations a sample survey is the only
way to analyse the characteristics of a population.
3) Accuracy of Data: Although reliable information can be obtained through a
census, sometimes the accuracy of information may be lost because of a large
population. Sampling involves a small part of the population and a few trained
people can be involved to collect accurate data. On the other hand,
a lot of people are required to enumerate all the observations. Often it becomes
difficult to involve trained manpower in large numbers to collect the data thereby
compromising accuracy of data collected. In such a situation a sample may be
more accurate than a census. A sloppily conducted census can provide less reliable
information than a carefully obtained sample.
4) Accessibility of Population: There are some populations that are so difficult to
get access to that only a sample can be used, e.g., people in prison, birds migrating
from one place to another place etc. The inaccessibility may be economic or time
related. In a particular study, population may be so costly to reach, like the
population of planets, that only a sample can be used.
5) Timeliness: Since we are covering a small portion of a large population through
sampling, it is possible to collect the data in far less time than covering the entire
population. Not only does it take less time to collect the data through sampling but
the data processing and analysis also takes less
time because fewer observations need to be covered. Suppose a company wants to
get quick feedback from its consumers on their perceptions of a new
improved detergent in comparison to an existing version of the detergent. Here the
time factor is very significant. In such
situations it is better to go for a sample survey rather than a census because it
saves a lot of time and the product launch decision can be taken quickly.
6) Destructive Observations: Sometimes the very act of observing the desired
characteristics of a unit of the population destroys it for the intended use. Good
examples of this occur in quality control. For example, to test the quality of a bulb,
to determine whether it is defective, it must be destroyed.
To obtain a census of the quality of a lorry load of bulbs, you have to destroy all of
them. This is contrary to the purpose served by quality-control testing. In this case,
only a sample should be used to assess the quality of the bulbs. Another example is
the blood test of a patient.
The disadvantages of sampling are few, but the researcher
must be cautious. These are risk, lack of representativeness and insufficient sample
size, each of which can cause errors. If researchers do not pay attention to these
flaws, the results may be invalidated.
1) Risk: Using a sample from a population and drawing inferences about the entire
population involves risk. In other words the risk results from dealing with a part of
a population. If the risk is not acceptable in seeking a solution to a problem then a
census must be conducted.
2) Lack of representativeness: Determining the representativeness of the sample
is the researcher's greatest problem. By definition, 'sample' means a representative
part of an entire population. It is necessary to obtain a sample that meets the
requirement of representativeness, otherwise the sample will be biased. The
inferences drawn from non-representative samples will be misleading and potentially
dangerous.
3) Insufficient sample size: The other significant problem in sampling is to
determine the size of the sample. The size required for a valid sample depends
on several factors, such as the extent of risk that the researcher is willing to accept
and the characteristics of the population itself.
4.4 ESSENTIALS OF A GOOD SAMPLE
It is important that the sampling results must reflect the characteristics of the
population. Therefore, while selecting the sample from the population under
investigation it should be ensured that the sample has the following characteristics:
1) A sample must represent a true picture of the population from which it is drawn.
2) A sample must be unbiased by the sampling procedure.
3) A sample must be taken at random so that every member of the population of
data has an equal chance of selection.
4) A sample must be sufficiently large but as economical as possible.
5) A sample must be accurate and complete. It should not leave any information
incomplete and should include all the respondents, units or items included in the
sample.
6) Adequate sample size must be taken considering the degree of precision
required in the results of inquiry.
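1. Simple Random Sampling: In simple random sampling each member of the
population has an equal chance of being selected.
Advantages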
i) The simple random sample requires less knowledge about the characteristics of
the population.
ii) Since the sample is selected at random, giving each member of the population an
equal chance of being selected, the sample can be called an unbiased sample. Bias due to
human preferences and influences is eliminated.
iii) Assessment of the accuracy of the results is possible by sampling error
estimation.
iv) It is a simple and practical sampling method provided the population size is not
large.
Limitations
i) If the population size is large, a great deal of time must be spent listing and
numbering the members of the population.
ii) A simple random sample will not adequately represent many population
characteristics unless the sample is very large. That is, if the researcher is
interested in choosing a sample on the basis of the distribution in the population of
gender, age, social status, a simple random sample needs to be very large to ensure
all these distributions are representative of the population. To obtain a
representative sample across multiple population attributes we should use stratified
random sampling.
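As an illustration only, a simple random sample from a numbered list can be
sketched in Python as follows. The function name, the population of 1000
numbered units and the seed are assumptions made for this example; they are not
part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: units numbered 1 to 1000 (the listing step).
units = list(range(1, 1001))
sample = simple_random_sample(units, 100, seed=42)
print(len(sample), "units drawn without repetition")
```

Note that the whole population has to be listed and numbered before the draw,
which is the first limitation mentioned above.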
2. Systematic Sampling: In systematic sampling the sample units are selected
from the population at equal intervals in terms of time, space or order. The
selection of a sample using the systematic sampling method is very simple. From a
population of 'N' units, a sample of 'n' units may be selected by following the
steps given below:
i) Arrange all the units in the population in an order by giving serial numbers from
1 to N.
ii) Determine the sampling interval by dividing the population size by the sample size.
That is, K = N/n.
iii) Select the first sample unit at random from the first sampling interval (1 to K).
iv) Select the subsequent sample units at equal regular intervals of K.
For example, we want to have a sample of 100 units from a population of 1000
units. First arrange the population units in some serial order by giving numbers
from 1 to 1000. The sampling interval is K = 1000/100 = 10. Select the first sample
unit at random from the first 10 units (i.e. from 1 to 10). Suppose the first sample
unit selected is 5, then the subsequent sample units are 15, 25, 35, ..., 995. Thus,
in systematic sampling the first sample unit is selected at random and this
sample unit in turn determines the subsequent sample units that are to be selected.
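The four steps above can be sketched in Python as follows. This is a minimal
sketch which assumes that N is an exact multiple of n; the function name and
its optional start argument are introduced here only for illustration.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the serial numbers (1..N) chosen by systematic sampling.

    Assumes N is an exact multiple of n, so the interval K is a whole number.
    """
    K = N // n                        # step ii: sampling interval K = N/n
    if start is None:
        rng = random.Random(seed)
        start = rng.randint(1, K)     # step iii: random unit from 1..K
    # step iv: every K-th unit after the first one
    return [start + i * K for i in range(n)]

# The worked example: N = 1000, n = 100, first unit 5
chosen = systematic_sample(1000, 100, start=5)
print(chosen[:4], "...", chosen[-1])
```

With start=5 the list begins 5, 15, 25, 35 and ends at 995, as in the example.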
Advantages
i) The main advantage of using a systematic sample is that it is more expeditious to
collect a sample systematically, since the time taken and work involved is less than
in simple random sampling. For example, it is frequently used in exit polls and
in surveys of store consumers.
ii) This method can be used even when no formal list of the population units is
available. For example, if we are interested in knowing the opinion of
consumers on improving the services offered by a store, we may simply choose
every kth (say 10th) consumer visiting the store, provided that we know how many
consumers visit the store daily (say 1000 consumers visit and we want to
have 100 consumers as the sample size, so that k = 1000/100 = 10).
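Selecting every kth arrival without a prior list can be sketched as below. The
visitor labels, the function name and the starting position are hypothetical;
with the figures in the example (1000 visitors, a sample of 100) the interval
works out to k = 10.

```python
from itertools import islice

def every_kth(stream, k, start):
    """Pick the start-th arrival (1-based) and every k-th arrival after it."""
    return islice(stream, start - 1, None, k)

# Hypothetical stream of 1000 visitors arriving one by one (no list needed).
visitors = (f"visitor_{i}" for i in range(1, 1001))
picked = list(every_kth(visitors, k=10, start=4))
print(len(picked), picked[0], picked[-1])
```

In practice the starting position would be chosen at random from 1 to k; it is
fixed at 4 here only to keep the illustration easy to follow.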
Limitations
i) If there is periodicity in the occurrence of elements of a population, the selection
of a sample using systematic sampling could give a highly unrepresentative
sample. For example, suppose the sales of a consumer store are arranged
chronologically and, using systematic sampling, we select the sales on the 1st of every
month. The 1st day of a month cannot be a representative sample for the whole
month. Thus in systematic sampling there is a danger of order bias.
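The effect of periodicity can be seen in a small sketch with made-up figures.
It assumes a year of 12 months of 30 days each, with sales of 300 on the 1st of
every month and 100 on all other days; these numbers are invented purely for
illustration.

```python
DAYS_PER_MONTH = 30   # assumption: every month has 30 days
MONTHS = 12

# Hypothetical daily sales: 300 on the 1st of each month, 100 otherwise.
sales = [300 if day % DAYS_PER_MONTH == 0 else 100
         for day in range(DAYS_PER_MONTH * MONTHS)]   # index 0 = day 1

K = DAYS_PER_MONTH                 # sampling interval equals the period
first_of_month = sales[0::K]       # systematic sample starting on day 1
second_of_month = sales[1::K]      # systematic sample starting on day 2

population_mean = sum(sales) / len(sales)
mean_first = sum(first_of_month) / len(first_of_month)
mean_second = sum(second_of_month) / len(second_of_month)
print(round(population_mean, 2), mean_first, mean_second)
```

Because the interval coincides with the monthly cycle, one starting point
overstates average sales and the other understates it, and neither sample
reflects the population mean.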
ii) The units of the sample are not selected independently of one another, because
the selection of units for the sample depends on the initial unit selection.
Regardless of how we select the first unit of the sample, subsequent units are
automatically determined, so the sample lacks complete randomness.
3. Stratified Random Sampling: The stratified sampling method is used when the
population is heterogeneous rather than homogeneous. A heterogeneous population
is composed of unlike elements such as male/female, rural/urban, literate/illiterate,
high income/low income groups, etc. In such cases, use of simple random sampling
may not always provide a representative sample of the
population. In stratified sampling, we divide the population into relatively
homogeneous groups called strata. Then we select a sample using simple random
sampling from each stratum. There are two approaches to deciding the sample size
from each stratum, namely, the proportional stratified sample and the
disproportional stratified sample. With either approach, stratified sampling
guarantees that every unit in the population has a chance of being selected. We will
now discuss these two approaches of selecting samples.
i) Proportional Stratified Sample: If the number of sampling units drawn from
each stratum is in proportion to the corresponding stratum population size, we say
the sample is a proportional stratified sample. For example, let us say we want to
draw a stratified random sample from a heterogeneous population (on some
characteristics) consisting of rural/urban and male/female respondents.
So we have to create 4 homogeneous subgroups called strata as follows:
urban male, urban female, rural male and rural female.
To ensure each stratum in the sample will represent the corresponding stratum in the
population, we must ensure each stratum is represented in the sample in the same
proportion as it is in the population. Let us assume that we
know (or can estimate) the population distribution as follows:
65% male, 35% female and 30% urban and 70% rural. Now we can determine the
approximate proportions of our 4 strata in the population as shown below.
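One way to work out such proportions is sketched below. It rests on an
assumption that is not stated in the notes: that gender and area of residence
are independent, so that each stratum share is the product of the two marginal
percentages. The sample size of 1000 is borrowed from the later part of this
discussion to show what a proportional allocation would look like.

```python
# Marginal percentages given in the text.
gender = {"male": 65, "female": 35}
area = {"urban": 30, "rural": 70}

# Assumption: gender and area are independent, so shares multiply.
share_percent = {(a, g): pa * pg / 100
                 for a, pa in area.items()
                 for g, pg in gender.items()}

sample_size = 1000  # illustrative total sample size
# Proportional allocation: each stratum gets its population share of the sample.
allocation = {stratum: sample_size * pa * pg // 10000
              for stratum, (pa, pg) in
              (((a, g), (area[a], gender[g])) for a in area for g in gender)}

for stratum in share_percent:
    print(stratum, share_percent[stratum], "%", allocation[stratum])
```

Under this assumption the four strata account for 19.5%, 10.5%, 45.5% and 24.5%
of the population, and a proportional sample of 1000 would contain 195, 105,
455 and 245 respondents respectively.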
The above figures are, normally, estimated on the basis of previous knowledge of a
researcher.
Then the allocation of a sample size of 1000 to each stratum using the disproportional
stratified sampling method will be as shown in the following table:
Advantages
a) Since the samples are drawn from each of the strata of the population,
stratified sampling is more representative and thus more accurately reflects the
characteristics of the population from which the samples are chosen.
b) It is more precise and to a great extent avoids bias.
c) Since sample size can be less in this method, it saves a lot of time, money and
other resources for data collection.
Limitations
a) Stratified sampling requires a detailed knowledge of the distribution of attributes
or characteristics of interest in the population to determine the homogeneous
groups that lie within it. If we cannot accurately identify the homogeneous groups,
it is better to use simple random sample since improper stratification can lead to
serious errors.
b) Preparing a stratified list is a difficult task as the lists may not be readily
available.
4. Cluster Sampling: In cluster sampling we divide the population into groups
having heterogeneous characteristics, called clusters, and then select a sample of
clusters using simple random sampling. We assume that each of the clusters is
representative of the population as a whole. This sampling is widely used for
geographical studies of many issues. For example, if we are interested in finding
the attitudes of consumers residing in Delhi towards a new product of a company,
the whole city of Delhi can be divided into 20 blocks. If we assume that each of
these blocks will represent the attitudes of consumers of Delhi as a whole, we
might use cluster sampling, treating each block as a cluster. We will then select a
sample of 2 or 3 clusters and obtain the information from all the consumers in
them. The principles that are basic to cluster sampling are as follows:
i) The differences or variability within a cluster should be as large as possible. As
far as possible the variability within each cluster should be the same as that of the
population.
ii) The variability between clusters should be as small as possible. Once the
clusters are selected, all the units in the selected clusters are covered for obtaining
data.
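The Delhi example can be sketched in Python as follows. The block names, the
figure of 50 consumers per block and the function name are assumptions made for
the illustration; only the 20 blocks and the choice of 2 or 3 clusters come
from the text.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and cover every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of clusters
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 20 blocks with 50 consumers each.
blocks = {
    f"block_{b:02d}": [f"consumer_{b:02d}_{i:02d}" for i in range(50)]
    for b in range(1, 21)
}
chosen, respondents = cluster_sample(blocks, 3, seed=1)
print(chosen, len(respondents))
```

Only the list of clusters is needed to draw the sample; the units are listed
only within the clusters that are actually selected.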
Advantages
a) The cluster sampling provides significant gains in data collection costs, since
traveling costs are smaller.
b) Since the researcher need not cover all the clusters and only a sample of clusters
are covered, it becomes a more practical method which facilitates fieldwork.
Limitations
a) The cluster sampling method is less precise than sampling of units from the
whole population since the latter is expected to provide a better cross-section of the
population than the former, due to the usual tendency of units in a cluster to be
homogeneous.
b) The sampling efficiency of cluster sampling is likely to decrease with an
increase in cluster size or a decrease in the number of clusters.
The above advantages or
limitations of cluster sampling suggest that, in practical situations where sampling
efficiency is less important but the cost is of greater
significance, the cluster sampling method is extensively used. If the division of
clusters is based on geographic sub-divisions, it is known as area sampling.
In cluster sampling, instead of covering all the units in each cluster, we can resort to
sub-sampling, which is known as two-stage sampling. Here, the clusters are termed
primary units and the units within the selected clusters are taken as secondary units.
5. Multistage Sampling: We have already covered two-stage sampling. Multistage
sampling is a generalisation of two-stage sampling. As the name suggests,
multistage sampling is carried out in different stages. In each stage progressively
smaller (population) geographic areas are randomly selected.
A political pollster interested in assembly elections in Uttar Pradesh may first
divide the state into different assembly units, and a sample of assembly
constituencies may be selected in the first stage. In the second stage, each of the
sampled assembly constituencies is divided into a number of segments and a
second-stage sample of assembly segments may be selected. In the third stage, within
each sampled assembly segment either all the households or a random sample of
households would be interviewed. In this sampling method, it is possible to take as
many stages as are necessary to achieve a representative sample. Each stage results
in a reduction of sample size.
In multistage sampling a suitable method of sampling is used at each stage.
More stages are used to arrive at a sample of the desired sampling
units.
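The three stages of the pollster example can be sketched as below. All the
numbers (25 constituencies, 10 segments in each, 40 households in each segment,
and the sample sizes at each stage) are invented for the illustration, and
simple random sampling is used at every stage although other methods could be
used.

```python
import random

def multistage_sample(state, n_constituencies, n_segments, n_households, seed=None):
    """Three-stage sample: constituencies, then segments, then households."""
    rng = random.Random(seed)
    selected = []
    # Stage 1: random sample of constituencies
    for c in rng.sample(sorted(state), n_constituencies):
        segments = state[c]
        # Stage 2: random sample of segments within each chosen constituency
        for s in rng.sample(sorted(segments), n_segments):
            # Stage 3: random sample of households within each chosen segment
            for h in rng.sample(segments[s], n_households):
                selected.append((c, s, h))
    return selected

# Hypothetical frame: 25 constituencies x 10 segments x 40 households.
state = {
    f"constituency_{c}": {
        f"segment_{s}": [f"household_{h}" for h in range(40)]
        for s in range(10)
    }
    for c in range(25)
}
sample = multistage_sample(state, 5, 3, 8, seed=7)
print(len(sample), "households selected")
```

Here 5 x 3 x 8 = 120 households are interviewed out of 10,000, and a household
list is needed only for the 15 segments that are selected.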
Advantages
a) Multistage sampling provides cost gains by reducing data collection
costs.
b) Multistage sampling is more flexible and allows us to use different sampling
procedures in different stages of sampling.
c) If the population is spread over a very wide geographical area, multistage
sampling is the only sampling method available in a number of practical situations.
Limitations
a) If the sampling units selected at different stages are not representative, multistage
sampling becomes less precise and efficient.