01 Introduction to Sampling Techniques
❖ Population:
A population is the aggregate of all units that are of interest under some predetermined
objective and that are available in a specified area at a specified time. For example, if the
predetermined objective is to investigate the number of children per family in Lalmonirhat
district during 2024, then each family residing in Lalmonirhat district during that year is a
unit and all the families living in the district constitute the population.
❖ Finite population:
If a population has a definite number of units, it is called a finite population. For example, if the
predetermined objective is to estimate the total number of milking cows of our country during
the period 2024, then each milking cow of our country during that period is a unit and all the
milking cows of this country during this period constitute a finite population because the number
of units is finite.
❖ Infinite population:
If a population has an indefinite or uncountable number of units, it is called an infinite
population. For example, if the predetermined objective is to estimate the number of customers
entering a shopping center in every two-hour period, then each customer who can enter the
shopping center in that period is a unit, and all such customers constitute an infinite
population because the number of potential customers is uncountable or indefinite.
❖ Target population:
The entire group about which information is desired and conclusions are to be drawn is called the
target population.
❖ Sample:
A sample is a representative part of the population. For example, let us suppose that a tire
manufacturer produced a new tire to provide an increase in mileage over the firm’s current tires.
To estimate the mean number of miles provided by the new tires, the manufacturer selected a
sample of 120 tires for testing. The test results provided a sample mean of 36,500 miles. Hence,
an estimate of the mean tire mileage for the population of new tires was 36,500 miles.
❖ Random sample:
Any sample selected by a chance mechanism with a known chance of selection is called a random
sample. The chances of selection need not be equal for all samples. A random sample is free
from selection bias.
❖ Sampling technique:
A sampling technique is a scientific process of selecting a sample from a population.
❖ Sampling unit:
A sampling unit is a well-defined, distinct, and identifiable element or group of elements on
which observations can be made. For a household survey, the housing apartments or families
may constitute the sampling units.
❖ Unit of inquiry:
A unit of inquiry is the unit about which information is required. It may or may not be the same
as the sampling unit. For example, we may select a sample of households and obtain information
about household members. Here the households serve as the sampling units and the household
members serve as the units of inquiry.
❖ Sampling frame:
A sampling frame is a complete list of all units or groups of units of the population to be sampled,
organized and arranged in such a manner that every unit occurs once and only once in the list
and no unit is excluded from the list. In sampling problems, we encounter two types of sampling
frames: area sampling frames and list sampling frames.
Area sampling frames are usually used to sample geographical areas. With this technique, each
element of the population is associated with a particular geographical area constituted by a group
of people or households. In this case, a sample of areas is drawn and either all elements or a part
of them in the selected areas are included in the survey.
A list sampling frame is a complete list of well-defined reporting units. The list should contain
relevant information about individual units, which will enable efficient sampling.
In non-probability sampling, the selection of population elements is not made through any
probability mechanism and, because of this, the investigator cannot claim that his or her sample
is representative of the population. This greatly limits the investigator’s ability to generalize
the findings beyond the specific sample studied.
❖ Convenience sampling:
Unrestricted non-probability sampling is known as convenience sampling. Researchers have the
freedom to choose whomever they find, hence the name convenience. A convenience sample may
consist, for example, of respondents living in an easily accessible locality. It is the simplest
and least reliable form of non-probability sampling. Its primary virtue is its low cost.
Although a convenience sample offers no controls to ensure precision, this method is quite
frequently used, especially in market research and public opinion surveys. It is used because
probability sampling is often a time-consuming and expensive procedure and may not be feasible
in many situations. In the early stages of exploratory research, when one is seeking guidance,
convenience sampling is recommended.
❖ Accidental sampling:
Accidental sampling is one in which cases are selected from whatever happens to be available at
the moment. In such sampling, individuals are selected as they appear in a process. If it is
decided that only diabetic patients will be chosen from a queue in front of a hospital counter,
the resulting procedure is an accidental sampling procedure.
❖ Purposive sampling:
A non-probability sampling method that conforms to certain criteria is called purposive
sampling. There are two major types of purposive sampling: judgment sampling and quota
sampling.
❖ Judgment sampling:
Judgment sampling, or expert choice, is one in which the cases are included for investigation
through a planned selection procedure. In this case, the individuals selected are those considered
to be most representative of the population as a whole. It is called judgment sampling because
the choice of the individual units depends entirely on the sampler, who decides on his or her own
judgment which units conform to the chosen criteria.
In a study of labor problems, for example, one may decide to talk only with those who have
experienced discrimination while employed. Similarly, election results may be predicted from only
a few selected persons, chosen because of their predictive records in past elections.
❖ Quota sampling:
Quota sampling is a non-probability sampling method, comparable to stratified sampling, in which
the interviewers are told to contact and interview a certain number of individuals from certain
subgroups or strata of the population to make up the total sample.
In this method, individuals are not pre-selected at all. Once the strata are formed (usually
based on sex, age, social status, region of residence, etc.), the general breakdown of the sample
is decided (that is, how many persons in each sex category, each age group or each social class
are to be included) and quota assignments are allocated to the interviewers. The selection of
individuals within the strata is left to the interviewers who conduct the interviews. The factors
used to form the strata (sex, age, social status, region of residence, etc.) are termed quota
controls.
This technique is widely used by market researchers, political opinion seekers and many others
to avoid the cost problems of interviewing a pre-selected sample of individuals.
The term quota arises from the fact that in this method, the interviewers are given quotas for
certain subgroups (strata) of the population at the very outset, so as to build a sample roughly
proportional to the population. That is, the quotas of sample cases are computed in proportion to
the population subgroups. The sample quotas are divided among the interviewers, who then do their
best to choose persons who fit the restrictions of their quota controls.
For example, if it is known that one-third of the population lives in urban areas and two-thirds in
rural areas, the sample can be selected purposively from urban and rural areas in the same
proportion. Thus, a total of 300 respondents would mean 100 urban residents and 200 rural
residents to be included in the sample.
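The proportional split in this example can be sketched in a few lines of Python. This is only an illustration: the function name `proportional_quotas` and the largest-remainder rule used when the shares do not divide evenly are assumptions, not part of the method as described above.

```python
from fractions import Fraction

def proportional_quotas(total, shares):
    """Split `total` interviews across subgroups in proportion to `shares`.

    `shares` maps a subgroup name to its population share (shares should sum to 1).
    Any interviews left over after rounding down go to the subgroups with the
    largest fractional parts (an assumed tie-breaking rule).
    """
    exact = {g: Fraction(s) * total for g, s in shares.items()}
    quotas = {g: int(v) for g, v in exact.items()}  # round down (values are non-negative)
    leftover = total - sum(quotas.values())
    by_fraction = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_fraction[:leftover]:
        quotas[g] += 1
    return quotas

quotas = proportional_quotas(300, {"urban": Fraction(1, 3), "rural": Fraction(2, 3)})
print(quotas)  # {'urban': 100, 'rural': 200}
```

Exact fractions are used instead of floating-point shares so that one-third of 300 comes out as exactly 100.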
Note that quota sampling may be considered equivalent to stratified sampling with the added
requirement that each stratum is generally represented in the sample in the same proportion as in
the entire population.
The essential difference between a probability sample and quota sample is that with the former,
interviewers are required to interview specified (pre-selected) persons selected by a probability
mechanism, while with the latter they may complete their quotas in whatever way they choose.
It is common practice, although not mandatory, for quota samples to adopt random selection at
the initial stages of selection in exactly the same way as probability samples. In that case, the
difference between a probability sample and a quota sample lies in the selection of the final
sampling units, say individuals.
❖ Snowball sampling:
Snowball sampling is a non-probability sampling method in which persons initially chosen for the
sample are used as informants to locate, through a referral network, other persons who have the
characteristics that make them eligible for the sample.
It is the colorful name for a technique of building up a list or sample of a special population.
Some recent authors have referred to snowball sampling as chain referral sampling. It has seen
increased use in recent years in situations where respondents are difficult to identify and are
best located through a referral network, starting from an initial set of members or informants.
For example, consider the selection of beggars for which no frame is available. This can be best
done by asking an initial group of beggars to supply the names of other beggars they come
across. The selection of mosque imams or sex workers can also be made following this network
approach, since members of such populations may well know each other, particularly in small
areas.
Although snowball sampling is generally considered non-probability sampling, strategies have
been developed to draw snowball samples through a probabilistic approach, which allows the
computation of sampling errors and the use of statistical tests of significance. If one wishes the
snowball sample to be probabilistic, one should sample randomly within each stage.
Snowball sampling, whether probabilistic or non probabilistic, is conducted in stages. In the first
stage, a few persons possessing the requisite characteristic are identified and interviewed. These
persons are used as informants to identify others who qualify for inclusion in the sample. The
second stage involves interviewing these persons and so on.
The term snowball stems from the analogy of a snowball, which begins small but becomes
bigger and bigger as it falls downhill. Snowball sampling has been particularly used to study
drug cultures, heroin addiction, teenage gang activities and other issues where respondents may
not be readily visible or are difficult to identify and contact.
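The stage-by-stage procedure described above can be sketched as follows. The referral network, the function name `snowball_sample` and the `per_stage` cap are all hypothetical; in a real study the referrals come from the interviews themselves. Setting `per_stage` draws the newly named persons at random within each stage, which is one simple way to follow the advice on sampling randomly within stages.

```python
import random

def snowball_sample(referrals, seeds, stages, per_stage=None, rng=None):
    """Build a sample in stages by following referrals from earlier respondents.

    referrals : dict mapping a person to the people they can name (hypothetical data)
    seeds     : persons identified and interviewed in the first stage
    stages    : total number of stages, counting the seed stage
    per_stage : if given, at most this many newly named persons are drawn
                at random in each later stage
    """
    rng = rng or random.Random()
    sample = list(seeds)
    seen = set(seeds)
    current = list(seeds)
    for _ in range(stages - 1):
        # Collect everyone named by the current stage who is not yet in the sample.
        named = []
        for person in current:
            for other in referrals.get(person, []):
                if other not in seen and other not in named:
                    named.append(other)
        if per_stage is not None and len(named) > per_stage:
            named = rng.sample(named, per_stage)
        if not named:
            break  # the referral chain has dried up
        sample.extend(named)
        seen.update(named)
        current = named
    return sample

network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": [], "F": []}
print(snowball_sample(network, seeds=["A"], stages=3))  # ['A', 'B', 'C', 'D', 'E']
```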
❖ Probability sampling:
Probability sampling is the scientific method of selecting samples according to some laws of
chance in which each unit in the population has some definite pre-assigned probability of being
selected in the sample.
As a result, selection bias can be avoided and statistical theory can be applied to
derive the properties of the estimators. A probability sample is so designed that statistical
inference about the population can be based on the measures of variability computed from the
sample data. In addition, probability sampling allows us to construct a confidence interval within
which the true value of the population parameter is expected to lie.
A good number of probability sampling designs are in use. Among the most widely used are
simple random sampling, stratified sampling, systematic sampling, cluster sampling, multi-stage
sampling, multi-phase sampling, probability proportional to size sampling, etc.
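As a rough illustration of how sample variability leads to a confidence interval, the sketch below computes a mean, its standard error and an approximate 95% interval from a simple random sample. The data are hypothetical, and the large-sample normal approximation (z = 1.96) with no finite population correction is an assumption made for brevity; with only ten observations a t-based interval would normally be preferred.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Return (mean, standard error, lower, upper) for a simple random sample.

    Uses mean +/- z * s / sqrt(n); the normal approximation and the absence of a
    finite population correction are simplifying assumptions.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, se, mean - z * se, mean + z * se

# Hypothetical tire mileages in thousands of miles, for illustration only.
mileages = [36.1, 37.4, 35.8, 36.9, 36.5, 35.2, 37.0, 36.6, 36.3, 37.2]
mean, se, lower, upper = mean_confidence_interval(mileages)
print(f"mean = {mean:.2f}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```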
In the lottery method, suppose we want to select n candidates out of N. We assign the numbers 1 to N, one number to
each candidate and write these numbers on N slips, which are made as homogeneous as possible
in shape, size, color, etc. These slips are then put in a bag and thoroughly shuffled and then n
slips are drawn one by one. The n candidates corresponding to numbers on the slips drawn will
constitute a random sample.
This method of selection is quite independent of the properties of the population. Generally,
cards are used in place of slips. We make one card correspond to each unit of the population by
writing on it the number of the unit. The pack of cards is a kind of miniature of the population
for sampling purposes. The cards are shuffled a number of times and then a card is drawn at random
from them. This is one of the most reliable methods of selecting a random sample.
Theoretically, the lottery method is free from human bias and thus ensures randomness.
However, the randomness of the lottery method depends on the assumption that the identifiers
(marble, disk or piece of paper) are thoroughly mixed so that the population can be regarded as
being arranged randomly. In practice, such satisfactory mixing is difficult to ensure and thus the
use of random numbers remains the only option for selecting a sample.
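The lottery draw described above can be mimicked in software, with a pseudo-random shuffle standing in for the physical mixing of slips. This is a minimal sketch; the function name `lottery_sample` and the example sizes are assumptions.

```python
import random

def lottery_sample(N, n, rng=None):
    """Draw n of the slips numbered 1..N without replacement."""
    rng = rng or random.Random()
    slips = list(range(1, N + 1))  # one slip per unit in the population
    rng.shuffle(slips)             # mix the slips thoroughly
    return slips[:n]               # draw n slips one by one

drawn = lottery_sample(N=50, n=5, rng=random.Random(2024))
print(sorted(drawn))
```

Passing a seeded `random.Random` makes the draw reproducible, which is useful when the selection has to be documented or audited.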
The units in the population are numbered from 1 to N. A series of random numbers between 1 and
N is then drawn from the random number table, one after another. Once the first random number is
drawn, we may proceed in any direction (vertically, horizontally, diagonally or in any other
systematic way) to obtain the remaining units in the sample.
At any draw, the process used must give an equal chance of selection to every number between 1
and N. The units that bear the n selected numbers constitute our desired sample, and we
technically call these n numbers a sample of size n. It is important to keep in mind that
whatever procedure is used, we must ensure that the numbers so selected are all different and
none is greater than the population size N.
The use of a random number table involves a number of rejections, since all numbers greater than
N appearing in the table are not considered for selection. The use of random numbers is
therefore modified; two such modified procedures are the remainder method and the quotient method.
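To see how many table entries can be wasted through rejection, the sketch below reads entries one after another and keeps only those that are valid unit labels. The list of three-digit entries is hypothetical (it is not taken from any published table) and the function name `sample_from_table` is an assumption.

```python
def sample_from_table(numbers, N, n):
    """Select n distinct unit labels in 1..N from a stream of table entries.

    Entries equal to 0, greater than N, or already selected are rejected.
    Returns (selected, rejected); fewer than n units are returned if the
    stream runs out first.
    """
    selected, rejected = [], []
    for k in numbers:
        if 1 <= k <= N and k not in selected:
            selected.append(k)
        else:
            rejected.append(k)
        if len(selected) == n:
            break
    return selected, rejected

# Hypothetical three-digit entries read off a random number table.
table = [277, 130, 802, 108, 541, 91, 130, 45, 149, 3]
selected, rejected = sample_from_table(table, N=150, n=5)
print(selected)  # [130, 108, 91, 45, 149]
print(rejected)  # [277, 802, 541, 130]
```

Here nine entries had to be read to obtain five units, which is the inefficiency that the modified procedures are meant to reduce.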
❖ Remainder method:
Suppose that a simple random sample of fixed size is to be drawn from a population comprising
$N$ units. Let $N$ be an $r$-digit number and let $N'$ be the highest $r$-digit multiple of $N$. A
random number $k$ is chosen from 1 to $N'$ and the unit with the serial number equal to the
remainder obtained on dividing $k$ by $N$ is selected. The second and subsequent units are
selected in a similar manner. If the remainder is zero, the last unit is selected.
For example, suppose that a random sample of size 5 is to be selected from a population of size
150 units. 150 is a 3-digit number and the highest 3-digit multiple of 150 is $150 \times 6 = 900$. A
random number 277 is chosen from 001 to 900. Divide 277 by 150. The remainder is 127. The
unit labeled 127 in the population is selected.
To select the second unit, choose the next random number. This number is 130, which is less
than 150. We directly choose this number as our second unit in the sample. The next random
number is 802, which results in a remainder 52. The unit corresponding to this number is our
third selected unit.
Continuing this process, we arrive at the next two units. These are 108 and 91. So, the
units thus chosen are those labeled 52, 91, 108, 127 and 130. Had there been any random number
larger than 900, we would have ignored it. Note that the selection did not lead to any rejection
of random numbers. That is, all of the first 5 random numbers could be used to select units for
the sample.
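The remainder method can be sketched as a short function. The first three random numbers below (277, 130, 802) are those of the worked example; the last two inputs are taken to be 108 and 91 themselves, which is an assumption because the example reports only the resulting unit labels. Skipping a unit that has already been selected is also an assumption, made so that the sample contains distinct units.

```python
def remainder_method(random_numbers, N, n):
    """Select n distinct units from 1..N using the remainder method.

    Returns (N_prime, selected), where N_prime is the highest multiple of N
    with the same number of digits as N.
    """
    r = len(str(N))                    # N is an r-digit number
    N_prime = ((10**r - 1) // N) * N   # highest r-digit multiple of N
    selected = []
    for k in random_numbers:
        if not 1 <= k <= N_prime:
            continue                   # numbers outside 1..N' are ignored
        unit = k % N or N              # a remainder of 0 selects the last unit
        if unit not in selected:       # keep the selected units distinct
            selected.append(unit)
        if len(selected) == n:
            break
    return N_prime, selected

# 277, 130 and 802 come from the worked example; 108 and 91 are assumed inputs.
N_prime, units = remainder_method([277, 130, 802, 108, 91], N=150, n=5)
print(N_prime)  # 900
print(units)    # [127, 130, 52, 108, 91]
```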
❖ Quotient method:
Suppose that a simple random sample of fixed size is to be drawn from a population comprising
$N$ units. Let $N$ be an $r$-digit number and let $N'$ be the highest $r$-digit multiple of $N$, such
that $N'/N = q$. A random number $k$ is chosen from 1 to $(N' - 1)$ and the unit with the serial
number equal to $(\text{quotient} - 1)$, where the quotient is obtained on dividing $k$ by $q$, is
selected. The second and subsequent units are selected in a similar manner.
For example, suppose that a random sample of size 2 is to be selected from a population of size
16 units. 16 is a 2-digit number and the highest 2-digit multiple of 16 is $16 \times 6 = 96$, such that
$96/16 = 6$. A random number 65 is chosen from 01 to 95. Divide 65 by 6. The quotient is 10. The
unit labeled 9 in the population is selected. The second unit is selected in a similar manner.
❖ Stratified sampling:
Stratified sampling is a sampling plan in which the population is divided into several
non-overlapping sub-populations or groups (strata) in such a way that the units within each
stratum are homogeneous while the strata differ from one another (are heterogeneous), and a
random sample is then selected independently from each stratum.
Strata are generally formed on the basis of some known characteristic of the population that
is believed to be related to the variable of interest. This characteristic is known as the
auxiliary variable, stratification variable or stratification factor.
For example, in studying the living and working conditions of the people, the type of area
(that is, city corporation, municipal, urban, semi-urban, rural, etc.) may serve as the
stratification variable, since it is believed to be related to the living and working conditions
of the people.
Consider a population consisting of $N$ units. For stratified sampling, the population of $N$ units is
first divided into $k$ distinct classes or groups of sizes $N_1, N_2, \ldots, N_k$ such that
$\sum_{i=1}^{k} N_i = N$. These sub-populations are our strata. When the strata have been
identified, samples of sizes $n_1, n_2, \ldots, n_k$ are drawn from the strata of sizes
$N_1, N_2, \ldots, N_k$, respectively, such that $\sum_{i=1}^{k} n_i = n$.
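A minimal sketch of the selection step, assuming two hypothetical strata and stratum sample sizes chosen by the investigator: a simple random sample is drawn independently within each stratum, and the stratum sample sizes add up to the overall sample size. The function name `stratified_sample` and the data are illustrative only.

```python
import random

def stratified_sample(strata, sizes, rng=None):
    """Draw an independent simple random sample from each stratum.

    strata : dict mapping stratum name -> list of units in that stratum
    sizes  : dict mapping stratum name -> sample size n_i for that stratum
    """
    rng = rng or random.Random()
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical strata: N_1 = 40 urban units and N_2 = 80 rural units (N = 120).
strata = {
    "urban": [f"U{i}" for i in range(1, 41)],
    "rural": [f"R{i}" for i in range(1, 81)],
}
sizes = {"urban": 5, "rural": 10}  # n_1 + n_2 = n = 15
sample = stratified_sample(strata, sizes, rng=random.Random(1))
print({name: len(units) for name, units in sample.items()})  # {'urban': 5, 'rural': 10}
```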
❖ Systematic sampling:
Systematic sampling consists of selecting only the first unit at random, the rest being
automatically selected according to some predetermined pattern involving regular spacing of
units. Suppose that a sample of n units is to be selected from a population of N units.
Let these units be numbered from 1 to $N$ in some order. Let $N = nk$, where $k$ is an integer
called the sampling interval. To select a sample of $n$ units, choose a unit at random from the
first $k$ units and every $k$-th unit thereafter. Thus, if the randomly selected unit happens to be
numbered $r$ and the predetermined sampling interval is $k$, the sample will consist of the units
bearing the numbers:
$r,\ r + k,\ r + 2k,\ \ldots,\ r + (n - 1)k$
For example, suppose that a population consists of 15 elements, numbered serially from 01 to 15
and that a random sample of 3 units is desired. To achieve this, select at random one of the first 5
units, 01 to 05 and then every 5th unit in the sequence. If the first unit is 03, then the sample will
consist of units 03, 08 and 13. If the first unit is 01, then the sample will consist of units 01, 06
and 11. This procedure is termed linear systematic sampling.
If $N \neq nk$, then a systematic sample will contain either $n$ or $n - 1$ units, depending on the
serial number of the first selected unit. Such a procedure is called non-linear systematic sampling.
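The selection rule can be sketched as below. The first two calls reproduce the 15-unit example; the third uses a hypothetical population of 14 units with the same interval to show how the sample size can fall by one when the population size is not a multiple of the interval. The function name `systematic_sample` is an assumption.

```python
import random

def systematic_sample(N, k, r=None, rng=None):
    """Return the units r, r + k, r + 2k, ... that do not exceed N.

    If no start r is supplied, one is chosen at random from the first k units.
    """
    rng = rng or random.Random()
    if r is None:
        r = rng.randint(1, k)  # random start, both ends inclusive
    return list(range(r, N + 1, k))

print(systematic_sample(15, 5, r=3))  # [3, 8, 13]
print(systematic_sample(15, 5, r=1))  # [1, 6, 11]
print(systematic_sample(14, 5, r=5))  # [5, 10]
```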
❖ Cluster sampling:
In random sampling, the population can be divided into a finite number of distinct and
identifiable units defined as sampling units. The smallest units into which the population can be
divided are called the elementary units or elements of the population. The groups of such
elementary units or elements, which are internally heterogeneous and externally homogeneous
with respect to the study variables, are known as clusters.
If we treat these clusters as sampling units and select only a sample of them and if all the
elementary units or elements in the selected clusters are included in the sample, then the method
is known as single-stage cluster sampling or simply, cluster sampling.
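Single-stage cluster sampling as described above can be sketched in a few lines: whole clusters are the sampling units, and every element of a selected cluster enters the sample. The villages and households below are hypothetical, as is the function name `cluster_sample`.

```python
import random

def cluster_sample(clusters, m, rng=None):
    """Select m clusters at random and include every element of each."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), m)               # the sampling units are clusters
    elements = [e for c in chosen for e in clusters[c]]    # take all elements inside them
    return chosen, elements

# Hypothetical villages (clusters) and their households (elements).
villages = {
    "V1": ["h1", "h2", "h3"],
    "V2": ["h4", "h5"],
    "V3": ["h6", "h7", "h8", "h9"],
    "V4": ["h10"],
}
chosen, households = cluster_sample(villages, m=2, rng=random.Random(7))
print(chosen, households)
```

Note that the number of elements in the final sample is not fixed in advance; it depends on which clusters happen to be selected.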
1) Objectives of the study: Whenever we plan a sample survey, a clear and concise
statement of the objectives should be laid down. The objectives must be kept simple
enough to be understood by those working on the survey and to be met successfully
when the survey is completed.
2) Target population: The population from which the sample is to be drawn should be defined
and identified in clear and unambiguous terms. The target population may be modified to a
survey population to take account of practical constraints.
3) Data: The data to be collected must be relevant and pertinent to the purpose of the
survey. Keeping the objectives in view, a detailed list of variables should be prepared and
defined, and how these variables will be measured should be indicated in advance.
4) Precision desired: In a sample survey, only a part of the population is measured, so
the survey results are almost always subject to error. Measurement error is an
additional source of distortion in the survey results. These errors can be reduced to some
extent by using a larger sample and improved measuring instruments, but this involves
additional cost, time and effort. Consequently, the degree of precision desired in the
result must be decided in advance.
6) Duration of the study: Once the date of execution of the study is decided, it remains to
set up a work schedule for the completion of the various stages of the study.
8) Survey design: Survey design is the process of preparing a complete plan of operations
to be followed in conducting a survey and disseminating its intended results. Specifically, it
includes, among others, decisions on such factors as the variables to be included in the
survey (called survey variables), the method of data collection (whether by self-administered
questionnaire, interview schedule, telephone conversation or direct
interview), construction of the questionnaire, organization of fieldwork, data processing and
data analysis.
It seems obvious that the survey objectives covered under the survey design
determine the sample design, and in practice the sample design must be developed as an
integral part of the overall survey design. Survey design and sample design are thus two
interrelated concepts, each complementary to the other.
9) Sample size determination: Determination of sample size is perhaps the most difficult
part of a statistical investigation. It is often claimed that a sample should bear some
proportional relationship to the size of the population from which it is to be drawn. This is
not true. The size of a sample is a function of the variation in the population with respect
to the characteristics under study and the precision of the estimate needed by the researcher.
A sample of 500 may be appropriate sometimes, while a sample of more than
2000 is required in other circumstances. In another case, perhaps a sample of only 50 is
called for.
11) Selection and training of field workers: The validity of the survey results largely
depends on the personnel involved and their efficiency. It is therefore important to select
and train the field workers carefully. Training is especially important if the interview method
is followed because the interviewers’ personal styles and presentations largely affect the
rate of response and the accuracy of responses.
12) Pre-testing: Pre-testing is a trial operation that allows us to test the questionnaire or
other measurement instruments in the field, to screen interviewers and to check on the
management of field operations. The results of the pre-test usually suggest that some
modifications must be made before full-scale sampling is undertaken. It provides the
means of uncovering deficiencies and the basis for corrective action prior to carrying out
the actual survey. It may also suggest the amount of workload to be assigned to each
investigator and give an insight into the data processing operation in advance.
14) Data management: Large surveys generate huge amounts of data. Hence, a well
prepared data management plan is of prime importance. This plan should include the
steps for processing data from the very inception of the study until the final analysis is
completed. The administrative and computer procedures to be used, the type of staff
available and whether any training will be needed to facilitate data management should
also be described. A quality control scheme should also be included in the plan in order
to check for agreement between processed data and data gathered in the field.
15) Editing and checking: A detailed plan must be outlined at the outset to check and edit
the field data for erroneous and inconsistent entries soon after they are at hand. Both
manual and computer checking may be employed to detect inconsistencies in the data. Any
erroneous entry that cannot be corrected at this stage should be corrected by
re-interviewing the respondents.
16) Data processing and analysis: Once the data are checked, edited and corrected for
errors, processing of data should be attempted keeping in view the objectives of the
study. This task also needs careful planning.
The next step is the statistical analysis, which is carried out to arrive at the desired
estimates of the population parameters. Statistical methods, which will be used for the
analysis of the data, should be outlined, including a description of how the information
collected will be used to test the stated hypothesis and how any missing data will be
dealt with.
18) Report writing: Finally, the findings of the study should be written up in a report that
highlights the policy implications and suggests possible actions and measures to be taken,
including policy recommendations.
19) Lessons learned: A survey is a complex undertaking and is liable to a large margin of error
if not properly handled. Because of this complexity, things never go exactly as planned.
The main obstacles and difficulties, which interfere with the successful completion of the
study within the time and cost proposed, should therefore be described.
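For the sample size determination step in the list above, the dependence on variability and precision (and not on population size) can be illustrated with one common textbook approximation for estimating a mean, $n \approx (zS/d)^2$, where $S$ is the population standard deviation, $d$ the acceptable margin of error and $z$ the normal deviate for the chosen confidence level. This formula is an assumption brought in for illustration; it is not given in these notes, it ignores the finite population correction, and the input values below are hypothetical.

```python
import math

def sample_size_for_mean(sd, margin, z=1.96):
    """Approximate sample size needed to estimate a mean within +/- margin.

    Uses n = (z * sd / margin) ** 2, rounded up; sd is a prior guess at the
    population standard deviation.
    """
    return math.ceil((z * sd / margin) ** 2)

print(sample_size_for_mean(sd=10, margin=2))   # 97
print(sample_size_for_mean(sd=10, margin=1))   # 385
print(sample_size_for_mean(sd=3.5, margin=1))  # 48
```

Halving the margin of error roughly quadruples the required sample, while a less variable population needs far fewer units, whatever its size.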
2) Less time: Being small in scale, a sample survey is not only less expensive than a census
but also needs less time for obtaining the desired information.
3) Greater scope: The smaller scale is likely to permit the collection of a wider range of
survey data and to allow a wider choice of methods of observation, measurement or
questioning than is usually feasible with a complete enumeration.
4) Respondent’s convenience: A sample survey considerably reduces the overall burden
on respondents, in that only a few, not all, of the individuals in the population
are put to the trouble of having to answer questions or provide information.
5) Less labor: Sampling saves labor. A smaller staff is required both for fieldwork and for
tabulating and processing the data.
6) Flexibility: In certain types of investigation, highly skilled and trained personnel or even
specialized equipment are needed to collect data. A complete enumeration in such cases is
impracticable and hence sample surveys, being more flexible, will be more appropriate for
these types of inquiry.
7) Data processing: The data processing requirements for a sample survey are likely to be
much less than for a complete count. Whereas a complete count may well require a
computer to process the data, a sample survey can often be processed manually with
fewer people and less logistical support.
8) Greater accuracy: A sample survey can employ personnel of higher quality who have received
intensive training, and more careful supervision of the fieldwork is possible. As a result,
observations, measurements or questioning for a sample survey can often be carried out
more carefully and thus yield results subject to smaller non-sampling error than in a
more extensive complete enumeration.
9) Feasibility: There are situations where complete enumeration is not feasible and thus a
sample survey is a must. There are also instances where it is not practicable to enumerate all the
units due to their perishable or fragile nature. The alternative in this situation is to take
only a few of the units.
For example, consider the problem of checking the quality of mango juice
produced by a company. One way to test the quality is to drink the entire lot, which is
impracticable. The testing of electric bulbs, screws, glasses and medicines are all examples of
this type, where sampling is a must.
❖ Limitations of sampling:
Despite the several advantages of a sample survey over a complete count, it has some disadvantages or
limitations too. They are as follows:
1) The results of a sample survey are subject to sampling error and on that account are less
precise than those of a complete enumeration.
2) A sample may seriously over-represent, under-represent or even fail to represent the
population. In such instances, the estimates provided by the survey are liable to a larger
margin of error.
3) Sampling requires the services of trained and qualified personnel and sophisticated
equipment for its planning, execution and analysis. In the absence of these, the results of
the sample survey are not trustworthy.
4) If information is required about each and every unit of the universe, there is
no way but to resort to complete enumeration. Moreover, if time and money are not
important factors or if the universe is not too large, a complete enumeration may be better
than any sampling method.
1) Accuracy: The accuracy of a sample estimate refers to its closeness to the true
population value. The closer the sample estimate to the population value, the greater is its
accuracy. The accuracy of an estimate is generally assessed on the basis of its mean
square error. The smaller the mean square error of an estimator, the greater is its
accuracy.
So, a good sample design must allow us to obtain valid estimates of its
sampling variability, which is ordinarily expressed in terms of mean square error. This is
possible only when the sample is a probability sample.
2) Reliability: If we assume that there is no measurement error in the survey, then the
reliability or precision of an estimator can be stated in terms of its sampling variance or
equivalently, of its standard error.
The standard error measures the precision with which the estimate from a
particular sample approximates the hypothetical average result from all possible samples.
The smaller the standard error of an estimate, the greater is its reliability.
So, a good sample design must allow us to obtain valid estimates of its
sampling variability, which is ordinarily expressed in terms of standard error.
3) Validity: If we assume that there is no measurement error in the survey, then the validity
of an estimator can be evaluated by examining the bias of the estimator. The smaller the
bias, the greater is the validity.
Bias refers to how far the average value of the estimator lies from the parameter. Thus, if $t$ is
an estimator and $\theta$ is its corresponding population parameter, then the bias of the
estimator is expressed as: $B(t) = E(t) - \theta$.
The use of faulty data collection tools could lead to biased results. Bias can also
be introduced as a consequence of improper sampling design. This may result in the
sample not being representative of the study population.
So, a good sample design must be oriented to the research objectives in terms of
its selection and estimation of the population values. Furthermore, it must comply
with the survey design and suit the survey environment.
4) Efficiency: The criterion of efficiency is related to the cost of sampling. A sampling
design is considered more efficient than another if the former results in lower cost
than the latter, with the same degree of reliability. So, economy is another aspect of
a sample design. A good sample design must therefore involve the lowest cost for
the fulfillment of the survey objectives.
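The first three criteria above are linked by a standard identity, stated here as a supplement rather than as part of the original notes. Writing $t$ for the estimator, $\theta$ for the population parameter (a symbol assumed here), $V(t)$ for the sampling variance and $B(t)$ for the bias, the mean square error splits into a reliability part and a validity part:

```latex
\mathrm{MSE}(t) = E\big[(t - \theta)^2\big] = V(t) + \big[B(t)\big]^2,
\qquad B(t) = E(t) - \theta .
```

For an unbiased estimator $B(t) = 0$, so the mean square error equals the sampling variance and accuracy coincides with reliability.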