UNIT 12 SAMPLING PROCEDURE
Structure
12.0 Objectives
12.1 Introduction
12.2 Sampling Process
12.3 Types of Sampling
12.3.1 Probability Sampling
12.3.2 Non-Probability Sampling
12.3.3 Mixed Sampling
12.4 Selection of a Simple Random Sample
12.4.1 Lottery Method
12.4.2 Random Numbers Table Method
12.4.3 Steps in the Use of RNT
12.4.4 Advantages of SRS
12.4.5 Limitations of SRS
12.5 Selection of Systematic Random Sample
12.5.1 Advantages of Systematic Random Sampling
12.5.2 Disadvantages of Systematic Random Sampling
12.6 Selection of Stratified Random Sample
12.6.1 Proportional Stratified Sample
12.6.2 Disproportional Stratified Sampling
12.6.3 Advantages of Stratified Sampling
12.6.4 Disadvantages of Stratified Sampling
12.7 Selection of a Cluster Sample
12.7.1 Steps in Cluster Sampling
12.7.2 Advantages of Cluster Sampling
12.7.3 Disadvantages of Cluster Sampling
*
Prof. C G Naidu (retd.), School of Vocational Studies, IGNOU
12.0 OBJECTIVES
On completion of this Unit, you should be able to
explain different methods of drawing a sample;
describe different types of samples;
use random number tables to draw a sample; and
determine the sample size.
12.1 INTRODUCTION
In Unit 2 of this course, you have learned about different types of data, namely,
primary data and secondary data. In that Unit, we also discussed the use of
different survey techniques like face-to-face interview, telephone survey, postal
survey, internet survey, etc. in collecting primary data. In Unit 15, you also have
learned the meaning of sampling, advantages of sampling, and sampling error.
In statistics, we often rely on a sample (that is, a small subset of a larger set of
data) to draw inferences about the population (that is, the larger set of data). For
example, suppose you are interested in the voting behaviour of the people of Delhi
in the next election. Whom will you ask? Naturally, it is not possible for you to ask
every single Delhi voter how he or she is likely to vote. Instead, you may query a
relatively small number of Delhi voters and draw inferences about the whole of Delhi
from their responses. In this case, all the voters of Delhi constitute the population
and the voters actually queried constitute your sample.
Ideally, the characteristics of a sample should reflect the characteristics of the
population from which it is drawn. In such cases, the inferences drawn from a
sample are probably applicable to the entire population.
In this unit you will learn how to draw a sample under different population
characteristics and how to determine the sample size.
Sampling and Statistical Inference
of the survey, because all other steps (target population, sampling frame,
sampling procedure, etc.) are designed according to survey objectives.
2) Questionnaire Design: Keeping the objectives of the survey in view, we are
required to design a questionnaire. We have already learnt the major steps
involved in designing a questionnaire in Unit 1 of this course. In addition
to the questionnaire, we need to develop training documents for the
investigators, particularly when the sample survey is conducted on a large
scale involving a number of investigators.
4) Identifying Sampling Frame: The sampling frame is a list of cases from the
target population. The sampling frame is the actual operational definition of
the target population. In our earlier example of eligible couples in Delhi
using family planning methods, all the people in the reproductive age group
form the sampling frame. Many times we may not be able to list all the
cases in the target population for some reason or other. For example, we
want to list the people for a survey based on telephone directory. In this
case, certainly those people who do not have telephone numbers in the
directory will be excluded from the listing. This type of error is called
sampling frame error.
5) Selecting Sampling Procedure: Once the sampling frame is identified, we
select an appropriate sampling procedure to select the sample for the survey.
We will discuss various sampling procedures in detail in the next section of
this Unit.
6) Selecting the Sampling Units: Sampling units are those cases from the
sampling frame which are included in the sample by using an appropriate
sampling procedure. Essentially, a sampling unit is the case on which data
are collected. For example, you may decide to take 1000 sampling units from
the sampling frame (consisting of all the reproductive age group people in
Delhi) for your sample survey.
7) Survey Data Processing: After selection of sampling units, the next step is
data collection and processing. We need to check the incomplete
questionnaires and edit or cross-check the responses wherever there is a
doubt. Data entry and tabulation follow.
8) Analysis of Data: The next step in the sequence is analysis of data. Keeping
in view our requirements, we analyse the data by using various statistical
tools.
selected in the sample. In fact, if N is the size of the population, this
probability is 1/N. On the other hand, in the case of simple random sampling without
replacement, the unit once selected is not returned to the population in the sense
that it becomes ineligible for selection again. As a result, after each draw, the
composition of the population changes. Therefore, the probability of any
particular unit being selected also changes.
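The difference between the two schemes can be illustrated with a short Python sketch. The population of ten numbered units and the sample size of four are assumptions made only for illustration. The standard library function random.choices draws with replacement and random.sample draws without replacement.

```python
import random
from fractions import Fraction

# Hypothetical population of N = 10 numbered units (illustration only).
population = list(range(1, 11))
N = len(population)
n = 4  # assumed sample size

# With replacement: a unit may be drawn more than once, and the chance
# of drawing any particular unit stays 1/N at every draw.
with_replacement = random.choices(population, k=n)

# Without replacement: a drawn unit is not eligible again, so every
# unit in the sample is distinct.
without_replacement = random.sample(population, k=n)

# Without replacement, the chance of picking any one of the units still
# remaining changes from draw to draw: 1/N, 1/(N-1), 1/(N-2), ...
draw_probabilities = [Fraction(1, N - i) for i in range(n)]

print(with_replacement)
print(without_replacement)
print(draw_probabilities)
```

The first list may contain repeated units, while the second never does.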
b) Systematic Random Sampling
In this variant of random sampling, only the first unit of the sample is selected at
random from the population. The subsequent units are then selected by following
some definite rule. For example, suppose, we have to choose a sample of
agricultural plots. In systematic random sampling, we begin with selecting one
plot at random and then every j-th plot thereafter is selected.
c) Stratified Random Sampling
Stratified random sampling is the appropriate method if the population under
consideration consists of heterogeneous units. Here, first we divide the
population into certain homogeneous groups or strata. Secondly, from each
stratum, some units are selected by simple random sampling. Thirdly, after
selecting the units from each stratum, they are mixed together to obtain the final
sample.
Let us consider an example. Suppose, we want to estimate the per capita income
of Delhi by a sample survey. It is common knowledge that Delhi is characterised
by rich localities, middle class localities and poor localities in terms of the
income groups of the people living in these localities. Now, each of these
different localities can constitute a stratum from which some people may be
selected by adopting simple random sampling procedure.
d) Multi-Stage Random Sampling
Let us consider a situation where we want to obtain information from a sample of
households in a large city, say, Delhi. Sometimes, it may not be possible to
directly take a sample of households because a list of all the households may not
be easily obtained. In such a situation, one may resort to taking samples in various
stages. Generally, the city is divided into certain geographical areas for
administrative purposes. These areas may be termed as city blocks. So in the first
stage, some of such blocks may be selected by random sampling. In the next
stage, from each of the selected blocks in the first stage, some households may be
selected again by the principle of random sampling. In this way, ultimately a
sample of households from a large city may be obtained. The above-mentioned
example is the case of a two-stage random sampling. However, if the nature of
the inquiry so demands, the method of sampling can be extended to more than
two stages.
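The two-stage procedure described above can be sketched in Python as follows. This is only an illustrative sketch: the function name two_stage_sample, the city of 6 blocks with 20 households each, and the stage sizes are all hypothetical and are not taken from any actual survey.

```python
import random

def two_stage_sample(blocks, n_blocks, n_households, seed=None):
    """Stage 1: select blocks at random.
    Stage 2: select households at random within each selected block."""
    rng = random.Random(seed)
    chosen_blocks = rng.sample(sorted(blocks), n_blocks)
    sample = []
    for block in chosen_blocks:
        households = blocks[block]
        # Never ask for more households than the block contains.
        k = min(n_households, len(households))
        sample.extend(rng.sample(households, k))
    return chosen_blocks, sample

# Hypothetical city: 6 blocks, each with 20 listed households.
city = {f"block_{b}": [f"b{b}_h{h}" for h in range(1, 21)]
        for b in range(1, 7)}

chosen, households = two_stage_sample(city, n_blocks=3, n_households=5, seed=1)
print(chosen)
print(len(households))  # 3 blocks x 5 households = 15
```

Note that a list of households is needed only for the selected blocks, which is the practical attraction of sampling in stages.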
12.3.2 Non-Probability Sampling
We have considered the method of random sampling and some of its variants
above. It should be clear that the basic objective of the principle of random
sampling is to eliminate or at least minimise the effect of the subjective bias of
the investigator in the selection of the population sample. But for certain
purposes, there is a need for using discretion. For example, suppose a teacher has
to choose 4 participants from a class of 30 students in a debate competition. Here,
the teacher may select the top 4 debaters on the basis of her own conscious
judgement about the top debaters in the class. This is an example of purposive
sampling. In this method, the purpose of the sample guides the choice of certain
members or units of the population.
12.3.3 Mixed Sampling
In mixed sampling, we have some features of both non-probability sampling and
random sampling. Suppose, an institute has to send 5 students for managerial
training in a company during the summer vacation. Initially, it may shortlist
about 20 students who are considered to be suitable for the training by applying
its own discretion. Then from these 20 students, 5 students may finally be
selected by random sampling.
We will discuss the process of drawing various types of samples later in this
Unit.
The probability that each digit (0,1,2,3,4,5,6,7,8 or 9) will appear at any place is
the same, that is 1/10.
The occurrence of any two digits in any two places is independent of each other.
39634 62349 74088 65564 16379 19713 39153 69459 17986 24537
14595 35050 40469 27478 44526 67331 93365 54526 22356 93208
30734 71571 83722 79712 25775 65178 07763 82928 31131 30196
64628 89126 91254 24090 25752 03091 39411 73146 06089 15630
42831 95113 43511 42082 15140 34733 68076 18292 69486 80468
80583 70361 41047 26792 78466 03395 17635 09697 82447 31405
00209 90404 99457 72570 42194 49043 24330 14939 09865 45906
05409 20830 01911 60767 55248 79253 12317 84120 77772 50103
95836 22530 91785 80210 34361 52228 33869 94332 83868 61672
65358 70469 87149 89509 72176 18103 55169 79954 72002 20582
72249 04037 36192 40221 14918 53437 60571 40995 55006 10694
41692 40581 93050 48734 34652 41577 04631 49184 39295 81776
61885 50796 96822 82002 07973 52925 75467 86013 98072 91942
48917 48129 48624 48248 91465 54898 61220 18721 67387 66575
88378 84299 12193 03785 49314 39761 99132 28775 45276 91816
77800 25734 09801 92087 02955 12872 89848 48579 06028 13827
24028 03405 01178 06316 81916 40170 53665 87202 88638 47121
86558 84750 43994 01760 96205 27937 45416 71964 52261 30781
78545 49201 05329 14182 10971 90472 44682 39304 19819 55799
14969 64623 82780 35686 30941 14622 04126 25498 95452 63937
58697 31973 06303 94202 62287 56164 79157 98375 24558 99241
38449 46438 91579 01907 72146 05764 22400 94490 49833 09258
62134 87244 73348 80114 78490 64735 31010 66975 28652 36166
72749 13347 65030 26128 49067 27904 49953 74674 94617 13317
81638 36566 42709 33717 59943 12027 46547 61303 46699 76243
46574 79670 10342 89543 75030 23428 29541 32501 89422 87474
11873 57196 32209 67663 07990 12288 59245 83638 23642 61715
13862 72778 09949 23096 01791 19472 14634 31690 36602 62943
08312 27886 82321 28666 72998 22514 51054 22940 31842 54245
11071 44430 94664 91294 35163 05494 32882 23904 41340 61185
82509 11842 86963 50307 07510 32545 90717 46856 86079 13769
07426 67341 80314 58910 93948 85738 69444 09370 58194 28207
57696 25592 91221 95386 15857 84645 89659 80535 93233 82798
08074 89810 48521 90740 02687 83117 74920 25954 99629 78978
20128 53721 01518 40699 20849 04710 38989 91322 56057 58573
00190 27157 83208 79446 92987 61357 38752 55424 94518 45205
23798 55425 32454 34611 39605 39981 74691 40836 30812 38563
85306 57995 68222 39055 43890 36956 84861 63624 04961 55439
99719 36036 74274 53901 34643 06157 89500 57514 93977 42403
95970 81452 48873 00784 58347 40269 11880 43395 28249 38743
56651 91460 92462 98566 72062 18556 55052 47614 80044 60015
71499 80220 35750 67337 47556 55272 55249 79100 34014 17037
66660 78443 47545 70736 65419 77489 70831 73237 14970 23129
35483 84563 79956 88618 54619 24853 59783 47537 88822 47227
09262 25041 57862 19203 86103 02800 23198 70639 43757 52064
We need to follow the steps given below while selecting an SRS by using RNT.
5. Choose the direction in which you want to read the numbers (say,
from left to right or right to left or top to bottom or bottom to top).
6. Suppose you are looking for two-digit numbers (00 to 99). You cannot
get these numbers by direct reading from the table, since its entries are
five-digit numbers (see Table 12.1). You can use either the last two
digits or the first two digits of each entry. For example, suppose the
five-digit number you have chosen is 54245 (that is, the number in the 29th row
and 10th column of the random number table given at Table 12.1).
Then the two-digit number will be 45 if you choose the last two digits
of the number.
7. Look only at the numbers assigned to the population units. If a
number represents one of the units of the population, it becomes part of
the sample. Suppose you want to select 10 sample units. Reading on from
54245, the other numbers you will choose are 71 (11071), 30 (44430),
64 (94664), 94 (91294), 63 (35163), 82 (32882), 04 (23904), 40 (41340) and
85 (61185). Observe that you have omitted 94 (05494), since you have
already chosen this number.
8. Once a number is chosen, do not select it again.
9. If you reach the end point of the table before obtaining the required
sample, pick another starting point in the random number table and
select the remaining units for the sample.
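The reading rule in steps 6 to 8 can be sketched in Python. The function name select_from_rnt and its parameters are hypothetical. The entries are copied from rows 29 and 30 of Table 12.1, and the population is assumed to carry the labels 00 to 99.

```python
def select_from_rnt(entries, n_units, digits=2, population_size=100):
    """Read the last `digits` digits of each table entry in order,
    skipping labels outside the population and labels already chosen."""
    selected = []
    for entry in entries:
        label = int(entry[-digits:])
        if label >= population_size:  # no population unit carries this label
            continue
        if label in selected:         # step 8: never select a number twice
            continue
        selected.append(label)
        if len(selected) == n_units:
            break
    return selected

# Entries read from left to right, starting at row 29, column 10 of
# Table 12.1 (54245) and continuing through row 30.
entries = ["54245", "11071", "44430", "94664", "91294", "35163",
           "05494", "32882", "23904", "41340", "61185"]

print(select_from_rnt(entries, n_units=10))
# [45, 71, 30, 64, 94, 63, 82, 4, 40, 85]
```

The entry 05494 is skipped because its label, 94, was already selected from 91294.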
Example 12.2
Suppose you have to select 100 account holders as a sample out of a total of 1000
account holders in the population using the random numbers table.
Here, first you assign each account holder a number from 000 to 999. To draw a
sample of 100 account holders, you need to find 100 three-digit numbers in the
range 000 to 999. Pick any row and column in the random numbers table given
in Table 12.1 as the starting point. Suppose you have selected the fourth row and
first column; the first three-digit number is then 628 (from 64628) if you
choose the last 3 digits as the number for your purpose. Here, you read the last 3
digits of each number. If the number is within the range (000 to 999), include it
in the sample. Otherwise, skip the number and read the next number in the
chosen direction. If a number is already selected, omit it. In this example, since
you have started with the fourth row and first column and are moving from left to
right, the following 100 numbers are selected for the sample.
628 126 254 090 752 091 411 146 089 630
831 113 511 082 140 733 076 292 486 468
583 361 047 792 466 395 635 697 447 405
209 404 457 570 194 043 330 939 865 906
409 830 911 767 248 253 317 120 772 103
836 530 785 210 228 869 332 868 672 358
469 149 509 176 169 954 002 582 249 037
192 221 918 437 571 995 006 694 692 581
050 734 652 577 631 184 295 776 885 796
822 973 925 467 013 072 942 917 129 624
If the number of units in the population is very large, neither of the above two
methods is feasible. These days, we can select a random sample much more easily
by using a computer. There are many computer programs that can
generate a series of random numbers if we have the units of the population listed
in a computer.
We will explain one way of selecting a sample using computer-generated random
numbers. In our example, let us assume we can copy and paste the list of account
holders into a column in an EXCEL spreadsheet. Then, in the column to its right,
we paste the function =RAND(), which is EXCEL's way of putting a random
number between 0 and 1 in the cells. Next, we sort both columns by the random
numbers, which arranges the list in random order. Then, all we have to do is take
the first 100 names in the sorted list. The entire process takes about a minute if
we are familiar with EXCEL.
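The same idea can be sketched outside a spreadsheet. In the Python sketch below, random.random() plays the role of =RAND(). The function name sample_by_random_keys and the list of 1000 account holder labels are hypothetical and serve only as an illustration.

```python
import random

def sample_by_random_keys(names, n, seed=None):
    """Attach a random number in [0, 1) to every name, sort by that
    number, and keep the first n names of the sorted list."""
    rng = random.Random(seed)
    keyed = [(rng.random(), name) for name in names]  # like =RAND() beside each name
    keyed.sort()                                      # sort by the random number
    return [name for _, name in keyed[:n]]

# Hypothetical list of 1000 account holders.
account_holders = [f"holder_{i:03d}" for i in range(1000)]

sample = sample_by_random_keys(account_holders, 100, seed=42)
print(len(sample))  # 100
print(sample[:5])
```

Because every name receives its own independent random key, each account holder has the same chance of landing among the first 100 names.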
To use systematic random sampling, the first thing we need to do is list the
population units in a random order and number them from 1 to 500. The
sampling interval is K = 500/60 = 8.3, or say 8 (the population size divided by
the sample size). Now we select the first sample unit
at random from the first 8 population units. Suppose the first unit selected is 5.
The subsequent sample units selected are 13, 21, 29, 37, ..., 477. Therefore,
the following population units are selected in the sample.
5 13 21 29 37 45 53 61 69 77
85 93 101 109 117 125 133 141 149 157
165 173 181 189 197 205 213 221 229 237
245 253 261 269 277 285 293 301 309 317
325 333 341 349 357 365 373 381 389 397
405 413 421 429 437 445 453 461 469 477
Thus, in the systematic random sampling procedure, the first sample unit is
selected at random and this sample unit in turn determines the subsequent sample
units to be selected. However, it is essential that the units in the population are
randomly ordered. In certain cases we prefer the systematic random sampling
procedure to the simple random sampling procedure because it is easier to select
sample units. For example, if you want to find out the yield of coconut trees in a
field, you select one tree at random; the other trees are then selected automatically
at a gap equal to the sampling interval.
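The selection rule can be written as a short Python sketch. The function name systematic_sample is hypothetical, and rounding the sampling interval down to a whole number follows the choice of 8 for 500/60 made in the example above.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the unit numbers chosen by systematic random sampling."""
    k = population_size // sample_size  # sampling interval, rounded down
    if start is None:
        # Random start among the first k population units.
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

# 60 units out of 500, with the random start fixed at 5 as in the text.
units = systematic_sample(500, 60, start=5)
print(units[:5], units[-1])  # [5, 13, 21, 29, 37] 477
```

Only the starting unit is random; once it is fixed, the rest of the sample follows from the interval.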
12.6 SELECTION OF STRATIFIED RANDOM SAMPLE
In some cases the population may not be homogeneous, that is, all the units may
not be alike with respect to the characteristic we intend to survey. The
characteristics of the population under study may be male/female, rural/urban,
literate/illiterate, high income/low income groups, etc. In situations where these
units vary widely, the simple random sampling procedure or the systematic
random sampling procedure will not provide us with a representative sample. In
such situations, we can obtain a representative sample by using stratified random
sampling.
In stratified sampling, we divide the population into different strata in such a way
that units are homogeneous within each stratum. Moreover, each stratum is
different from the others. Suppose we want to stratify the population on the basis
of gender; then we list the population units separately as males and
females. Subsequently, we decide the sample size to be drawn from each stratum.
There are two approaches to decide the sample size from each stratum. These are:
(a) proportional stratified sample, and (b) disproportional stratified sample. We
will discuss these two procedures below.
12.6.1 Proportional Stratified Sample
When we take a sample from a population with several strata, we are required to
take samples from each stratum. Such samples can be drawn in proportion to the
stratum population size relative to the total population size. Suppose we divide the
population (N) into K non-overlapping strata of sizes N1, N2, N3, ..., NK such that
N1 + N2 + N3 + ... + NK = N. We decide to draw a sample of size n, of which ni
units come from the i-th stratum. Then the sample proportions of different strata
are given by:
\[ \frac{n_1}{N_1} = \frac{n_2}{N_2} = \frac{n_3}{N_3} = \cdots = \frac{n_K}{N_K} \]
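Since every stratum is sampled at the same rate, and the stratum samples add up to n, each stratum sample size works out to ni = n × Ni / N. The Python sketch below applies this rule. The function name proportional_allocation and the four stratum sizes are hypothetical, chosen only so that they add up to a population of 1000.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of size n to strata in proportion to
    their population sizes: n_i = n * N_i / N."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

# Hypothetical stratum sizes adding up to N = 1000.
strata = {"stratum_1": 100, "stratum_2": 200, "stratum_3": 300, "stratum_4": 400}

allocation = proportional_allocation(strata, n=200)
print(allocation)
# {'stratum_1': 20, 'stratum_2': 40, 'stratum_3': 60, 'stratum_4': 80}
```

When n × Ni / N is not a whole number, the rounded stratum samples may not add up exactly to n, and one or two of them then need a small manual adjustment.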
Example 12.4
Suppose we want to draw a sample of 200 units from a population consisting of
1000 units. The population is heterogeneous in nature in terms of high income or
low income and rural or urban. The strata population sizes are given as follows:
population. In our example, to have a sample of 200 units, the proportion of
Follow the steps given below for using disproportional stratified sampling.
1) Divide the population into strata based on the chosen characteristic
(for example, Rural/Urban, Male/Female, etc.).
2) The number of units taken from each stratum is directly proportional to the
relative size of the stratum and the standard deviation σi of the characteristic
under consideration. Suppose σ1, σ2, σ3, ..., σk are the standard
deviations of the k strata, P1, P2, P3, ..., Pk are the stratum proportions of
the total population, and n (= n1 + n2 + ... + nk) is the sample size
required. Then the stratum sample size using the disproportional stratified
sampling procedure is
\[ n_i = \frac{P_i \sigma_i \, n}{\sum_{j=1}^{k} P_j \sigma_j} \]
3) Choose the sample from each stratum using either simple random
sampling or systematic random sampling.
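The allocation rule in step 2 can be sketched in Python as follows. The function name disproportional_allocation is hypothetical, and the stratum proportions and standard deviations are made-up values used only to show the calculation.

```python
def disproportional_allocation(proportions, std_devs, n):
    """Stratum sample sizes n_i = n * P_i * sigma_i / sum_j(P_j * sigma_j)."""
    weights = [p * s for p, s in zip(proportions, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical values for four strata.
P = [0.1, 0.2, 0.3, 0.4]          # stratum proportions, adding up to 1
sigma = [40.0, 20.0, 10.0, 5.0]   # stratum standard deviations

allocation = disproportional_allocation(P, sigma, n=200)
print([round(x, 1) for x in allocation])
```

In this made-up case the smallest stratum has the largest standard deviation, so it receives a larger sample than the largest stratum. In practice the results are rounded to whole numbers.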
Let us go back to Example 12.4, where we have divided the population into 4
strata. We observe that there is a small number of households in the high income
strata and a large number of households in the low income strata. Assume that
the variance of income among higher income groups is higher than the
variance among the lower income groups. Therefore, in order to avoid under-
representation of higher income groups in the sample, a disproportional
sample is taken from each stratum. That means, if the variability within a stratum
is higher, we must take a larger sample from that stratum to increase the
precision of the estimates. Similarly, if the variability within a stratum is lower,
we can take a smaller sample from that stratum. That is, the higher the stratum
variance, the larger the stratum sample size, and the lower the stratum variance,
the smaller the sample size. This is in addition to the fact that a larger stratum
requires a larger sample size.
Example 12.5
Consider Example 12.4 again. Suppose the stratum variances are given as
follows:
In the first stage you select the clusters and in the second stage you select the
sample units within each sampled cluster. This sampling procedure is called
two-stage sampling. Here, the clusters are called primary units and the units within
the sampled clusters are called secondary units.
Example 12.6
Suppose we are interested in finding the opinions of ATM customers of a bank in
the state of Uttar Pradesh. We can divide the state into, say, 30 clusters (for
example, we can consider the district as a unit and include one or two districts in
one cluster). Here, we assume that each of these clusters will represent the
opinions of the ATM customers of Uttar Pradesh as a whole. We then select a
sample of clusters and obtain the opinions of all the ATM customers in each of
the selected clusters.
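The procedure of this example can be sketched in Python. The function name cluster_sample, the 30 clusters of 50 customers each, and the choice of 5 sampled clusters are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and include every unit
    belonging to each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical frame: 30 clusters with 50 ATM customers each.
clusters = {f"cluster_{c:02d}": [f"c{c:02d}_cust{i}" for i in range(50)]
            for c in range(1, 31)}

chosen, customers = cluster_sample(clusters, n_clusters=5, seed=7)
print(chosen)
print(len(customers))  # 5 clusters x 50 customers = 250
```

Randomness enters only in the choice of clusters; within a selected cluster every customer is covered.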
a) The main advantage of cluster sampling is that it takes less travel time and
related data collection costs.
b) Since the researcher need not cover all the clusters and only a sample of clusters
is covered, it is a more practical method which facilitates fieldwork.
b) Cluster sampling has a lower sampling efficiency for a given sample
size than random sampling and stratified sampling. This method is
cost-effective but not statistically efficient.
The four methods we have covered so far, namely, (a) simple random sampling,
(b) systematic random sampling, (c) stratified sampling, and (d) cluster sampling
are the simplest probability (or random) sampling procedures. However, in real
life, we use sampling methods that are more complex than the above four
methods. The basic principle in multistage sampling is that we can combine these
simple methods in a variety of useful ways to address our sampling needs. We call
it multistage sampling when we combine two or more of the above sampling
methods.
Example 12.7
Consider the case of interviewing school students in Haryana in order to grade the
schools according to socio-economic background of the parents. For this problem,
in the first stage we need to apply cluster sampling. We divide the state of
Haryana into a number of clusters, say districts. Then we select a sample of
districts (clusters) using simple random sampling method. In the second stage we
divide the schools using stratified sampling. Here the strata may be government
schools, government-aided schools, central schools, and public schools. We select
a sample of schools in each stratum using either a simple random sampling or a
systematic random sampling. In the third stage we again use simple random
sampling and select a sample of classes in each sampled school for face-to-face
interviews with the students. In the fourth stage of sampling we consider selecting
a sample of students from each sampled class using simple random sampling or
systematic random sampling.
selection bias, and (b) it does not provide a representative sample of the
population and therefore we cannot generalise the results.
depending upon various considerations discussed above. In this section we will
discuss three of them.
If we wish to report the results as percentages (proportions) of the sample
responding, we use the following formula:
\[ n_i = \frac{P_i(1-P_i)}{\dfrac{A^2}{Z^2} + \dfrac{P_i(1-P_i)}{N_i}} \]
Example 12.8: A population consists of 80% rural and 20% urban people. Given
that the population size is 50000, determine the sample size required. Assume
that the desired precision and confidence levels are 1% and 99% respectively.
In this example,
P1 = proportion of rural people = 0.80
P2 = proportion of urban people = 0.20
N1 = rural population size = 50000 × 0.80 = 40000
N2 = urban population size = 50000 × 0.20 = 10000
A = 0.01
Z = 2.58 (at 99% confidence level)
The required sample size is
\[
\begin{aligned}
n_1 = \text{rural sample} &= \frac{P_1(1-P_1)}{\dfrac{A^2}{Z^2} + \dfrac{P_1(1-P_1)}{N_1}} \\
&= \frac{0.80(1-0.80)}{\dfrac{0.01^2}{2.58^2} + \dfrac{0.80(1-0.80)}{40000}} \\
&= \frac{0.80(0.20)}{\dfrac{0.0001}{6.6564} + \dfrac{0.80(0.20)}{40000}} \\
&= \frac{0.16}{0.000015023 + 0.000004} \\
&= \frac{0.16}{0.000019023} \approx 8411
\end{aligned}
\]
\[
\begin{aligned}
n_2 = \text{urban sample} &= \frac{P_2(1-P_2)}{\dfrac{A^2}{Z^2} + \dfrac{P_2(1-P_2)}{N_2}} \\
&= \frac{0.20(1-0.20)}{\dfrac{0.01^2}{2.58^2} + \dfrac{0.20(1-0.20)}{10000}} \\
&= \frac{0.20(0.80)}{\dfrac{0.0001}{6.6564} + \dfrac{0.20(0.80)}{10000}} \\
&= \frac{0.16}{0.000015023 + 0.000016} \\
&= \frac{0.16}{0.000031023} \approx 5157
\end{aligned}
\]
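\[
\begin{aligned}
&= \frac{2.5^2}{\dfrac{0.05^2}{1.96^2} + \dfrac{2.5^2}{10000}} \\
&= \frac{6.25}{\dfrac{0.0025}{3.8416} + \dfrac{6.25}{10000}} \\
&= \frac{6.25}{0.000651 + 0.000625} \\
&= \frac{6.25}{0.001276} \approx 4898
\end{aligned}
\]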
Example 12.10: Given that the population size is 10000, determine the sample
size required when desired precision and confidence levels are 5% and 99%
respectively.
In this example,
N = 10000
A = 0.05
Z = 2.58 (at 99% confidence level)
The required sample size is
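\[
\begin{aligned}
n &= \frac{0.25}{\dfrac{0.05^2}{2.58^2} + \dfrac{0.25}{10000}} \\
&= \frac{0.25}{\dfrac{0.0025}{6.6564} + \dfrac{0.25}{10000}} \\
&= \frac{0.25}{0.0003756 + 0.000025} \\
&= \frac{0.25}{0.0004006} \approx 624
\end{aligned}
\]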
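The calculation of Example 12.10 can be checked with a few lines of Python. The function name sample_size_for_proportion is hypothetical. Treating the numerator 0.25 as P(1 − P) with P = 0.5 is an assumption inferred from the numbers in the example, not something the example states.

```python
def sample_size_for_proportion(N, A, Z, P=0.5):
    """Sample size n = P(1-P) / (A^2/Z^2 + P(1-P)/N).

    N: population size, A: desired precision, Z: standard normal value
    for the chosen confidence level, P: assumed population proportion.
    """
    pq = P * (1 - P)  # equals 0.25 when P = 0.5 (assumed here)
    return pq / (A**2 / Z**2 + pq / N)

# Values of Example 12.10: N = 10000, A = 0.05, Z = 2.58.
n = sample_size_for_proportion(N=10000, A=0.05, Z=2.58)
print(round(n))  # 624
```

Calling the same function with a different P, N, A or Z shows how the required sample size responds to the proportion, the population size, the precision and the confidence level.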
Using a sample saves a lot of money, time and manpower. If a suitable sampling
procedure is used in selecting units, an appropriate sample size is selected and
necessary precautions are taken to reduce sampling errors, then a sample should
yield valid and reliable information about the population.