Unit 3
STRATIFIED RANDOM SAMPLING
Structure
3.1 Introduction
Objectives
3.2 Principles of Stratification
3.2.1 Notations and Terminology
3.3 Properties of Stratified Random Sampling
3.4 Mean and Variance for Proportions
3.5 Allocation of Sample Size
3.5.1 Equal Number of Units from Each Stratum
3.5.2 Proportional Allocation
3.5.3 Neyman’s Allocation
3.5.4 Optimum Allocation
3.6 Stratified Sampling versus Simple Random Sampling
3.6.1 Proportional Allocation versus Simple Random Sampling
3.6.2 Neyman’s Allocation versus Proportional Allocation
3.6.3 Neyman’s Allocation versus Simple Random Sampling
3.6.4 Merits and Demerits of Stratified Random Sampling
3.7 Summary
3.8 Solutions/Answers
3.1 INTRODUCTION
When the units of the population are scattered and not homogeneous in nature, a simple random sample may not give a proper representation of the population. So, if the population is heterogeneous, simple random sampling is not found suitable. In simple random sampling the variance of the sample mean is proportional to the variability of the sampling units in the population. So, apart from increasing the sample size n or the sampling fraction n/N, the only other way of increasing the precision is to devise a sampling scheme that effectively reduces the variability of the sample units, that is, the population heterogeneity. One such method is the stratified sampling method.
In this method the population is first divided into groups, called strata, so that all the strata together comprise the population. Then a sample is drawn from each stratum, and lastly all the samples are combined to get the ultimate sample. For example, suppose that the population consists of N units distributed in a heterogeneous structure. First of all, we divide the population into k non-overlapping strata of sizes N1, N2, N3, ..., Nk such that each stratum is homogeneous. Evidently, N = N1 + N2 + N3 + ... + Nk. Then from the first stratum a sample of size n1 is drawn by the simple random sampling method. Similarly, from the second stratum a sample of n2 units is drawn, and so on up to the kth stratum. All these k samples are then combined to get the ultimate sample, so the ultimate sample size is n = n1 + n2 + n3 + ... + nk. This method of sampling is known as stratified random sampling because stratification is done first, to make each stratum homogeneous, and then samples are drawn randomly by simple random sampling from each stratum.
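The procedure just described can be sketched in Python. This is only an illustration, assuming the strata have already been formed and are given as lists of unit values; the function name `stratified_sample` and the example population are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample without replacement from every stratum.

    strata: list of k lists; strata[i] holds the N_i units of stratum i
    sizes:  list of k sample sizes n_i (each n_i <= N_i)
    Returns the k stratum samples and the combined (ultimate) sample.
    """
    rng = random.Random(seed)
    samples = [rng.sample(units, n_i) for units, n_i in zip(strata, sizes)]
    combined = [unit for sample in samples for unit in sample]
    return samples, combined

# Hypothetical population already divided into k = 3 non-overlapping strata
strata = [list(range(0, 50)), list(range(100, 130)), list(range(500, 520))]
samples, combined = stratified_sample(strata, [5, 3, 2], seed=1)
print(len(combined))  # n = 5 + 3 + 2 = 10
```

Each stratum is sampled independently, which is exactly the property used later when the covariance terms between strata are dropped.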
The principles of stratification are explained in Section 3.2. The properties of
stratified random sampling are described in Section 3.3, whereas Section 3.4
provides the derivation of the mean and variance of proportions in stratified
random sampling. The allocation of sample size with the help of different
techniques is described in Section 3.5. The comparative study between
stratified random sampling and simple random sampling is given in Section
3.6.
Objectives
After studying this unit, you should be able to
define stratified random sampling;
explain the principles of stratification;
describe the properties of stratified random sampling;
derive the mean and variance of proportions in stratified random
sampling;
describe the allocation of sample size with the help of different
techniques; and
calculate the estimate of the population mean and the variance of the sample mean.
3.2 PRINCIPLES OF STRATIFICATION
1. The strata should not overlap and should together comprise the whole population.
3.2.1 Notations and Terminology
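$N$ = Population size
$n$ = Sample size
$k$ = Number of strata
$N_i$ = Size of the ith stratum
$n_i$ = Size of the sample drawn from the ith stratum
Then
$$N = \sum_{i=1}^{k} N_i \quad \text{and} \quad n = \sum_{i=1}^{k} n_i$$
$X_{ij}$ = Value of the character under study for the jth unit of the ith stratum, $j = 1, 2, \ldots, N_i$ and $i = 1, 2, \ldots, k$
$$\bar{X}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij} = \text{Population mean of the } i\text{th stratum}$$
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} X_{ij} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{X}_i = \sum_{i=1}^{k} W_i \bar{X}_i = \text{Population mean}$$
where $W_i = N_i / N$ is called the weight of the ith stratum.
$$S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}_i\right)^2 = \text{Population mean square of the } i\text{th stratum}, \quad i = 1, 2, \ldots, k$$
$x_{ij}$ = Value of the jth sample unit taken from the ith stratum, $j = 1, 2, \ldots, n_i$
$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} = \text{Mean of the sample units selected from the } i\text{th stratum}$$
$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2, \quad i = 1, 2, \ldots, k$$
The estimate of the population mean based on the stratified sample is
$$\bar{x}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{x}_i = \sum_{i=1}^{k} W_i \bar{x}_i$$
where $\bar{x}_{st}$ is the weighted mean of the strata sample means, the weights being equal to the strata sizes. It is, in general, different from the mean of the combined sample, $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{x}_i$. These two will be identical if $n_i / n = N_i / N$ for every stratum.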
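The notation can be made concrete with a small numerical sketch. It assumes a tiny hypothetical population of two strata and uses exact fractions so that identities hold exactly; the helper names `mean`, `mean_square` and `x_st` are illustrative, not part of the text.

```python
from fractions import Fraction

def mean(xs):
    """Exact arithmetic mean."""
    return Fraction(sum(xs), len(xs))

def mean_square(xs):
    """Mean square with divisor (size - 1), as in S_i^2 and s_i^2."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical population with k = 2 strata
population = [[2, 4, 6, 8], [10, 14, 18, 22, 26, 30]]
N_i = [len(s) for s in population]
N = sum(N_i)
W = [Fraction(Ni, N) for Ni in N_i]              # weights W_i = N_i / N
X_bar_i = [mean(s) for s in population]          # stratum means
S2_i = [mean_square(s) for s in population]      # stratum mean squares
X_bar = sum(w * m for w, m in zip(W, X_bar_i))   # population mean as sum of W_i * X_bar_i

def x_st(samples):
    """Stratified sample mean: sum of W_i times the ith stratum sample mean."""
    return sum(w * mean(s) for w, s in zip(W, samples))

# With n_i / n = N_i / N (here n = 5, n_i = 2 and 3), x_st equals the combined sample mean
samples = [[2, 8], [10, 18, 30]]
print(x_st(samples), mean([x for s in samples for x in s]))  # both are 68/5
```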
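3.3 PROPERTIES OF STRATIFIED RANDOM SAMPLING
Theorem 1: The stratified sample mean $\bar{x}_{st}$ is an unbiased estimate of the population mean $\bar{X}$, that is, $E(\bar{x}_{st}) = \bar{X}$.
Proof:
$$E(\bar{x}_{st}) = E\left(\frac{1}{N}\sum_{i=1}^{k} N_i \bar{x}_i\right) = \frac{1}{N}\sum_{i=1}^{k} N_i E(\bar{x}_i)$$
Since the sample units selected from each stratum form a simple random sample, we have
$$E(\bar{x}_i) = \bar{X}_i$$
Therefore,
$$E(\bar{x}_{st}) = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{X}_i = \frac{1}{N}\sum_{i=1}^{k} N_i \cdot \frac{1}{N_i}\sum_{j=1}^{N_i} X_{ij} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} X_{ij} = \bar{X}$$
Hence proved.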
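For a very small population, unbiasedness can be checked exactly by listing every possible stratified sample. The sketch below assumes a hypothetical two-stratum population; since each stratum is sampled by SRSWOR independently, every combination of stratum samples is equally likely, so the plain average over all combinations is $E(\bar{x}_{st})$.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(xs):
    return Fraction(sum(xs), len(xs))

def expected_x_st(population, sizes):
    """Average x_st over every possible stratified sample.

    All C(N_i, n_i) subsets of a stratum are equally likely under SRSWOR and
    the strata are sampled independently, so every combination is equally likely.
    """
    N = sum(len(s) for s in population)
    W = [Fraction(len(s), N) for s in population]
    per_stratum = [list(combinations(s, n)) for s, n in zip(population, sizes)]
    values = [sum(w * mean(sample) for w, sample in zip(W, combo))
              for combo in product(*per_stratum)]
    return mean(values)

population = [[2, 4, 6, 8], [10, 14, 18, 22, 26, 30]]  # hypothetical
print(expected_x_st(population, [2, 3]))                # 14
print(mean([x for s in population for x in s]))         # 14, the population mean
```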
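Theorem 2: The variance of the stratified sample mean is given by
$$Var(\bar{x}_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2 = \frac{1}{N^2}\sum_{i=1}^{k}\frac{N_i (N_i - n_i)}{n_i} S_i^2$$
Proof:
$$Var(\bar{x}_{st}) = Var\left(\sum_{i=1}^{k} W_i \bar{x}_i\right) = \sum_{i=1}^{k} W_i^2 Var(\bar{x}_i) + \sum_{i=1}^{k}\sum_{l \neq i} W_i W_l \, Cov(\bar{x}_i, \bar{x}_l)$$
The covariance terms vanish since the samples from different strata are independent. Since the sample units in each stratum form a simple random sample without replacement, for which $Var(\bar{x}) = \left(\frac{1}{n} - \frac{1}{N}\right) S^2$, we have
$$Var(\bar{x}_i) = \left(\frac{1}{n_i} - \frac{1}{N_i}\right) S_i^2$$
Therefore,
$$Var(\bar{x}_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2 = \frac{1}{N^2}\sum_{i=1}^{k}\frac{N_i (N_i - n_i)}{n_i} S_i^2$$
From the above result, the variance depends on $S_i^2$, the heterogeneity within the strata. Thus, if the $S_i^2$ are small, i.e. the strata are homogeneous, then stratified sampling provides estimates with greater precision.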
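The variance formula can likewise be checked against the exact sampling distribution of $\bar{x}_{st}$ for a tiny hypothetical population. The functions `var_x_st` (the formula) and `var_by_enumeration` (brute force over all equally likely samples) are illustrative names, not from the text.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(xs):
    return Fraction(sum(xs), len(xs))

def mean_square(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def var_x_st(population, sizes):
    """Formula: sum over strata of (1/n_i - 1/N_i) * W_i^2 * S_i^2."""
    N = sum(len(s) for s in population)
    return sum((Fraction(1, n) - Fraction(1, len(s))) * Fraction(len(s), N) ** 2 * mean_square(s)
               for s, n in zip(population, sizes))

def var_by_enumeration(population, sizes):
    """Variance of x_st over all equally likely stratified samples."""
    N = sum(len(s) for s in population)
    W = [Fraction(len(s), N) for s in population]
    per_stratum = [list(combinations(s, n)) for s, n in zip(population, sizes)]
    values = [sum(w * mean(c) for w, c in zip(W, combo)) for combo in product(*per_stratum)]
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])

population = [[2, 4, 6, 8], [10, 14, 18, 22, 26, 30]]  # hypothetical
print(var_x_st(population, [2, 3]), var_by_enumeration(population, [2, 3]))  # 272/75 272/75
```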
Theorem 3: If the $S_i^2$ are not known, then prove that an unbiased estimate of the variance of the sample mean of the stratified random sample is given by
$$\widehat{Var}(\bar{x}_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 s_i^2$$
Proof: In general, the $S_i^2$ are not known. A simple random sample is drawn from each stratum. If we regard an individual stratum as a population, then the sample drawn from it is a simple random sample. If the sample is drawn from the ith stratum, the sample mean square $s_i^2$ is an unbiased estimate of the population mean square $S_i^2$, i.e.
$$E(s_i^2) = S_i^2, \quad i = 1, 2, \ldots, k \quad \ldots (1)$$
Therefore,
$$E\left[\widehat{Var}(\bar{x}_{st})\right] = E\left[\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 s_i^2\right] = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 E(s_i^2)$$
Substituting from equation (1), we get
$$E\left[\widehat{Var}(\bar{x}_{st})\right] = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2 = Var(\bar{x}_{st})$$
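The sketch below computes the estimate $\widehat{Var}(\bar{x}_{st})$ from one stratified sample and, for a tiny hypothetical population, averages it over all possible samples to confirm that its expectation equals $Var(\bar{x}_{st})$. The names `est_var_x_st` and `expected_estimate` are hypothetical; the estimate needs $n_i \geq 2$ in every stratum.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(xs):
    return Fraction(sum(xs), len(xs))

def mean_square(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def est_var_x_st(samples, N_i):
    """Estimate: sum of (1/n_i - 1/N_i) * W_i^2 * s_i^2 (requires n_i >= 2)."""
    N = sum(N_i)
    return sum((Fraction(1, len(s)) - Fraction(1, Ni)) * Fraction(Ni, N) ** 2 * mean_square(s)
               for s, Ni in zip(samples, N_i))

def expected_estimate(population, sizes):
    """Average of the estimate over every possible stratified sample."""
    N_i = [len(s) for s in population]
    per_stratum = [list(combinations(s, n)) for s, n in zip(population, sizes)]
    return mean([est_var_x_st(combo, N_i) for combo in product(*per_stratum)])

population = [[2, 4, 6, 8], [10, 14, 18, 22, 26, 30]]  # hypothetical
print(est_var_x_st([[2, 8], [10, 18, 30]], [4, 6]))     # estimate from one sample
print(expected_estimate(population, [2, 3]))            # equals Var(x_st) = 272/75
```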
3.4 MEAN AND VARIANCE FOR PROPORTIONS
As in simple random sampling, we can divide a population into two classes with respect to an attribute A. The units in the population are classified into these two classes according as they possess or do not possess the given attribute. After taking a sample of size n, we may be interested in estimating the population proportion of units possessing the attribute.
If a unit possesses the attribute, it receives the code value 1, and if a unit does not possess the attribute, it receives the value 0. Let the number of units belonging to A in the ith stratum of size $N_i$ be $M_i$, and, if a sample of size $n_i$ is taken from the ith stratum, let the number of sample units belonging to A be $m_i$. Denoting the proportion of units belonging to A in the population, in the ith stratum and in the sample from the ith stratum by $\pi$, $\pi_i$ and $p_i$ respectively, the various formulae for the mean and variance are as follows:
$$\pi_i = \frac{M_i}{N_i} \quad \text{and} \quad p_i = \frac{m_i}{n_i}, \quad i = 1, 2, \ldots, k$$
and
$$\pi = \frac{1}{N}\sum_{i=1}^{k} N_i \pi_i = \sum_{i=1}^{k} W_i \pi_i$$
The estimated proportion $p_{st}$ under stratified sampling for the units belonging to A is
$$p_{st} = \sum_{i=1}^{k} W_i p_i$$
Mean:
$$E(p_{st}) = E\left(\sum_{i=1}^{k} W_i p_i\right) = \sum_{i=1}^{k} W_i E(p_i)$$
Since we draw an SRS from each stratum, by Theorem 10 of Section 2.4 of Unit 2 we have
$$E(p_i) = \pi_i$$
so that $E(p_{st}) = \sum_{i=1}^{k} W_i \pi_i = \pi$.
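$$Var(p_i) = \frac{N_i - n_i}{N_i - 1} \cdot \frac{\pi_i (1 - \pi_i)}{n_i}$$
Since the samples from different strata are independent,
$$Var(p_{st}) = \sum_{i=1}^{k} W_i^2 \, \frac{N_i - n_i}{N_i - 1} \cdot \frac{\pi_i (1 - \pi_i)}{n_i}$$
and its estimate is
$$\widehat{Var}(p_{st}) = \frac{1}{N^2}\sum_{i=1}^{k}\frac{N_i (N_i - n_i)}{n_i - 1} \, p_i q_i$$
where $q_i = 1 - p_i$.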
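A short sketch of the proportion formulas, assuming hypothetical survey counts: two strata of sizes 300 and 200, samples of 30 and 20, with 12 and 5 sampled units possessing the attribute. The function name `stratified_proportion` is illustrative.

```python
from fractions import Fraction

def stratified_proportion(N_i, n_i, m_i):
    """Return p_st = sum W_i p_i and its estimated variance
    (1/N^2) * sum N_i (N_i - n_i) p_i q_i / (n_i - 1)."""
    N = sum(N_i)
    p = [Fraction(m, n) for m, n in zip(m_i, n_i)]
    p_st = sum(Fraction(Ni, N) * pi for Ni, pi in zip(N_i, p))
    v = sum(Fraction(Ni * (Ni - n), n - 1) * pi * (1 - pi)
            for Ni, n, pi in zip(N_i, n_i, p)) / N ** 2
    return p_st, v

# Hypothetical counts; m_i sample units in stratum i possess attribute A
p_st, v = stratified_proportion(N_i=[300, 200], n_i=[30, 20], m_i=[12, 5])
print(p_st, float(v))
```

Writing the estimate with $N_i(N_i - n_i)/N^2$ is the same as $W_i^2 (1 - n_i/N_i)\, p_i q_i/(n_i - 1)$; the check below confirms the two forms agree.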
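3.5 ALLOCATION OF SAMPLE SIZE
3.5.2 Proportional Allocation
In proportional allocation, the sample size from each stratum is taken proportional to the stratum size, i.e.
$$\frac{n_i}{N_i} = \frac{n}{N} \quad \text{or} \quad n_i = \frac{n}{N} N_i \quad \text{for all } i = 1, 2, \ldots, k$$
Substituting $n_i = n N_i / N$ in the expression for $Var(\bar{x}_{st})$, we get
$$Var(\bar{x}_{st})_{PROP} = \frac{N - n}{nN} \cdot \frac{1}{N}\sum_{i=1}^{k} N_i S_i^2 \quad \ldots (3)$$
or
$$Var(\bar{x}_{st})_{PROP} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i S_i^2 \quad \ldots (4)$$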
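A minimal check of proportional allocation and equation (4), assuming hypothetical stratum sizes (400, 200), mean squares (100, 900) and n = 60. It compares (4) with the general variance formula evaluated at the proportional sample sizes; the function names are illustrative.

```python
from fractions import Fraction

def proportional_allocation(n, N_i):
    """n_i = n * N_i / N (may be fractional; round in practice)."""
    N = sum(N_i)
    return [Fraction(n * Ni, N) for Ni in N_i]

def var_prop(n, N_i, S2_i):
    """Equation (4): (1/n - 1/N) * sum W_i S_i^2."""
    N = sum(N_i)
    return (Fraction(1, n) - Fraction(1, N)) * sum(Fraction(Ni, N) * S2 for Ni, S2 in zip(N_i, S2_i))

def var_general(n_i, N_i, S2_i):
    """General formula: sum (1/n_i - 1/N_i) W_i^2 S_i^2."""
    N = sum(N_i)
    return sum((1 / Fraction(ni) - Fraction(1, Ni)) * Fraction(Ni, N) ** 2 * S2
               for ni, Ni, S2 in zip(n_i, N_i, S2_i))

N_i, S2_i, n = [400, 200], [100, 900], 60   # hypothetical stratum sizes and mean squares
n_i = proportional_allocation(n, N_i)       # n_1 = 40, n_2 = 20
print(n_i, var_prop(n, N_i, S2_i), var_general(n_i, N_i, S2_i))
```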
3.5.3 Neyman’s Allocation
This allocation of the total sample size n to the different strata is called the minimum variance allocation and is due to Neyman (1934). The result was first discovered by Tchuprow (1923), but it remained unknown until it was rediscovered independently by Neyman. This allocation of samples among the different strata is based on a joint consideration of the stratum size and the stratum variance. In this allocation, it is assumed that the sampling cost per unit is the same in all strata and that the total sample size n is fixed. The sample sizes are allocated by
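$$n_i = n\frac{W_i S_i}{\sum_{i=1}^{k} W_i S_i} = n\frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i} \quad \ldots (5)$$
Substituting (5) in the expression for $Var(\bar{x}_{st})$, the variance under Neyman’s allocation is
$$Var(\bar{x}_{st})_{NEY} = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 \quad \ldots (6)$$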
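Equations (5) and (6) can be sketched as follows, assuming hypothetical strata with $N_i$ = (400, 200), $S_i$ = (10, 30) and n = 60. The names `neyman_allocation`, `var_ney` and `var_general` are illustrative; the last one is the general variance formula, used here to confirm (6).

```python
from fractions import Fraction

def neyman_allocation(n, N_i, S_i):
    """Equation (5): n_i = n * N_i S_i / sum(N_i S_i)."""
    total = sum(Ni * Si for Ni, Si in zip(N_i, S_i))
    return [Fraction(n * Ni * Si, total) for Ni, Si in zip(N_i, S_i)]

def var_ney(n, N_i, S_i):
    """Equation (6): (1/n) (sum W_i S_i)^2 - (1/N) sum W_i S_i^2."""
    N = sum(N_i)
    W = [Fraction(Ni, N) for Ni in N_i]
    return (sum(w * s for w, s in zip(W, S_i)) ** 2 / n
            - sum(w * s * s for w, s in zip(W, S_i)) / N)

def var_general(n_i, N_i, S_i):
    """General formula: sum (1/n_i - 1/N_i) W_i^2 S_i^2."""
    N = sum(N_i)
    return sum((1 / Fraction(ni) - Fraction(1, Ni)) * Fraction(Ni, N) ** 2 * s * s
               for ni, Ni, s in zip(n_i, N_i, S_i))

N_i, S_i, n = [400, 200], [10, 30], 60   # hypothetical
n_i = neyman_allocation(n, N_i, S_i)      # n_1 = 24, n_2 = 36
print(n_i, var_ney(n, N_i, S_i))
```

Compared with proportional allocation (40 and 20), Neyman’s allocation shifts units towards the more variable second stratum.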
3.5.4 Optimum Allocation
In practice, the cost of surveying a sampling unit need not be the same in every stratum. For example, the cost of transportation in one stratum may differ from that in another. Hence it is reasonable to allow the cost per unit of the survey to differ from stratum to stratum.
Let $c_i$ be the cost per unit of the survey in the ith stratum, from which a sample of size $n_i$ is to be drawn, and let $c_0$ be the overhead (fixed) cost of the survey. The total cost C of the survey is then
$$C = c_0 + \sum_{i=1}^{k} c_i n_i \quad \ldots (7)$$
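Since $c_0$ and $c_i$ are beyond our control, we determine the optimum values of $n_i$ which minimize the variance of the stratified sample mean for the given total cost C. To determine the optimum value of $n_i$, we consider the function
$$\phi = Var(\bar{x}_{st}) + \lambda\left(c_0 + \sum_{i=1}^{k} c_i n_i - C\right) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2 + \lambda\left(c_0 + \sum_{i=1}^{k} c_i n_i - C\right) \quad \ldots (8)$$
where $\lambda$ is a constant known as Lagrange’s multiplier. Differentiating (8) with respect to $n_i$ and equating the derivative to zero, we get
$$-\frac{W_i^2 S_i^2}{n_i^2} + \lambda c_i = 0 \quad \ldots (9)$$
or
$$n_i = \frac{1}{\sqrt{\lambda}} \cdot \frac{W_i S_i}{\sqrt{c_i}} \quad \ldots (10)$$
Summing (10) over all the strata,
$$n = \frac{1}{\sqrt{\lambda}}\sum_{i=1}^{k}\frac{W_i S_i}{\sqrt{c_i}} \quad \text{or} \quad \sqrt{\lambda} = \frac{1}{n}\sum_{i=1}^{k}\frac{W_i S_i}{\sqrt{c_i}}$$
Substituting the value of $\sqrt{\lambda}$ in equation (10), we get the value of $n_i$:
$$n_i = n\frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}} = n\frac{N_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}} \quad \ldots (11)$$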
Thus, relation (11) leads to the following important conclusion: we should take a larger sample from a given stratum if
1. the stratum size $N_i$ is larger;
2. the stratum has larger variability ($S_i$); and
3. the cost per unit in the stratum is lower.
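A floating-point sketch of the optimum allocation (11) and the cost function (7). The strata ($N_i$ = 400, 200; $S_i$ = 10, 30), unit costs ($c_i$ = 4, 9), total sample size n = 60 and overhead $c_0$ = 100 are all hypothetical values, as are the function names.

```python
import math

def optimum_allocation(n, N_i, S_i, c_i):
    """Equation (11): n_i proportional to N_i S_i / sqrt(c_i)."""
    scores = [Ni * Si / math.sqrt(ci) for Ni, Si, ci in zip(N_i, S_i, c_i)]
    total = sum(scores)
    return [n * s / total for s in scores]

def total_cost(c0, c_i, n_i):
    """Equation (7): C = c0 + sum c_i n_i."""
    return c0 + sum(c * n for c, n in zip(c_i, n_i))

# Hypothetical strata: the second is more variable but dearer to survey
N_i, S_i, c_i = [400, 200], [10, 30], [4, 9]
n_i = optimum_allocation(60, N_i, S_i, c_i)
print(n_i, total_cost(100, c_i, n_i))
```

Here the higher cost in the second stratum offsets its larger variability, so both strata receive 30 units; with equal unit costs the same function reproduces Neyman’s allocation (5).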
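3.6 STRATIFIED SAMPLING VERSUS SIMPLE RANDOM SAMPLING
3.6.1 Proportional Allocation versus Simple Random Sampling
From equation (4), the variance of the stratified sample mean under proportional allocation is
$$Var(\bar{x}_{st})_{PROP} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i S_i^2 \quad \ldots (12)$$
where $S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}_i\right)^2$, and
$$Var(\bar{x})_{SRSWOR} = \left(\frac{1}{n} - \frac{1}{N}\right) S^2 \quad \ldots (13)$$
where $S^2 = \frac{1}{N - 1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}\right)^2$.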
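In order to compare (12) and (13), we first express $S^2$ in terms of $S_i^2$. We have
$$(N - 1) S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}\right)^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left[\left(X_{ij} - \bar{X}_i\right) + \left(\bar{X}_i - \bar{X}\right)\right]^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}_i\right)^2 + \sum_{i=1}^{k} N_i\left(\bar{X}_i - \bar{X}\right)^2 + 2\sum_{i=1}^{k}\left(\bar{X}_i - \bar{X}\right)\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}_i\right)$$
The last term vanishes because
$$\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}_i\right) = 0,$$
being the sum of the deviations from the stratum mean. Hence
$$(N - 1) S^2 = \sum_{i=1}^{k}(N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i\left(\bar{X}_i - \bar{X}\right)^2$$
If we assume that $N_i$, and consequently N, are sufficiently large so that we can put $N_i - 1 \approx N_i$ and $N - 1 \approx N$, then we get
$$N S^2 = \sum_{i=1}^{k} N_i S_i^2 + \sum_{i=1}^{k} N_i\left(\bar{X}_i - \bar{X}\right)^2$$
or
$$S^2 = \sum_{i=1}^{k} W_i S_i^2 + \sum_{i=1}^{k} W_i\left(\bar{X}_i - \bar{X}\right)^2 \quad \ldots (14)$$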
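The exact decomposition $(N-1)S^2 = \sum (N_i - 1)S_i^2 + \sum N_i(\bar{X}_i - \bar{X})^2$, before the large-$N_i$ approximation, can be verified on a small hypothetical population with exact fractions:

```python
from fractions import Fraction

def mean(xs):
    return Fraction(sum(xs), len(xs))

def mean_square(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

population = [[2, 4, 6, 8], [10, 14, 18, 22, 26, 30]]  # hypothetical
units = [x for s in population for x in s]
N, X_bar = len(units), mean(units)

lhs = (N - 1) * mean_square(units)                                  # (N - 1) S^2
within = sum((len(s) - 1) * mean_square(s) for s in population)     # sum (N_i - 1) S_i^2
between = sum(len(s) * (mean(s) - X_bar) ** 2 for s in population)  # sum N_i (X_bar_i - X_bar)^2
print(lhs, within, between)  # 840 300 540
```

The between-strata part is what stratification removes from the variance; it is large here because the two stratum means (5 and 20) are far apart.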
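Substituting (14) in (13), we get
$$Var(\bar{x})_{SRSWOR} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i S_i^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i\left(\bar{X}_i - \bar{X}\right)^2$$
$$= Var(\bar{x}_{st})_{PROP} + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i\left(\bar{X}_i - \bar{X}\right)^2$$
Since the second term is non-negative,
$$Var(\bar{x})_{SRSWOR} \geq Var(\bar{x}_{st})_{PROP} \quad \ldots (15)$$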
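3.6.2 Neyman’s Allocation versus Proportional Allocation
From equations (4) and (6), we have
$$Var(\bar{x}_{st})_{PROP} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i S_i^2$$
and
$$Var(\bar{x}_{st})_{NEY} = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 \quad \ldots (17)$$
Therefore,
$$Var(\bar{x}_{st})_{PROP} - Var(\bar{x}_{st})_{NEY} = \frac{1}{n}\left[\sum_{i=1}^{k} W_i S_i^2 - \left(\sum_{i=1}^{k} W_i S_i\right)^2\right] = \frac{1}{n}\sum_{i=1}^{k} W_i\left(S_i - \bar{S}\right)^2 \quad \ldots (18)$$
where
$$\bar{S} = \sum_{i=1}^{k} W_i S_i = \frac{1}{N}\sum_{i=1}^{k} N_i S_i$$
is the weighted mean of the stratum standard deviations $S_i$, the weights being the stratum sizes $N_i$. Since the right-hand side of (18) is non-negative, $Var(\bar{x}_{st})_{NEY} \leq Var(\bar{x}_{st})_{PROP}$.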
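Identity (18) can be checked numerically with exact fractions. The stratum sizes (400, 200), standard deviations (10, 30) and n = 60 below are hypothetical.

```python
from fractions import Fraction

N_i, S_i, n = [400, 200], [10, 30], 60   # hypothetical
N = sum(N_i)
W = [Fraction(Ni, N) for Ni in N_i]
sum_WS = sum(w * s for w, s in zip(W, S_i))       # sum W_i S_i
sum_WS2 = sum(w * s * s for w, s in zip(W, S_i))  # sum W_i S_i^2

var_prop = (Fraction(1, n) - Fraction(1, N)) * sum_WS2        # equation (4)
var_ney = sum_WS ** 2 / n - sum_WS2 / N                       # equation (6)
S_bar = sum_WS                                                # weighted mean of the S_i
rhs = sum(w * (s - S_bar) ** 2 for w, s in zip(W, S_i)) / n   # right-hand side of (18)
print(var_prop - var_ney, rhs)  # 40/27 40/27
```

The gain of Neyman’s allocation over proportional allocation vanishes only when all $S_i$ are equal, in which case the two allocations coincide.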
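3.6.3 Neyman’s Allocation versus Simple Random Sampling
From the relationship between proportional allocation and simple random sampling, and the relationship between proportional and Neyman’s allocation, we have
$$Var(\bar{x})_{SRSWOR} = Var(\bar{x}_{st})_{PROP} + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i\left(\bar{X}_i - \bar{X}\right)^2 \quad \ldots (19)$$
and
$$Var(\bar{x}_{st})_{PROP} = Var(\bar{x}_{st})_{NEY} + \frac{1}{n}\sum_{i=1}^{k} W_i\left(S_i - \bar{S}\right)^2 \quad \ldots (20)$$
By substituting the value of the variance under proportional allocation from equation (20) in equation (19), we have
$$Var(\bar{x})_{SRSWOR} - Var(\bar{x}_{st})_{NEY} = \frac{1}{n}\sum_{i=1}^{k} W_i\left(S_i - \bar{S}\right)^2 + \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i\left(\bar{X}_i - \bar{X}\right)^2 \quad \ldots (21)$$
That means
$$Var(\bar{x})_{SRSWOR} \geq Var(\bar{x}_{st})_{NEY}$$
because both terms on the right-hand side of equation (21) are non-negative. From the relations between the variance of the simple random sample mean and the variances of the stratified sample means under proportional and Neyman allocations, we reach the conclusion that
$$Var(\bar{x})_{SRSWOR} \geq Var(\bar{x}_{st})_{PROP} \geq Var(\bar{x}_{st})_{NEY}$$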
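The chain of inequalities can be illustrated with a sketch that computes all three variances from stratum parameters, taking $S^2$ from the large-$N_i$ approximation (14). The stratum sizes, means and standard deviations passed in are hypothetical, and `compare` is an illustrative name.

```python
from fractions import Fraction

def compare(n, N_i, X_bar_i, S_i):
    """Return (SRSWOR, PROP, NEY) variances, with S^2 from approximation (14)."""
    N = sum(N_i)
    W = [Fraction(Ni, N) for Ni in N_i]
    f = Fraction(1, n) - Fraction(1, N)
    X_bar = sum(w * x for w, x in zip(W, X_bar_i))
    within = sum(w * s * s for w, s in zip(W, S_i))                 # sum W_i S_i^2
    between = sum(w * (x - X_bar) ** 2 for w, x in zip(W, X_bar_i))  # sum W_i (X_bar_i - X_bar)^2
    var_srs = f * (within + between)                                # (13) with (14)
    var_prop = f * within                                           # (4)
    var_ney = sum(w * s for w, s in zip(W, S_i)) ** 2 / n - within / N  # (6)
    return var_srs, var_prop, var_ney

v = compare(60, [400, 200], [50, 80], [10, 30])  # hypothetical strata
print([float(x) for x in v])
```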
3.6.4 Merits and Demerits of Stratified Random Sampling
Merits
1. More Representative
Stratified random sampling ensures any desired representation in the
sample of the various strata in the population. It overruled the probability
of any essential group of the population being completely excluded in the
sample.
2. Greater Accuracy
Stratified random sampling provides estimate of parameters with
increased precision in comparison to simple random sampling. Stratified
random sampling also enables us to obtain the results of known precision
for each of the stratum.
3. Administrative Convenience
The stratified random samples would be more concentrated geographically
in comparison to simple random samples. Therefore, this method needs
less time and money involved in interviewing the supervision of the field
work can be done with greater case and convenience.
Demerits
However, stratified random sampling has some demerits too, which are:
2. Lower Efficiency
If the sample sizes for the different strata are not properly determined, stratified random sampling may yield a larger variance, that is, lower efficiency.
Example 1: A sample of 60 persons is to be drawn from a population consisting of 600 persons belonging to two villages A and B. The means and standard deviations of their marks are given below:
Solution: If we regard the villages A and B as representing two different strata, then the problem is to draw a stratified random sample of size 60 using the technique of proportional allocation. In proportional allocation, we have
$$n_i = \frac{n}{N} N_i$$
Therefore,
$$n_1 = \frac{60}{600} \times 400 = 40, \qquad n_2 = \frac{60}{600} \times 200 = 20$$
Thus, the required sample sizes for the villages A and B are 40 and 20 respectively.
E2) Obtain the sample mean and estimate of the population mean for the
given information in Example 1 discussed above.
3.7 SUMMARY
In this unit, we have discussed:
1. The definition and procedure of stratified random sampling;
2. The principles of stratification;
3. The properties of stratified random sampling;
4. The mean and variance of proportions in stratified random sampling;
5. The allocation of sample size with the help of different techniques; and
6. Calculation of the estimate of population mean and variance of sample
mean.
3.8 SOLUTIONS/ANSWERS
E1) If we regard the colleges A and B as representing two different strata, then the problem is to draw a stratified sample of 100 employees using the techniques of proportional allocation and Neyman’s allocation.
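In proportional allocation, we have
$$n_i = \frac{n}{N} N_i = \frac{100}{500} N_i$$
so that
$$n_1 = \frac{100}{500} \times 300 = 60, \qquad n_2 = \frac{100}{500} \times 200 = 40$$
In Neyman’s allocation, we have
$$n_i = n\frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}$$
so that, with $\sum_{i=1}^{k} N_i S_i = 3500$,
$$n_2 = 100 \times \frac{N_2 S_2}{3500} = 57.14 \approx 57, \qquad n_1 = 100 - 57 = 43$$
Therefore, the sample sizes for the colleges A and B under both allocations are obtained as: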
              Proportional    Neyman
College A          60           43
College B          40           57
Total             100          100
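E2) For the two villages of Example 1, $N_1 = 400$, $N_2 = 200$, $\bar{X}_1 = 60$ and $\bar{X}_2 = 120$, so that
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{X}_i = \frac{400 \times 60 + 200 \times 120}{600} = \frac{48000}{600} = 80$$
The variance of the stratified sample mean under proportional allocation is
$$Var(\bar{x}_{st})_{PROP} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k}\frac{N_i}{N} S_i^2 = 36.1675$$
and
$$Var(\bar{x})_{SRSWOR} = \left(\frac{1}{n} - \frac{1}{N}\right) S^2$$
where
$$S^2 = \frac{1}{N - 1}\sum_{i=1}^{k}\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}\right)^2 = \frac{1}{N - 1}\left[\sum_{i=1}^{k} N_i \sigma_i^2 + \sum_{i=1}^{k} N_i \bar{X}_i^2 - N\bar{X}^2\right]$$
with $\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i}\left(X_{ij} - \bar{X}_i\right)^2$ the variance of the ith stratum. Hence
$$S^2 = \frac{1}{599}\left[1440000 + 4320000 - 600 \times 80 \times 80\right] = \frac{1}{599}\left[5760000 - 3840000\right] = \frac{1920000}{599}$$
$$Var(\bar{x})_{SRSWOR} = \frac{600 - 60}{600 \times 60} \times \frac{1920000}{599} = 48.08$$
Then the conclusion is
$$Var(\bar{x}_{st})_{PROP} = 36.1675 < Var(\bar{x})_{SRSWOR} = 48.08$$
Therefore, the gain in precision of $\bar{x}_{st}$ under proportional allocation over the simple random sample mean is
$$\text{Gain in precision} = \frac{Var(\bar{x})_{SRSWOR} - Var(\bar{x}_{st})_{PROP}}{Var(\bar{x}_{st})_{PROP}} \times 100 = \frac{48.08 - 36.1675}{36.1675} \times 100 = 32.9\%$$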
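The arithmetic of the E2 solution can be reproduced in a few lines. This sketch takes the within-strata total $\sum N_i \sigma_i^2 = 1440000$ and $Var(\bar{x}_{st})_{PROP} = 36.1675$ as given in the solution, since the underlying stratum standard deviations are not restated here; the variable names are illustrative.

```python
N, n = 600, 60
N_i = [400, 200]
X_bar_i = [60, 120]
sum_N_sigma2 = 1_440_000   # sum of N_i * sigma_i^2, as used in the solution
var_prop = 36.1675         # Var(x_st)_PROP as reported in the solution

X_bar = sum(Ni * x for Ni, x in zip(N_i, X_bar_i)) / N          # 80.0
sum_N_xbar2 = sum(Ni * x ** 2 for Ni, x in zip(N_i, X_bar_i))  # 4,320,000
S2 = (sum_N_sigma2 + sum_N_xbar2 - N * X_bar ** 2) / (N - 1)   # 1,920,000 / 599
var_srs = (N - n) / (N * n) * S2                               # (1/n - 1/N) S^2
gain = (var_srs - var_prop) / var_prop * 100
print(f"Var_SRSWOR = {var_srs:.2f}, gain = {gain:.1f}%")
```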