Allocation of Sample Size
Stratified Sampling
An important objective in any estimation problem is to obtain an estimator of a population parameter
which can take care of the salient features of the population. If the population is homogeneous with
respect to the characteristic under study, then the method of simple random sampling will yield a
homogeneous sample, and in turn, the sample mean will serve as a good estimator of the population
mean. Thus, if the population is homogeneous with respect to the characteristic under study, then the
sample drawn through simple random sampling is expected to provide a representative sample.
Moreover, the variance of the sample mean depends not only on the sample size and sampling fraction
but also on the population variance. To increase the precision of an estimator, we need a
sampling scheme that reduces the heterogeneity in the population. If the population is
heterogeneous with respect to the characteristic under study, one such sampling procedure is
stratified sampling.
Example: Suppose we want to find the average height of the students in a school with classes 1 to 12.
The height varies a lot, since the students in class 1 are around 6 years old while the students in class 10
are around 16 years old. So one can divide all the students into different subpopulations or strata such as
Students of class 1, 2 and 3: Stratum 1
Students of class 4, 5 and 6: Stratum 2
Students of class 7, 8 and 9: Stratum 3
Students of class 10, 11 and 12: Stratum 4
Now draw the samples by SRS from each of the strata 1, 2, 3 and 4. All the drawn samples combined
together will constitute the final stratified sample for further analysis.
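As a rough illustration of the procedure above, the following Python sketch draws an independent SRSWOR sample from each stratum and pools the results. The student IDs, the stratum sizes, and the per-stratum sample sizes are made-up values used only for illustration; they are not taken from the example.

```python
import random

def draw_stratified_sample(strata, sizes, seed=0):
    """Draw an SRSWOR sample of sizes[name] units from every stratum."""
    rng = random.Random(seed)
    # Random.sample draws without replacement, i.e. SRSWOR within a stratum.
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical student IDs for the four strata (sizes are assumptions).
strata = {
    "stratum_1": list(range(0, 300)),     # classes 1, 2 and 3
    "stratum_2": list(range(300, 580)),   # classes 4, 5 and 6
    "stratum_3": list(range(580, 830)),   # classes 7, 8 and 9
    "stratum_4": list(range(830, 1000)),  # classes 10, 11 and 12
}
sizes = {"stratum_1": 30, "stratum_2": 28, "stratum_3": 25, "stratum_4": 17}

sample = draw_stratified_sample(strata, sizes)
# Pool the independent samples into the final stratified sample.
final_sample = [unit for units in sample.values() for unit in units]
```

The samples are kept per stratum in `sample` because the estimators need the stratum-wise means, not just the pooled units.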
The population of $N$ units is divided into $k$ strata. The strata are constructed such that they are
non-overlapping and homogeneous with respect to the characteristic under study, with the stratum sizes
$N_i$ satisfying
\[ \sum_{i=1}^{k} N_i = N. \]
Draw a sample of size $n_i$ from the $i$th stratum ($i = 1, 2, \ldots, k$) using SRS (preferably WOR).
In cluster sampling, the clusters are constructed such that they are heterogeneous within and
homogeneous among themselves.
[Note: We discuss the cluster sampling later.]
Note that there are k independent samples drawn through SRS of sizes n1, n2 ,..., nk from each of the
strata. So, one can have k estimators of a parameter based on the sizes n1, n2 ,..., nk respectively. Our
interest is not to have k different estimators of the parameters, but the ultimate goal is to have a single
estimator. In this case, an important issue is how to combine the different sample information together
into one estimator, which is good enough to provide information about the parameter.
We now consider the estimation of population mean and population variance from a stratified sample.
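Let $Y_{ij}$ denote the value of the $j$th unit in the $i$th stratum and $y_{ij}$ the value of the $j$th
sampled unit from the $i$th stratum. Then
\[ \bar{Y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} Y_{ij} \quad : \text{population mean of the $i$th stratum} \]
\[ \bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} \quad : \text{sample mean from the $i$th stratum} \]
\[ \bar{Y} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar{Y}_i = \sum_{i=1}^{k} w_i\bar{Y}_i \quad : \text{population mean, where } w_i = \frac{N_i}{N}. \]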
... unbiased estimator of $\bar{Y}$. Consider the stratified sample mean, which is defined as the weighted
arithmetic mean of the strata sample means with the strata sizes as weights:
\[ \bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i\bar{y}_i. \]
Sampling Theory| Chapter 4 | Stratified Sampling | Shalabh, IIT Kanpur
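Now
\[
\begin{aligned}
E(\bar{y}_{st}) &= \frac{1}{N}\sum_{i=1}^{k} N_i E(\bar{y}_i) \\
&= \frac{1}{N}\sum_{i=1}^{k} N_i\bar{Y}_i \\
&= \bar{Y}.
\end{aligned}
\]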
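Variance of $\bar{y}_{st}$
\[ Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,Var(\bar{y}_i) + \sum_{i(\ne j)=1}^{k}\sum_{j=1}^{k} w_i w_j\,Cov(\bar{y}_i, \bar{y}_j). \]
Since all the samples have been drawn independently from each of the strata by SRSWOR,
\[ Cov(\bar{y}_i, \bar{y}_j) = 0,\ i \ne j, \qquad Var(\bar{y}_i) = \frac{N_i - n_i}{N_i n_i} S_i^2, \]
where
\[ S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2. \]
Thus
\[
\begin{aligned}
Var(\bar{y}_{st}) &= \sum_{i=1}^{k} w_i^2\,\frac{N_i - n_i}{N_i n_i}\,S_i^2 \\
&= \sum_{i=1}^{k} w_i^2\left(1 - \frac{n_i}{N_i}\right)\frac{S_i^2}{n_i}.
\end{aligned}
\]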
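The two formulas above, the stratified mean and its variance under SRSWOR, can be evaluated directly. The sketch below is a minimal illustration; the function names and the numbers in the usage lines are assumptions chosen for the example.

```python
def stratified_mean(N_sizes, sample_means):
    """y_st = (1/N) * sum_i N_i * ybar_i."""
    N = sum(N_sizes)
    return sum(Ni * yi for Ni, yi in zip(N_sizes, sample_means)) / N

def var_stratified_mean(N_sizes, n_sizes, S2):
    """Var(y_st) = sum_i w_i^2 * (N_i - n_i) / (N_i * n_i) * S_i^2 (SRSWOR)."""
    N = sum(N_sizes)
    return sum(
        (Ni / N) ** 2 * (Ni - ni) / (Ni * ni) * s2
        for Ni, ni, s2 in zip(N_sizes, n_sizes, S2)
    )

# Hypothetical two-stratum population.
y_st = stratified_mean([100, 300], [10.0, 20.0])
v_st = var_stratified_mean([100, 300], [10, 30], [4.0, 9.0])
```

In practice the $S_i^2$ are unknown and are replaced by the sample variances $s_i^2$ to obtain an estimate of the variance.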
Observe that $Var(\bar{y}_{st})$ is small when $S_i^2$ is small. This observation suggests how to construct the
strata. If $S_i^2$ is small for all $i = 1, 2, \ldots, k$, then $Var(\bar{y}_{st})$ will also be small. That is why it was
mentioned earlier that the strata are to be constructed such that they are homogeneous within, i.e., $S_i^2$ is
small, and heterogeneous among themselves.
For example, units in geographical proximity tend to be more alike. The consumption pattern
of the households will be similar within a lower income group housing society and within a higher
income group housing society, whereas it will differ a lot between the two housing societies.
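Note: If SRSWR is used instead of SRSWOR for drawing the samples from each stratum, then
\[ \bar{y}_{st} = \sum_{i=1}^{k} w_i\bar{y}_i, \qquad E(\bar{y}_{st}) = \bar{Y}, \]
\[ Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i - 1}{N_i n_i}\,S_i^2 = \sum_{i=1}^{k} \frac{w_i^2\sigma_i^2}{n_i}, \]
\[ \widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{w_i^2 s_i^2}{n_i}, \]
where
\[ \sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2. \]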
Note: The sample size cannot be determined by minimizing both the cost and variability
simultaneously. The cost function is directly proportional to the sample size, whereas variability is
inversely proportional to the sample size.
Based on different ideas, some allocation procedures are as follows:
1. Equal allocation
Choose the sample size ni to be the same for all the strata.
2. Proportional allocation
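For fixed $k$, select $n_i$ such that it is proportional to the stratum size $N_i$, i.e.,
\[ n_i \propto N_i \quad \text{or} \quad n_i = C N_i, \]
where $C$ is the constant of proportionality. Summing over the strata,
\[ \sum_{i=1}^{k} n_i = C\sum_{i=1}^{k} N_i \quad \text{or} \quad n = CN \quad \text{or} \quad C = \frac{n}{N}. \]
Thus
\[ n_i = \frac{n}{N} N_i. \]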
Such allocation arises from considerations like operational convenience.
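3. Optimum (Neyman) allocation
Select $n_i$ such that it is proportional to $N_i S_i$, i.e., $n_i = C^* N_i S_i$, where $C^*$ is the constant of
proportionality. Then
\[ \sum_{i=1}^{k} n_i = C^*\sum_{i=1}^{k} N_i S_i \quad \text{or} \quad n = C^*\sum_{i=1}^{k} N_i S_i \quad \text{or} \quad C^* = \frac{n}{\sum_{i=1}^{k} N_i S_i}. \]
Thus
\[ n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}. \]
This allocation arises when $Var(\bar{y}_{st})$ is minimized subject to the constraint $\sum_{i=1}^{k} n_i = n$ (prespecified).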
There are some limitations to the optimum allocation. Knowledge of $S_i$ ($i = 1, 2, \ldots, k$) is needed to
determine $n_i$. If more than one characteristic is under study, the characteristics may lead to conflicting allocations.
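The three allocation rules listed above (equal, proportional to $N_i$, and proportional to $N_i S_i$) can be compared side by side. The following sketch is only illustrative: the function names and the stratum sizes and standard deviations in the usage lines are assumed values, and the returned sample sizes are left unrounded.

```python
def equal_allocation(n, k):
    """Same sample size in every stratum: n_i = n / k."""
    return [n / k] * k

def proportional_allocation(n, N_sizes):
    """n_i = (n / N) * N_i."""
    N = sum(N_sizes)
    return [n * Ni / N for Ni in N_sizes]

def optimum_allocation(n, N_sizes, S):
    """n_i = n * N_i * S_i / sum_j N_j * S_j for a fixed total sample size n."""
    total = sum(Ni * Si for Ni, Si in zip(N_sizes, S))
    return [n * Ni * Si / total for Ni, Si in zip(N_sizes, S)]

# Hypothetical strata: the small first stratum is the most variable one.
N_sizes = [100, 300, 600]
S = [10.0, 5.0, 2.0]
n_equal = equal_allocation(90, 3)
n_prop = proportional_allocation(100, N_sizes)
n_opt = optimum_allocation(100, N_sizes, S)
```

With these numbers the optimum allocation shifts sample size toward the small but highly variable first stratum, which proportional allocation cannot do because it ignores $S_i$.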
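Let the cost function be
\[ C = C_0 + \sum_{i=1}^{k} C_i n_i, \]
where
$C$ : total cost
$C_0$ : overhead cost, e.g., setting up the office, training people, etc.
$C_i$ : cost per unit in the $i$th stratum.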
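To find $n_i$ under this cost function, consider the Lagrangian function $\phi$ with a Lagrangian
multiplier $\lambda$ as
\[
\begin{aligned}
\phi &= Var(\bar{y}_{st}) + \lambda^2 (C - C_0) \\
&= \sum_{i=1}^{k} w_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right) S_i^2 + \lambda^2\sum_{i=1}^{k} C_i n_i \\
&= \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} + \lambda^2\sum_{i=1}^{k} C_i n_i - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[\frac{w_i S_i}{\sqrt{n_i}} - \lambda\sqrt{C_i n_i}\right]^2 + \text{terms independent of } n_i.
\end{aligned}
\]
Thus $\phi$ is minimum when
\[ \frac{w_i S_i}{\sqrt{n_i}} = \lambda\sqrt{C_i n_i} \quad \text{for all } i \]
\[ \text{or} \quad n_i = \frac{1}{\lambda}\,\frac{w_i S_i}{\sqrt{C_i}}. \]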
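To determine the Lagrangian multiplier $\lambda$ when the cost is fixed, let $\sum_{i=1}^{k} C_i n_i = C_0^*$ be the
prespecified cost. Substituting $n_i = \frac{1}{\lambda}\frac{w_i S_i}{\sqrt{C_i}}$ gives $\sum_{i=1}^{k} \frac{\sqrt{C_i}\,w_i S_i}{\lambda} = C_0^*$,
\[ \text{or} \quad \lambda = \frac{\sum_{i=1}^{k} \sqrt{C_i}\,w_i S_i}{C_0^*}. \]
Substituting $\lambda$ in the expression for $n_i = \frac{1}{\lambda}\frac{w_i S_i}{\sqrt{C_i}}$, the optimum $n_i$ is obtained as
\[ n_i^* = \frac{w_i S_i}{\sqrt{C_i}}\cdot\frac{C_0^*}{\sum_{i=1}^{k} \sqrt{C_i}\,w_i S_i}. \]
The required sample size to estimate $\bar{Y}$ such that the variance is minimum for the given cost $C_0^*$ is
\[ n = \sum_{i=1}^{k} n_i^*. \]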
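A small numerical sketch of the fixed-cost case: each $n_i^*$ is proportional to $w_i S_i/\sqrt{C_i}$ and scaled so that the sampling cost $\sum C_i n_i$ exactly uses up the budget $C_0^*$. The function name `optimum_allocation_fixed_cost` and all numbers below are assumptions for illustration.

```python
from math import sqrt

def optimum_allocation_fixed_cost(w, S, cost, budget):
    """n_i* = (w_i S_i / sqrt(C_i)) * budget / sum_j sqrt(C_j) w_j S_j."""
    denom = sum(sqrt(c) * wi * si for wi, si, c in zip(w, S, cost))
    return [wi * si / sqrt(c) * budget / denom for wi, si, c in zip(w, S, cost)]

# Hypothetical inputs: sampling in stratum 1 is four times as expensive per unit.
w = [0.2, 0.3, 0.5]
S = [10.0, 5.0, 2.0]
cost = [4.0, 1.0, 1.0]
n_star = optimum_allocation_fixed_cost(w, S, cost, budget=500.0)
spent = sum(c * ni for c, ni in zip(cost, n_star))
```

Strata that are larger, more variable, or cheaper to sample receive more units.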
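When instead the variance is prespecified as $V_0$ and the cost is to be minimized, the optimum $n_i$ is
\[ n_i = \frac{w_i S_i}{\sqrt{C_i}}\cdot\frac{\sum_{i=1}^{k} w_i S_i\sqrt{C_i}}{V_0 + \sum_{i=1}^{k} \dfrac{w_i^2 S_i^2}{N_i}}. \]
So the required sample size to estimate $\bar{Y}$ such that the cost $C$ is minimum for a
prespecified variance $V_0$ is
\[ n = \sum_{i=1}^{k} n_i. \]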
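For the fixed-variance case, one can check numerically that the allocation $n_i = \frac{w_i S_i}{\sqrt{C_i}}\cdot\frac{\sum_j w_j S_j\sqrt{C_j}}{V_0 + \sum_j w_j^2 S_j^2/N_j}$ reproduces the target variance $V_0$. The sketch below assumes this formula; the function names and inputs are hypothetical.

```python
from math import sqrt

def optimum_allocation_fixed_variance(N_sizes, S, cost, V0):
    """Allocation minimizing cost subject to Var(y_st) = V0."""
    N = sum(N_sizes)
    w = [Ni / N for Ni in N_sizes]
    num = sum(wi * si * sqrt(c) for wi, si, c in zip(w, S, cost))
    den = V0 + sum(wi**2 * si**2 / Ni for wi, si, Ni in zip(w, S, N_sizes))
    return [wi * si / sqrt(c) * num / den for wi, si, c in zip(w, S, cost)]

def variance_of_allocation(N_sizes, S, n_sizes):
    """Var(y_st) = sum_i w_i^2 S_i^2 (1/n_i - 1/N_i)."""
    N = sum(N_sizes)
    return sum(
        (Ni / N) ** 2 * si**2 * (1 / ni - 1 / Ni)
        for Ni, si, ni in zip(N_sizes, S, n_sizes)
    )

N_sizes = [200, 300, 500]
S = [10.0, 5.0, 2.0]
cost = [4.0, 1.0, 1.0]
n_alloc = optimum_allocation_fixed_variance(N_sizes, S, cost, V0=0.25)
achieved = variance_of_allocation(N_sizes, S, n_alloc)
```

The resulting $n_i$ are generally not integers and would be rounded up in practice, which makes the achieved variance slightly smaller than $V_0$.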
Sample size under proportional allocation for fixed cost and for fixed variance
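(i) If the cost $C = C_0$ is fixed (here $C_0$ denotes the fixed cost of sampling), then
\[ C_0 = \sum_{i=1}^{k} C_i n_i. \]
Under proportional allocation, $n_i = \frac{n}{N} N_i = n w_i$.
\[ \text{So} \quad C_0 = n\sum_{i=1}^{k} w_i C_i \quad \text{or} \quad n = \frac{C_0}{\sum_{i=1}^{k} w_i C_i}. \]
\[ \text{Thus} \quad n_i = \frac{C_0\,w_i}{\sum_{i=1}^{k} w_i C_i}. \]
The required sample size to estimate $\bar{Y}$ in this case is $n = \sum_{i=1}^{k} n_i$.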
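(ii) If the variance is fixed at $V_0$, then under proportional allocation ($n_i = n w_i$) the condition
$\sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) w_i^2 S_i^2 = V_0$ becomes
\[ \frac{1}{n}\sum_{i=1}^{k} w_i S_i^2 = V_0 + \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \]
\[ \text{or} \quad n = \frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \dfrac{w_i^2 S_i^2}{N_i}} \]
\[ \text{or} \quad n_i = w_i\,\frac{\sum_{i=1}^{k} w_i S_i^2}{V_0 + \sum_{i=1}^{k} \dfrac{w_i^2 S_i^2}{N_i}}. \]
This is known as Bowley's allocation.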
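Under optimum allocation, $n_i = \dfrac{n N_i S_i}{\sum_{j=1}^{k} N_j S_j}$, so that
\[
\begin{aligned}
V_{opt}(\bar{y}_{st}) &= \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) w_i^2 S_i^2 \\
&= \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[ w_i^2 S_i^2\,\frac{\sum_{j=1}^{k} N_j S_j}{n N_i S_i} \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \sum_{i=1}^{k} \left[ \frac{1}{n}\cdot\frac{N_i S_i}{N^2}\sum_{j=1}^{k} N_j S_j \right] - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \frac{1}{n}\left(\sum_{i=1}^{k} \frac{N_i S_i}{N}\right)^2 - \sum_{i=1}^{k} \frac{w_i^2 S_i^2}{N_i} \\
&= \frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2.
\end{aligned}
\]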
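In order to compare $V_{SRS}(\bar{y})$ and $V_{prop}(\bar{y}_{st})$, first we attempt to express $S^2$ as a function of $S_i^2$.
Consider
\[
\begin{aligned}
(N-1)S^2 &= \sum_{i=1}^{k}\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y})^2 \\
&= \sum_{i=1}^{k}\sum_{j=1}^{N_i} \left[(Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y})\right]^2 \\
&= \sum_{i=1}^{k}\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{N_i} (\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i - \bar{Y})^2,
\end{aligned}
\]
where the cross-product term vanishes because $\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i) = 0$. Dividing by $N$,
\[ \frac{N-1}{N}S^2 = \sum_{i=1}^{k} \frac{N_i - 1}{N}S_i^2 + \sum_{i=1}^{k} \frac{N_i}{N}(\bar{Y}_i - \bar{Y})^2. \]
Assume that $N_i$ is large enough to permit the approximations
\[ \frac{N_i - 1}{N_i} \approx 1 \quad \text{and} \quad \frac{N-1}{N} \approx 1. \]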
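Thus
\[ S^2 = \sum_{i=1}^{k} \frac{N_i}{N}S_i^2 + \sum_{i=1}^{k} \frac{N_i}{N}(\bar{Y}_i - \bar{Y})^2 \]
or, premultiplying both sides by $\frac{N-n}{Nn}$,
\[ \frac{N-n}{Nn}S^2 = \frac{N-n}{Nn}\sum_{i=1}^{k} \frac{N_i}{N}S_i^2 + \frac{N-n}{Nn}\sum_{i=1}^{k} \frac{N_i}{N}(\bar{Y}_i - \bar{Y})^2, \]
that is,
\[ Var_{SRS}(\bar{y}) = V_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn}\sum_{i=1}^{k} w_i(\bar{Y}_i - \bar{Y})^2. \]
Since $\sum_{i=1}^{k} w_i(\bar{Y}_i - \bar{Y})^2 \ge 0$, it follows that $V_{prop}(\bar{y}_{st}) \le Var_{SRS}(\bar{y})$.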
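Consider
\[
\begin{aligned}
V_{prop}(\bar{y}_{st}) - V_{opt}(\bar{y}_{st}) &= \left[\frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2\right] - \left[\frac{1}{n}\left(\sum_{i=1}^{k} w_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} w_i S_i^2\right] \\
&= \frac{1}{n}\left[\sum_{i=1}^{k} w_i S_i^2 - \left(\sum_{i=1}^{k} w_i S_i\right)^2\right] \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i S_i^2 - \frac{1}{n}\bar{S}^2 \\
&= \frac{1}{n}\sum_{i=1}^{k} w_i (S_i - \bar{S})^2,
\end{aligned}
\]
where
\[ \bar{S} = \sum_{i=1}^{k} w_i S_i. \]
This difference is non-negative, so $V_{opt}(\bar{y}_{st}) \le V_{prop}(\bar{y}_{st})$.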
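The identity $V_{prop}(\bar{y}_{st}) - V_{opt}(\bar{y}_{st}) = \frac{1}{n}\sum_i w_i(S_i - \bar{S})^2$ can be verified numerically. The sketch below uses $V_{prop} = \frac{N-n}{Nn}\sum_i w_i S_i^2$ and $V_{opt} = \frac{1}{n}(\sum_i w_i S_i)^2 - \frac{1}{N}\sum_i w_i S_i^2$; the function names and inputs are assumptions for illustration.

```python
def v_prop(N_sizes, S, n):
    """Variance of y_st under proportional allocation."""
    N = sum(N_sizes)
    return (N - n) / (N * n) * sum(Ni / N * si**2 for Ni, si in zip(N_sizes, S))

def v_opt(N_sizes, S, n):
    """Variance of y_st under optimum allocation for fixed n."""
    N = sum(N_sizes)
    w = [Ni / N for Ni in N_sizes]
    s_bar = sum(wi * si for wi, si in zip(w, S))
    return s_bar**2 / n - sum(wi * si**2 for wi, si in zip(w, S)) / N

def gain_of_opt_over_prop(N_sizes, S, n):
    """(1/n) * sum_i w_i (S_i - S_bar)^2."""
    N = sum(N_sizes)
    w = [Ni / N for Ni in N_sizes]
    s_bar = sum(wi * si for wi, si in zip(w, S))
    return sum(wi * (si - s_bar) ** 2 for wi, si in zip(w, S)) / n

N_sizes, S, n = [200, 300, 500], [10.0, 5.0, 2.0], 100
difference = v_prop(N_sizes, S, n) - v_opt(N_sizes, S, n)
```

The gain vanishes when all $S_i$ are equal, in which case proportional and optimum allocation coincide.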
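An unbiased estimate of $S_i^2$ for the $i$th stratum is
\[ s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2. \]
In stratified sampling,
\[ Var(\bar{y}_{st}) = \sum_{i=1}^{k} w_i^2\,\frac{N_i - n_i}{N_i n_i}\,S_i^2. \]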
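... distribution tables. If only a few degrees of freedom are provided by each stratum, then the $t$ values are
obtained from the table of Student's $t$-distribution, with the effective number of degrees of freedom
approximated by
\[ n_e = \frac{\left(\sum_{i=1}^{k} g_i s_i^2\right)^2}{\sum_{i=1}^{k} \dfrac{g_i^2 s_i^4}{n_i - 1}}, \]
where $g_i = \dfrac{N_i(N_i - n_i)}{n_i}$ and $\min(n_i - 1) \le n_e \le \sum_{i=1}^{k} (n_i - 1)$, assuming $y_{ij}$ are normally distributed.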
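If the optimum allocation gives $n_1 > N_1$ for a stratum (say the first), the allocation is revised as
\[ n_1 = N_1 \]
and
\[ n_i = \frac{(n - N_1)\,w_i S_i}{\sum_{i=2}^{k} w_i S_i}; \quad i = 2, 3, \ldots, k. \]
Suppose that in the revised allocation we find $n_2 > N_2$; then the revised allocation would be
\[ n_1 = N_1, \qquad n_2 = N_2, \]
\[ n_i = \frac{(n - N_1 - N_2)\,w_i S_i}{\sum_{i=3}^{k} w_i S_i}; \quad i = 3, 4, \ldots, k. \]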
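The revision steps above can be sketched as a loop: compute the allocation proportional to $N_i S_i$ (equivalently $w_i S_i$), fix $n_i = N_i$ in every stratum where the allocation exceeds the stratum size, and reallocate the remaining sample among the other strata. Unlike the text, which revises one stratum at a time, this sketch caps all overshooting strata found in a pass at once; it assumes $n \le N$ and $S_i > 0$. The function name and inputs are hypothetical.

```python
def revised_optimum_allocation(n, N_sizes, S):
    """Optimum allocation with n_i capped at N_i, reallocating the remainder."""
    k = len(N_sizes)
    alloc = [0.0] * k
    free = list(range(k))   # strata whose n_i is not yet fixed at N_i
    remaining = n           # sample size still to be distributed
    while True:
        total = sum(N_sizes[i] * S[i] for i in free)
        trial = {i: remaining * N_sizes[i] * S[i] / total for i in free}
        over = [i for i in free if trial[i] > N_sizes[i]]
        if not over:
            for i in free:
                alloc[i] = trial[i]
            return alloc
        for i in over:      # take these strata completely
            alloc[i] = float(N_sizes[i])
            remaining -= N_sizes[i]
            free.remove(i)

# Hypothetical case: the unconstrained allocation asks for about 41.7 units
# from a stratum that contains only 10.
alloc = revised_optimum_allocation(100, [10, 200, 300], [50.0, 2.0, 1.0])
```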
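In such cases, the formula for the minimum variance of $\bar{y}_{st}$ needs to be modified as
\[ \text{Min}\ Var(\bar{y}_{st}) = \frac{\left(\sum^{*} w_i S_i\right)^2}{n^*} - \frac{\sum^{*} w_i S_i^2}{N}, \]
where $\sum^{*}$ denotes the summation over the strata in which $n_i < N_i$ and $n^*$ is the revised total sample
size in these strata.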
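For a proportion, let $P_i$ be the proportion of units in the $i$th stratum possessing the attribute of interest
and $p_i$ the corresponding sample proportion. Then
\[ S_i^2 = \frac{N_i}{N_i - 1} P_i Q_i, \]
where $Q_i = 1 - P_i$.
\[ \text{Also} \quad Var(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i}\,w_i^2 S_i^2. \]
\[ \text{So} \quad Var(p_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} \frac{N_i^2 (N_i - n_i)}{N_i - 1}\cdot\frac{P_i Q_i}{n_i}. \]
Under proportional allocation,
\[
\begin{aligned}
Var_{prop}(p_{st}) &= \frac{N-n}{N}\cdot\frac{1}{Nn}\sum_{i=1}^{k} \frac{N_i^2 P_i Q_i}{N_i - 1} \\
&\approx \frac{N-n}{Nn}\sum_{i=1}^{k} w_i P_i Q_i,
\end{aligned}
\]
and its estimate, with $q_i = 1 - p_i$, is
\[ \widehat{Var}_{prop}(p_{st}) = \frac{N-n}{Nn}\sum_{i=1}^{k} w_i\,\frac{n_i\,p_i q_i}{n_i - 1}. \]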
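The best choice of $n_i$ such that it minimizes the variance for fixed total sample size is
\[ n_i \propto N_i\sqrt{\frac{N_i P_i Q_i}{N_i - 1}} \approx N_i\sqrt{P_i Q_i}. \]
\[ \text{Thus} \quad n_i = n\,\frac{N_i\sqrt{P_i Q_i}}{\sum_{i=1}^{k} N_i\sqrt{P_i Q_i}}. \]
Similarly, the best choice of $n_i$ such that the variance is minimum for fixed cost $C = C_0 + \sum_{i=1}^{k} C_i n_i$ is
\[ n_i = \frac{n N_i\sqrt{\dfrac{P_i Q_i}{C_i}}}{\sum_{i=1}^{k} N_i\sqrt{\dfrac{P_i Q_i}{C_i}}}. \]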
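Both allocation rules for proportions have the same shape, $n_i \propto N_i\sqrt{P_i Q_i / C_i}$, with $C_i = 1$ when costs are ignored. A minimal sketch follows, using the approximation $N_i/(N_i - 1) \approx 1$; the function name, the optional `cost` argument, and the inputs are assumptions for illustration.

```python
from math import sqrt

def allocation_for_proportion(n, N_sizes, P, cost=None):
    """n_i = n * N_i sqrt(P_i Q_i / C_i) / sum_j N_j sqrt(P_j Q_j / C_j)."""
    if cost is None:
        cost = [1.0] * len(N_sizes)   # equal unit costs
    weights = [Ni * sqrt(p * (1 - p) / c) for Ni, p, c in zip(N_sizes, P, cost)]
    total = sum(weights)
    return [n * x / total for x in weights]

# Hypothetical stratum proportions.
n_i = allocation_for_proportion(100, [100, 300, 600], [0.5, 0.2, 0.1])
# With equal P_i and equal costs the rule reduces to proportional allocation.
n_same_p = allocation_for_proportion(100, [100, 300, 600], [0.5, 0.5, 0.5])
```

Since the true $P_i$ are unknown before the survey, prior guesses or pilot estimates would have to be used in their place.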
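\[
\begin{aligned}
(N-1)S^2 &= \sum_{i=1}^{k}\sum_{j=1}^{N_i} \left[(Y_{ij} - \bar{Y}_i) + (\bar{Y}_i - \bar{Y})\right]^2 \\
&= \sum_{i=1}^{k}\sum_{j=1}^{N_i} (Y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar{Y}_i - \bar{Y})^2 \\
&= \sum_{i=1}^{k} (N_i - 1)S_i^2 + N\left[\sum_{i=1}^{k} w_i\bar{Y}_i^2 - \bar{Y}^2\right].
\end{aligned}
\]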
In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar{Y}_i^2$ and $\bar{Y}^2$. We consider their estimation one by
one.
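For $S_i^2$:
\[ E(s_i^2) = S_i^2, \quad \text{so} \quad \hat{S}_i^2 = s_i^2. \]
For $\bar{Y}_i^2$:
\[
\begin{aligned}
Var(\bar{y}_i) &= E(\bar{y}_i^2) - [E(\bar{y}_i)]^2 \\
&= E(\bar{y}_i^2) - \bar{Y}_i^2
\end{aligned}
\]
\[ \text{or} \quad \bar{Y}_i^2 = E(\bar{y}_i^2) - Var(\bar{y}_i). \]
An unbiased estimate of $\bar{Y}_i^2$ is
\[ \hat{\bar{Y}}_i^2 = \bar{y}_i^2 - \widehat{Var}(\bar{y}_i) = \bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i}\,s_i^2. \]
For $\bar{Y}^2$:
\[ \hat{\bar{Y}}^2 = \bar{y}_{st}^2 - \widehat{Var}(\bar{y}_{st}) = \bar{y}_{st}^2 - \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i}\,w_i^2 s_i^2. \]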
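Substituting these estimates in the expression for $(N-1)S^2$,
\[ (N-1)S^2 = \sum_{i=1}^{k} (N_i - 1)S_i^2 + N\left[\sum_{i=1}^{k} w_i\bar{Y}_i^2 - \bar{Y}^2\right], \]
the estimate of $S^2$ is obtained as
\[
\begin{aligned}
\hat{S}^2 &= \frac{1}{N-1}\sum_{i=1}^{k} (N_i - 1)\hat{S}_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i\hat{\bar{Y}}_i^2 - \hat{\bar{Y}}^2\right] \\
&= \frac{1}{N-1}\sum_{i=1}^{k} (N_i - 1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i\left\{\bar{y}_i^2 - \frac{N_i - n_i}{N_i n_i}s_i^2\right\} - \left\{\bar{y}_{st}^2 - \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i}w_i^2 s_i^2\right\}\right] \\
&= \frac{1}{N-1}\sum_{i=1}^{k} (N_i - 1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^{k} w_i(\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i(1 - w_i)\frac{N_i - n_i}{N_i n_i}s_i^2\right].
\end{aligned}
\]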
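Thus
\[
\begin{aligned}
\widehat{Var}_{SRS}(\bar{y}) &= \frac{N-n}{Nn}\hat{S}^2 \\
&= \frac{N-n}{N(N-1)n}\sum_{i=1}^{k} (N_i - 1)s_i^2 + \frac{N(N-n)}{nN(N-1)}\left[\sum_{i=1}^{k} w_i(\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} w_i(1 - w_i)\frac{N_i - n_i}{N_i n_i}s_i^2\right]
\end{aligned}
\]
and
\[ \widehat{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{N_i - n_i}{N_i n_i}\,w_i^2 s_i^2. \]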
If any other particular allocation is used, then substituting the appropriate ni under that allocation, such
The subsamples need not necessarily be independent. The assumption of independent subsamples helps
in obtaining an unbiased estimate of the variance of the composite estimator. This is even helpful if the
sample design is complicated and the expression for variance of the composite estimator is complex.
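Let there be $g$ independent interpenetrating subsamples and let $t_1, t_2, \ldots, t_g$ be $g$ unbiased estimators of
a parameter $\theta$, one from each subsample. With the combined estimator $\bar{t} = \frac{1}{g}\sum_{j=1}^{g} t_j$ and
$\widehat{Var}(\bar{t}) = \frac{1}{g(g-1)}\sum_{j=1}^{g} (t_j - \bar{t})^2$, we have
\[
\begin{aligned}
E\left[\widehat{Var}(\bar{t})\right] &= \frac{1}{g(g-1)}E\left[\sum_{j=1}^{g} (t_j - \theta)^2 - g(\bar{t} - \theta)^2\right] \\
&= \frac{1}{g(g-1)}\left[\sum_{j=1}^{g} Var(t_j) - g\,Var(\bar{t})\right] \\
&= \frac{1}{g(g-1)}(g^2 - g)\,Var(\bar{t}) = Var(\bar{t}).
\end{aligned}
\]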
If the distribution of each estimator $t_j$ is symmetric about $\theta$, then the confidence interval of $\theta$ can be
obtained by
\[ P\left[\min(t_1, t_2, \ldots, t_g) < \theta < \max(t_1, t_2, \ldots, t_g)\right] = 1 - \left(\frac{1}{2}\right)^{g-1}. \]
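A short sketch of how the subsample estimates might be combined: the mean $\bar{t}$ of the $g$ estimates, the variance estimate $\frac{1}{g(g-1)}\sum_j (t_j - \bar{t})^2$, and the interval from the smallest to the largest estimate with coverage $1 - (1/2)^{g-1}$. The function name and the four estimates below are made-up values for illustration.

```python
def summarize_subsamples(estimates):
    """Combine g independent subsample estimates t_1, ..., t_g."""
    g = len(estimates)
    t_bar = sum(estimates) / g
    # Unbiased estimate of Var(t_bar) from independent subsamples.
    var_hat = sum((t - t_bar) ** 2 for t in estimates) / (g * (g - 1))
    # (min, max) covers the parameter with this probability when each t_j
    # is symmetrically distributed about it.
    coverage = 1 - 0.5 ** (g - 1)
    return t_bar, var_hat, (min(estimates), max(estimates)), coverage

t_bar, var_hat, interval, coverage = summarize_subsamples([10.0, 12.0, 11.0, 13.0])
```

The variance estimate needs no knowledge of the within-subsample design, which is why the approach is attractive for complicated designs.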
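Let $\hat{Y}_{ij(tot)}$ be an unbiased estimator of the total of the $j$th stratum based on the $i$th subsample,
$i = 1, 2, \ldots, L$; $j = 1, 2, \ldots, k$. With $\hat{Y}_{j(tot)} = \frac{1}{L}\sum_{i=1}^{L} \hat{Y}_{ij(tot)}$ as the combined estimator of the
$j$th stratum total, an unbiased estimator of its variance is
\[ \widehat{Var}(\hat{Y}_{j(tot)}) = \frac{1}{L(L-1)}\sum_{i=1}^{L} (\hat{Y}_{ij(tot)} - \hat{Y}_{j(tot)})^2. \]
Summing over the strata, the variance of the estimator of the population total, $\hat{Y}_{tot} = \sum_{j=1}^{k} \hat{Y}_{j(tot)}$,
is estimated by
\[ \widehat{Var}(\hat{Y}_{tot}) = \frac{1}{L(L-1)}\sum_{i=1}^{L}\sum_{j=1}^{k} (\hat{Y}_{ij(tot)} - \hat{Y}_{j(tot)})^2. \]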
Post Stratifications
Sometimes the stratum to which a unit belongs may be known only after the field survey. For example,
the age of persons, their educational qualifications, etc. cannot be known in advance. In such cases, we
adopt the post-stratification procedure to increase the precision of the estimates.
Note: This topic is to be read after the next module on ratio method of estimation. Since it is related to
the stratification, so it is given here.
In post-stratification,
draw a sample by simple random sampling from the population and carry out the survey.
After the completion of the survey, stratify the sampling units to increase the precision of the
estimates.
Assume that the stratum size $N_i$ is fairly accurately known. Let $m_i$ be the number of sample units that fall
in the $i$th stratum, so that
\[ \sum_{i=1}^{k} m_i = n. \]
Note that $m_i$ is a random variable (and that is why we are not using the symbol $n_i$ as earlier).
Assume $n$ is large enough, or the stratification is such, that the probability that some $m_i = 0$ is negligibly
small. In case $m_i = 0$ for some strata, two or more strata can be combined to make the sample size non-zero.
To find $E\left(\dfrac{1}{m_i} - \dfrac{1}{N_i}\right)$, proceed as follows:
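Consider the estimate of ratio based on the ratio method of estimation as
\[ \hat{R} = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j}, \qquad R = \frac{\bar{Y}}{\bar{X}} = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j}. \]
We know that
\[ E(\hat{R}) - R \approx \frac{N-n}{Nn}\cdot\frac{R S_X^2 - S_{XY}}{\bar{X}^2}. \]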
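Let
\[ x_j = \begin{cases} 1 & \text{if the $j$th unit belongs to the $i$th stratum} \\ 0 & \text{otherwise} \end{cases} \]
and
\[ y_j = 1 \quad \text{for all } j = 1, 2, \ldots, N. \]
Then, with $n_i$ denoting the number of sample units in the $i$th stratum,
\[ \hat{R} = \frac{\sum_{j=1}^{n} y_j}{\sum_{j=1}^{n} x_j} = \frac{n}{n_i}, \qquad R = \frac{\sum_{j=1}^{N} Y_j}{\sum_{j=1}^{N} X_j} = \frac{N}{N_i}. \]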
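\[ S_x^2 = \frac{1}{N-1}\left[\sum_{j=1}^{N} X_j^2 - N\bar{X}^2\right] = \frac{1}{N-1}\left[N_i - N\frac{N_i^2}{N^2}\right] = \frac{1}{N-1}\left(N_i - \frac{N_i^2}{N}\right) \]
\[ S_{xy} = \frac{1}{N-1}\left[\sum_{j=1}^{N} X_j Y_j - N\bar{X}\bar{Y}\right] = \frac{1}{N-1}\left[N_i - N\cdot\frac{N_i}{N}\right] = 0. \]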
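\[ E(\hat{R}) - R = E\left(\frac{n}{n_i}\right) - \frac{N}{N_i} \approx \frac{N(N-n)(N - N_i)}{n N_i^2 (N-1)}. \]
Thus
\[
\begin{aligned}
E\left(\frac{1}{n_i}\right) - \frac{1}{N_i} &\approx \frac{N}{n N_i} + \frac{N(N-n)(N - N_i)}{n^2 N_i^2 (N-1)} - \frac{1}{N_i} \\
&\approx \frac{(N-n)N}{n(N-1)N_i}\left[1 + \frac{N}{N_i n} - \frac{1}{n}\right].
\end{aligned}
\]
Replacing $n_i$ by $m_i$,
\[ E\left(\frac{1}{m_i} - \frac{1}{N_i}\right) \approx \frac{(N-n)N}{n(N-1)N_i}\left[1 + \frac{N}{N_i n} - \frac{1}{n}\right]. \]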
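Now substitute this in the expression of $Var(\bar{y}_{post})$ as
\[
\begin{aligned}
Var(\bar{y}_{post}) &= \sum_{i=1}^{k} w_i^2\,E\left(\frac{1}{m_i} - \frac{1}{N_i}\right) S_i^2 \\
&= \sum_{i=1}^{k} w_i^2 S_i^2\cdot\frac{(N-n)N}{(N-1)n N_i}\left[1 + \frac{N}{n N_i} - \frac{1}{n}\right] \\
&= \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_i^2 S_i^2\,\frac{1}{w_i}\left[1 + \frac{1}{n w_i} - \frac{1}{n}\right] \\
&= \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k} w_i S_i^2\left[n + \frac{1}{w_i} - 1\right] \\
&= \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k} (n w_i + 1 - w_i) S_i^2 \\
&= \frac{N-n}{n(N-1)}\sum_{i=1}^{k} w_i S_i^2 + \frac{N-n}{n^2(N-1)}\sum_{i=1}^{k} (1 - w_i) S_i^2.
\end{aligned}
\]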
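Assuming $N - 1 \approx N$,
\[
\begin{aligned}
V(\bar{y}_{post}) &= \frac{N-n}{Nn}\sum_{i=1}^{k} w_i S_i^2 + \frac{N-n}{n^2 N}\sum_{i=1}^{k} (1 - w_i) S_i^2 \\
&= V_{prop}(\bar{y}_{st}) + \frac{N-n}{N n^2}\sum_{i=1}^{k} (1 - w_i) S_i^2.
\end{aligned}
\]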
Nn i 1
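The second term is the contribution to the variance of $\bar{y}_{post}$ due to the $m_i$'s not being proportionately
distributed.
If $S_i^2 = S_w^2$ for all $i$, the second term becomes
\[
\begin{aligned}
\frac{N-n}{N n^2}\sum_{i=1}^{k} (1 - w_i) S_w^2 &= \frac{N-n}{N n^2}\,S_w^2 (k-1) \quad \left(\text{since } \sum_{i=1}^{k} w_i = 1\right) \\
&= \frac{k-1}{n}\cdot\frac{N-n}{Nn}\,S_w^2 \\
&= \frac{k-1}{n}\,Var_{prop}(\bar{y}_{st}).
\end{aligned}
\]
The increase in the variance over $Var_{prop}(\bar{y}_{st})$ is small if the average sample size $\bar{n} = \frac{n}{k}$ per stratum is
reasonably large.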
Thus a post-stratification with a large sample produces an estimator which is almost as precise as an
estimator in the stratified sampling with proportional allocation.
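The size of the post-stratification penalty can be checked numerically from the approximation $V(\bar{y}_{post}) = V_{prop}(\bar{y}_{st}) + \frac{N-n}{Nn^2}\sum_i (1 - w_i)S_i^2$. In the sketch below the function names and the population figures are assumptions; the stratum variances are taken equal so that the relative increase should come out as $(k-1)/n$.

```python
def var_prop(N_sizes, S2, n):
    """V_prop(y_st) = (N - n) / (N n) * sum_i w_i S_i^2."""
    N = sum(N_sizes)
    return (N - n) / (N * n) * sum(Ni / N * s2 for Ni, s2 in zip(N_sizes, S2))

def var_post(N_sizes, S2, n):
    """Approximate variance of the post-stratified mean (taking N - 1 as N)."""
    N = sum(N_sizes)
    extra = (N - n) / (N * n**2) * sum(
        (1 - Ni / N) * s2 for Ni, s2 in zip(N_sizes, S2)
    )
    return var_prop(N_sizes, S2, n) + extra

# Hypothetical population with k = 3 strata and a common within-stratum variance.
N_sizes, S2, n = [200, 300, 500], [4.0, 4.0, 4.0], 100
relative_increase = var_post(N_sizes, S2, n) / var_prop(N_sizes, S2, n) - 1
```

Here the relative increase is $(k-1)/n = 2/100$, so the loss from stratifying after sampling is about 2 percent.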