Unit-I Basic Concept of Sample Surveys
$$\mathrm{MSE}(t) = E(t - \theta)^2 .$$
The MSE may be considered a measure of the accuracy with which the estimator $t$ estimates the parameter $\theta$.
The expected value of the squared deviation of the estimator from its expected value is
termed sampling variance. It is a measure of the divergence of the estimator from its
expected value and is given by
$$V(t) = E[t - E(t)]^2 .$$
This measure of variability may be termed the precision of the estimator $t$.
The relation between MSE and sampling variance, or between accuracy and precision, can be obtained as
$$\mathrm{MSE}(t) = E(t - \theta)^2 = E[t - E(t) + E(t) - \theta]^2$$
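As a hedged numerical illustration of the expansion started above: expanding the square gives the standard identity MSE = sampling variance + (bias)^2. The sketch below checks it by enumerating every srswor sample of a small population. The population values and the deliberately biased estimator (the sample maximum) are illustrative assumptions, not part of the notes.

```python
from itertools import combinations

def mse_decomposition(population, n, estimator, theta):
    """Enumerate all equally likely srswor samples and return (MSE, variance, bias)."""
    values = [estimator(sample) for sample in combinations(population, n)]
    m = len(values)
    expected = sum(values) / m                               # E(t)
    variance = sum((v - expected) ** 2 for v in values) / m  # V(t) = E[t - E(t)]^2
    mse = sum((v - theta) ** 2 for v in values) / m          # MSE(t) = E(t - theta)^2
    bias = expected - theta
    return mse, variance, bias

# Hypothetical population; theta is its mean.
population = [1, 2, 4, 7]
theta = sum(population) / len(population)
mse, variance, bias = mse_decomposition(population, 2, max, theta)
print(mse, variance + bias ** 2)
```

The two printed numbers agree, so an estimator is accurate only when it is both precise (small variance) and nearly unbiased.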
there will always be a difference between the parameter and its corresponding estimate. This
error is inherent and unavoidable in any and every sampling scheme. A sample with the
smallest sampling error will always be considered a good representative of the population.
This error can be reduced by increasing the size of the sample (the number of units selected in the sample). In fact, the sampling error is inversely proportional to the square root of the sample size, and the relationship can be examined graphically:
[Figure: sampling error plotted against sample size]
When the sample survey becomes a census survey, the sampling error becomes zero.
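A minimal sketch of this behaviour, assuming the standard error of the sample mean under srswor, $\sqrt{(1/n - 1/N)S^2}$, is used as the measure of sampling error. The values of `N` and `S2` are hypothetical.

```python
import math

def standard_error(S2, n, N):
    """Standard error of the sample mean under srswor: sqrt((1/n - 1/N) * S2)."""
    return math.sqrt((1.0 / n - 1.0 / N) * S2)

# Hypothetical population size and population mean square.
N, S2 = 1000, 25.0
errors = {n: standard_error(S2, n, N) for n in (10, 40, 160, 1000)}
for n, se in errors.items():
    print(n, round(se, 4))
```

Quadrupling the sample size roughly halves the error, and the error vanishes when n = N, i.e. when the sample survey becomes a census.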
Non-sampling error
The non-sampling errors primarily arise at the following stages:
i) Failure to measure some of the units in the selected sample
ii) Observational errors due to defective measurement technique
iii) Errors introduced in editing, coding and tabulating the results.
Non-sampling errors are present in both the complete enumeration survey and the sample
survey. In practice, the census survey results may suffer from non-sampling errors although
these may be free from sampling error. The non-sampling error is likely to increase with
increase in sample size, while sampling error decreases with increase in sample size.
A procedure for selecting a sample of size n out of a finite population of size N in which
each of the possible distinct samples has an equal chance of being selected is called random
sampling or simple random sampling.
We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr ) .
ii) Simple random sampling without replacement (srswor ) .
Simple random sampling with replacement (srswr)
In sampling with replacement a unit is selected from the population consisting of $N$ units, its content noted and then returned to the population before the next draw is made, and the process is repeated $n$ times to give a sample of $n$ units. In this method, at each draw, each of the $N$ units of the population gets the same probability $1/N$ of being selected. Here the same unit of the population may occur more than once in the sample (the order in which the sample units are obtained is regarded). There are $N^n$ samples, and each has an equal probability $1/N^n$ of being selected.
Note: If the order in which the sample units are obtained is ignored (unordered), then the number of possible samples will be
$${}^{N}C_n + N\left(1 + {}^{N-1}C_1 + {}^{N-1}C_2 + \cdots + {}^{N-1}C_{n-2}\right).$$
Simple random sampling without replacement (srswor)
Suppose the population consists of $N$ units. In simple random sampling without replacement a unit is selected, its content noted, and the unit is not returned to the population before the next draw is made. The process is repeated $n$ times to give a sample of $n$ units. In this method, at the $r$-th drawing, each of the $N - r + 1$ remaining units of the population gets the same probability $1/(N - r + 1)$ of being included in the sample. Here a unit of the population cannot occur more than once in the sample (order is ignored). There are ${}^{N}C_n$ possible samples, and each such sample has an equal probability $1/{}^{N}C_n$ of being selected.
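The two counts can be checked by brute-force enumeration. This is a small sketch for a hypothetical population of four labelled units with samples of size two; the labels are assumptions made for illustration.

```python
from itertools import combinations, product
from math import comb

def enumerate_samples(units, n):
    """Return all ordered srswr samples and all unordered srswor samples of size n."""
    ordered_with_replacement = list(product(units, repeat=n))
    unordered_without_replacement = list(combinations(units, n))
    return ordered_with_replacement, unordered_without_replacement

units = ["U1", "U2", "U3", "U4"]   # hypothetical unit labels
wr, wor = enumerate_samples(units, 2)
prob_wr = 1 / len(wr)     # each srswr sample has probability 1 / N^n
prob_wor = 1 / len(wor)   # each srswor sample has probability 1 / C(N, n)
print(len(wr), len(wor))
```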
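$\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i$, population mean.
$\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$, sample mean.
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar Y^2$, population variance.
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar Y^2\right)$, population mean square.
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar y^2\right)$, sample mean square.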
Theorem: In srswr, the sample mean $\bar y$ is an unbiased estimate of the population mean $\bar Y$, i.e. $E(\bar y) = \bar Y$, and its variance is
$$V(\bar y) = \frac{N-1}{nN}\,S^2 = \frac{\sigma^2}{n}.$$
Corollary: $\hat Y = N\bar y$ is an unbiased estimate of the population total $Y$ with variance
$$V(\hat Y) = \frac{N^2\sigma^2}{n} = \frac{N(N-1)}{n}\,S^2 .$$
Theorem: In srswr, the sample mean square $s^2$ is an unbiased estimate of the population variance $\sigma^2$, i.e. $E(s^2) = \sigma^2$.
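These srswr results can be verified on a toy example by listing all $N^n$ equally likely ordered samples. The sketch below assumes a hypothetical population of four values and n = 2.

```python
from itertools import product

def srswr_moments(population, n):
    """Enumerate all ordered srswr samples; return (Ybar, sigma2, E(ybar), V(ybar), E(s2))."""
    N = len(population)
    Ybar = sum(population) / N
    sigma2 = sum((y - Ybar) ** 2 for y in population) / N
    samples = list(product(population, repeat=n))
    means = [sum(s) / n for s in samples]
    e_mean = sum(means) / len(means)
    v_mean = sum((m - e_mean) ** 2 for m in means) / len(means)
    s2_values = [sum((y - m) ** 2 for y in s) / (n - 1) for s, m in zip(samples, means)]
    e_s2 = sum(s2_values) / len(s2_values)
    return Ybar, sigma2, e_mean, v_mean, e_s2

Ybar, sigma2, e_mean, v_mean, e_s2 = srswr_moments([1, 2, 4, 7], 2)  # hypothetical data
print(e_mean, Ybar, v_mean, sigma2 / 2, e_s2, sigma2)
```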
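$$V(T) = E\left(\sum_{i=1}^{n}\alpha_i y_i - \bar Y\right)^2 = E\left[\left(\sum_{i=1}^{n}\alpha_i y_i\right)^2 - 2\bar Y\sum_{i=1}^{n}\alpha_i y_i + \bar Y^2\right] = E\left(\sum_{i=1}^{n}\alpha_i y_i\right)^2 - \bar Y^2 .$$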
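Consider,
$$E\left(\sum_{i=1}^{n}\alpha_i y_i\right)^2 = \sum_{i=1}^{n}\alpha_i^2 E(y_i^2) + \sum_{i\ne j}^{n}\alpha_i\alpha_j E(y_i y_j) \qquad (1.1)$$
Note that
$$V(y_i) = E(y_i^2) - \bar Y^2$$
$$E(y_i^2) = \frac{1}{N}(N-1)S^2 + \bar Y^2 , \text{ since } V(y_i) = \frac{N-1}{N}S^2 \text{ for each } i. \qquad (1.2)$$
Now
$$E(y_i y_j) = \sum_{i\ne j} y_i \Pr(i)\, y_j \Pr(j \mid i) = \frac{1}{N}\,\frac{1}{N-1}\sum_{i\ne j}^{N} y_i y_j .$$
Note that
$$\left(\sum_{i=1}^{N} y_i\right)^2 = \sum_{i=1}^{N} y_i^2 + \sum_{i\ne j}^{N} y_i y_j = (N-1)S^2 + N\bar Y^2 + \sum_{i\ne j}^{N} y_i y_j$$
$$\sum_{i\ne j}^{N} y_i y_j = N^2\bar Y^2 - (N-1)S^2 - N\bar Y^2 .$$
Thus
$$E(y_i y_j) = \frac{1}{N}\,\frac{1}{N-1}\left[N^2\bar Y^2 - (N-1)S^2 - N\bar Y^2\right] = \bar Y^2 - S^2/N . \qquad (1.3)$$
In view of equations (1.2) and (1.3), equation (1.1) becomes
$$E\left(\sum_{i=1}^{n}\alpha_i y_i\right)^2 = \sum_{i=1}^{n}\alpha_i^2\left[\frac{1}{N}(N-1)S^2 + \bar Y^2\right] + \sum_{i\ne j}^{n}\alpha_i\alpha_j\left(\bar Y^2 - \frac{S^2}{N}\right)$$
$$= S^2\sum_{i=1}^{n}\alpha_i^2 - \frac{S^2}{N}\sum_{i=1}^{n}\alpha_i^2 + \bar Y^2\sum_{i=1}^{n}\alpha_i^2 + \left(1 - \sum_{i=1}^{n}\alpha_i^2\right)\left(\bar Y^2 - \frac{S^2}{N}\right)$$
$$= S^2\sum_{i=1}^{n}\alpha_i^2 + \bar Y^2 - \frac{S^2}{N} .$$
Therefore,
$$V(T) = S^2\sum_{i=1}^{n}\alpha_i^2 - \frac{S^2}{N} .$$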
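Since $\sum_{i=1}^{n}\alpha_i^2 = \sum_{i=1}^{n}\left(\alpha_i - \frac{1}{n}\right)^2 + \frac{1}{n}$ under the condition $\sum_{i=1}^{n}\alpha_i = 1$, we have
$$V(T) = S^2\left[\sum_{i=1}^{n}\left(\alpha_i - \frac{1}{n}\right)^2 + \frac{1}{n} - \frac{1}{N}\right].$$
We note that $V(T)$ will be minimum if $\sum_{i=1}^{n}\left(\alpha_i - \frac{1}{n}\right)^2 = 0$, i.e. if $\alpha_i = \frac{1}{n}$ for all $i = 1, 2, \ldots, n$, and then $T = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y$.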
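OR
To determine $\alpha_i$ such that $V(T)$ is minimum, consider the function
$$\phi = V(T) + \lambda\left(\sum_{i=1}^{n}\alpha_i - 1\right), \text{ where } \lambda \text{ is some unknown constant.}$$
Using the calculus method of Lagrange multipliers, we select $\alpha_i$ and the constant $\lambda$ to minimize $\phi$. Differentiating $\phi$ with respect to $\alpha_i$ and equating to zero, we have
$$\frac{\partial\phi}{\partial\alpha_i} = 0 = 2S^2\alpha_i + \lambda \quad\text{or}\quad \alpha_i = -\frac{\lambda}{2S^2} \qquad (1.4)$$
Taking summation on both sides of (1.4) and using $\sum_{i=1}^{n}\alpha_i = 1$, we get
$$\sum_{i=1}^{n}\alpha_i = -\frac{n\lambda}{2S^2} \quad\text{or}\quad \lambda = -\frac{2S^2}{n} \qquad (1.5)$$
Thus, from equations (1.4) and (1.5), we have
$$\alpha_i = \frac{1}{n}, \text{ for all } i = 1, 2, \ldots, n, \text{ and } T = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y .$$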
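A hedged numerical check of $V(T) = S^2\left(\sum\alpha_i^2 - 1/N\right)$ for a weighted estimator $T = \sum\alpha_i y_i$ with $\sum\alpha_i = 1$, obtained by enumerating all equally likely ordered srswor samples. The population and the unequal weights are hypothetical.

```python
from itertools import permutations

def enumerated_variance(population, alpha):
    """Variance of T = sum(alpha_i * y_i) over all ordered srswor samples."""
    samples = list(permutations(population, len(alpha)))
    values = [sum(a * y for a, y in zip(alpha, s)) for s in samples]
    mean_t = sum(values) / len(values)
    return sum((v - mean_t) ** 2 for v in values) / len(values)

def formula_variance(population, alpha):
    """V(T) = S^2 * (sum(alpha_i^2) - 1/N)."""
    N = len(population)
    Ybar = sum(population) / N
    S2 = sum((y - Ybar) ** 2 for y in population) / (N - 1)
    return S2 * (sum(a * a for a in alpha) - 1 / N)

population = [1, 2, 4, 7]                              # hypothetical values
v_unequal = enumerated_variance(population, (0.3, 0.7))
v_equal = enumerated_variance(population, (0.5, 0.5))  # alpha_i = 1/n
print(v_unequal, formula_variance(population, (0.3, 0.7)), v_equal)
```

In this example the equal weights give the smaller variance, in line with the minimization above.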
Case I) Random sampling with replacement
On replacing $\bar Y$ by $P$, $Y$ by $NP$, $\bar y$ by $p = a/n$, $S^2$ by $\frac{NPQ}{N-1}$ and $\sigma^2$ by $PQ$ in the expressions obtained for the expectation and variance of the estimates of the population mean and population total, we find
i) $E(p) = E(\bar y) = \bar Y = P$. This shows that the sample proportion $p$ is an unbiased estimate of the population proportion $P$, and
$$V(p) = V(\bar y) = \frac{\sigma^2}{n} = \frac{PQ}{n}.$$
ii) $E(\hat A) = E(Np) = N\,E(p) = NP = A$, which means that $Np = \hat A$ is an unbiased estimate of $NP = A$, and
$$V(\hat A) = V(\hat Y) = N^2 V(\bar y) = \frac{N^2\sigma^2}{n} = \frac{N^2 PQ}{n}.$$
Theorem: $\hat V(p) = v(p) = \frac{pq}{n-1}$ is an unbiased estimate of $V(p) = \frac{PQ}{n}$.
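$$\Pr[\,|\bar y - \bar Y| \ge 1.9\,] = \frac{1}{20} = 0.05, \quad\text{and}\quad n_0 = \frac{Z_{\alpha/2}^2\,S^2}{d^2} = \frac{(1.96)^2 \times 85.6}{(1.9)^2} = 91.091678 .$$
Therefore,
$$n = \frac{n_0}{1 + \dfrac{n_0}{N}} = 75.168 \approx 75 .$$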
Example: In a population of 676 petition sheets, how large must the sample be if the total number of signatures is to be estimated with a margin of error of 1000, apart from a 1 in 20 chance? Assume the population mean square to be 229.
Solution: Let $Y$ be the number of signatures on all the sheets, and let $\hat Y$ be the estimate of $Y$. The margin of error specified for the estimate $\hat Y$ of $Y$ is
$$|\hat Y - Y| \ge 1000, \text{ so that } \Pr[\,|\hat Y - Y| \ge 1000\,] = \frac{1}{20} = 0.05 .$$
We know that
$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \quad\text{here}\quad n_0 = \left(\frac{N Z_{\alpha/2}\,S}{d}\right)^2 = \left(\frac{676 \times 1.96}{1000}\right)^2 \times 229 = 402.01385,$$
and hence
$$n = 252.09 \approx 252 .$$
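The arithmetic of this example can be reproduced with a short helper. This is a sketch, with the function name `sample_size_for_total` chosen here for illustration and the normal deviate 1.96 taken from the example.

```python
def sample_size_for_total(N, S2, d, z=1.96):
    """Sample size for estimating a population total within margin d.

    n0 = (N * z)^2 * S2 / d^2 is the first approximation;
    n = n0 / (1 + n0 / N) applies the finite population correction.
    """
    n0 = (N * z) ** 2 * S2 / d ** 2
    n = n0 / (1 + n0 / N)
    return n0, n

n0, n = sample_size_for_total(N=676, S2=229, d=1000)
print(round(n0, 5), round(n, 2))
```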
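Estimation of sample size for proportion
a) When precision is specified in terms of margin of error: Suppose the size of the population is $N$ and the population proportion is $P$. Let a srs of size $n$ be taken, let $p$ be the corresponding sample proportion, and let $d$ be the margin of error in the estimate $p$ of $P$. The margin of error can be specified in the form of a probability statement as
$$\Pr[\,|p - P| \ge d\,] = \alpha \quad\text{or}\quad \Pr[\,|p - P| \le d\,] = 1 - \alpha \qquad (1.6)$$
Taking $p$ to be normally distributed, $p \sim N[P, V(p)]$, we have $Z = \dfrac{p - P}{\sqrt{V(p)}} \sim N(0, 1)$. For the given value of $\alpha$ we can find a value $Z_{\alpha/2}$ of the standard normal variate from the standard normal table by the following relation:
$$\Pr\left[\frac{|p - P|}{\sqrt{V(p)}} \ge Z_{\alpha/2}\right] = \alpha \quad\text{or}\quad \Pr\left[\,|p - P| \ge \sqrt{V(p)}\,Z_{\alpha/2}\right] = \alpha \qquad (1.7)$$
Comparing equations (1.6) and (1.7), the relation which gives the value of $n$ with the required precision of the estimate $p$ of $P$ is
$$d = Z_{\alpha/2}\sqrt{V(p)} \quad\text{or}\quad d^2 = Z_{\alpha/2}^2\,V(p) = Z_{\alpha/2}^2\,\frac{N - n}{N - 1}\,\frac{PQ}{n}, \text{ as sampling is srswor.}$$
$$1 = \frac{Z_{\alpha/2}^2\,PQ}{d^2}\,\frac{N - n}{n(N - 1)} = n_0\,\frac{N - n}{n(N - 1)}, \quad\text{where}\quad n_0 = \frac{Z_{\alpha/2}^2\,PQ}{d^2} = \frac{PQ}{V(p)} \qquad (1.8)$$
$$\text{or}\quad \frac{N - 1}{n_0} = \frac{N - n}{n} = \frac{N}{n} - 1 \quad\text{or}\quad \frac{N}{n} = 1 + \frac{N - 1}{n_0}$$
$$\text{or}\quad n = \frac{N}{1 + \dfrac{N - 1}{n_0}} = \frac{N n_0}{n_0 + (N - 1)} = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} \cong \frac{n_0}{1 + \dfrac{n_0}{N}} \qquad (1.9)$$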
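$$CV(p) = e = \frac{\sqrt{V(p)}}{P}, \quad e^2 = \frac{V(p)}{P^2}, \quad\text{or}\quad V(p) = e^2 P^2 \qquad (1.18)$$
Substituting equation (1.18) in relation (1.16), we get
$$n_0 = \frac{PQ}{e^2 P^2} = \frac{Q}{e^2 P} = \frac{1}{e^2}\left(\frac{1}{P} - 1\right),$$
and hence $n$ is given by the relation (1.9).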
Example: In a population of 4000 people who were called for casting their votes, 50% returned to the poll. Estimate the sample size needed to estimate this proportion so that the margin of error is 5% with 95% confidence coefficient.
Solution: The margin of error in the estimate $p$ of $P$ is given by
$$|p - P| \ge 0.05, \text{ so that } \Pr[\,|p - P| \ge 0.05\,] = 0.05 .$$
We know that
$$n_0 = \frac{Z_{\alpha/2}^2\,PQ}{d^2} = \frac{(1.96)^2 \times 0.5 \times 0.5}{0.0025} = 384.16 \approx 384$$
and hence,
$$n = \frac{n_0}{1 + (n_0/N)} = 350.498 \approx 351 .$$
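A minimal sketch of the same calculation, assuming the approximate form $n = n_0/(1 + n_0/N)$ used in the solution. The function name is illustrative.

```python
import math

def sample_size_for_proportion(N, P, d, z=1.96):
    """n0 = z^2 * P * Q / d^2, then n = n0 / (1 + n0 / N)."""
    Q = 1 - P
    n0 = z ** 2 * P * Q / d ** 2
    n = n0 / (1 + n0 / N)
    return n0, n

n0, n = sample_size_for_proportion(N=4000, P=0.5, d=0.05)
print(round(n0, 2), round(n, 3), math.ceil(n))
```

Rounding up rather than to the nearest integer keeps the achieved margin of error within the specified limit.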
Exercise: In a study of the possible use of sampling to cut down the work in taking inventory in a stock room, a count is made of the value of the articles on each of 36 shelves in the room. The values to the nearest dollar are as follows.
29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59, 60, 60, 60, 60, 61, 61, 61, 62, 64,
65, 65, 67, 67, 68, 69, 71, 74, 77, 82, 85.
The estimate of total value made from a sample is to be correct within $200, apart from a 1 in 20 chance. An advisor suggests that a simple random sample of 12 shelves will meet the requirements. Do you agree? It is given that $\sum Y_i = 2138$ and $\sum Y_i^2 = 131\,682$.
Solution: It is given that
$$S^2 = \frac{1}{N-1}\left(\sum_i Y_i^2 - N\bar Y^2\right) = \frac{1}{36 - 1}\left[131\,682 - 36\left(\frac{2138}{36}\right)^2\right] = 134.5$$
and
$$|\hat Y - Y| \ge 200, \text{ so that } \Pr[\,|\hat Y - Y| \ge 200\,] = \frac{1}{20} = 0.05 .$$
We know that
$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}, \quad\text{here}\quad n_0 = \left(\frac{N Z_{\alpha/2}}{d}\right)^2 S^2 = \left(\frac{36 \times 1.96}{200}\right)^2 \times 134.5 = 16.7409$$
and therefore,
$$n = 11.42765 \approx 12 .$$
Hence a simple random sample of 12 shelves meets the requirement, and the advisor's suggestion can be accepted.
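The sketch below recomputes the solution directly from the 36 shelf values listed in the exercise. The helper name `required_sample_size` is an assumption made for illustration. Small differences from the hand calculation arise because the notes round $S^2$ to 134.5.

```python
import math

shelf_values = [29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59,
                60, 60, 60, 60, 61, 61, 61, 62, 64, 65, 65, 67, 67, 68, 69, 71,
                74, 77, 82, 85]

def required_sample_size(values, d, z=1.96):
    """Return (S2, n0, n) for estimating the total of `values` within margin d."""
    N = len(values)
    total = sum(values)
    S2 = (sum(v * v for v in values) - total ** 2 / N) / (N - 1)
    n0 = (N * z / d) ** 2 * S2
    n = n0 / (1 + n0 / N)
    return S2, n0, n

S2, n0, n = required_sample_size(shelf_values, d=200)
print(round(S2, 2), round(n0, 3), math.ceil(n))
```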
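Exercise: The selling price of a lot of standing timber is $UW$, where $U$ is the price per unit volume and $W$ is the volume of timber on the lot. The number $N$ of logs on the lot is counted, and the average volume per log is estimated from a simple random sample of $n$ logs. The estimate is made and paid for by the seller and is provisionally accepted by the buyer. Later, the buyer finds out the exact volume purchased, and the seller reimburses him if he has paid for more than was delivered. If he has paid for less than was delivered, the buyer does not mention the fact.
Construct the seller's loss function. Assuming that the cost of measuring $n$ logs is $cn$, find the optimum value of $n$. The standard deviation of the volume per log may be denoted by $S$ and the fpc ignored.
Solution: Let $\hat W$ be the estimated total volume of the timber. The error in the estimate is $z = \hat W - W$, taken to be normally distributed with mean zero and variance $N^2S^2/n$ (fpc ignored).
If $\hat W - W = z \ge 0$, the seller's loss is zero, i.e. $l(z) = 0$. If $z < 0$, the buyer has paid for less than was delivered and the seller's loss is $l(z) = -Uz$. The expected loss is therefore
$$L(n) = \int_{-\infty}^{0} l(z)\,f(z)\,dz = \int_{-\infty}^{0} (-Uz)\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz$$
$$= -\int_{-\infty}^{0} Uz\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz = \int_{0}^{\infty} Uz\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\exp\left(-\frac{n z^2}{2N^2S^2}\right)dz .$$
Put $t = \dfrac{n z^2}{2N^2S^2}$, then $dt = \dfrac{2n z}{2N^2S^2}\,dz$ or $z\,dz = \dfrac{N^2S^2}{n}\,dt$.
Therefore,
$$L(n) = \frac{UN^2S^2}{n}\,\frac{1}{(NS/\sqrt{n})\sqrt{2\pi}}\int_{0}^{\infty} e^{-t}\,dt = \frac{UNS}{\sqrt{2\pi}\sqrt{n}}\int_{0}^{\infty} e^{-t}\,dt = \frac{UNS}{\sqrt{2\pi n}}, \text{ as } \int_{0}^{\infty} e^{-t}\,dt = 1 .$$
To determine the value of $n$, consider the function
$$\phi(n) = L(n) + C(n) = cn + \frac{UNS}{\sqrt{2\pi}}\,n^{-1/2} .$$
Differentiating this function with respect to $n$ and equating to zero, we get
$$\frac{\partial\phi}{\partial n} = 0 = c - \frac{1}{2}\,\frac{UNS}{\sqrt{2\pi}}\,n^{-3/2} \quad\text{or}\quad n^{-3/2} = \frac{2c\sqrt{2\pi}}{UNS} \quad\text{or}\quad n = \left(\frac{UNS}{2c\sqrt{2\pi}}\right)^{2/3} .$$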
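A hedged numerical check of the optimum: the sketch evaluates $\phi(n) = cn + UNS/\sqrt{2\pi n}$ at the closed-form optimum and at nearby values. The numbers chosen for `U`, `N`, `S` and `c` are hypothetical.

```python
import math

def expected_loss(n, U, N, S):
    """Seller's expected loss L(n) = U*N*S / sqrt(2*pi*n)."""
    return U * N * S / math.sqrt(2 * math.pi * n)

def total_cost(n, U, N, S, c):
    """phi(n) = c*n + L(n): measurement cost plus expected loss."""
    return c * n + expected_loss(n, U, N, S)

def optimum_n(U, N, S, c):
    """n = (U*N*S / (2*c*sqrt(2*pi)))^(2/3), obtained from d(phi)/dn = 0."""
    return (U * N * S / (2 * c * math.sqrt(2 * math.pi))) ** (2 / 3)

U, N, S, c = 2.0, 500, 1.5, 1.0        # hypothetical price, lot size, s.d., unit cost
n_star = optimum_n(U, N, S, c)
print(round(n_star, 2), round(total_cost(n_star, U, N, S, c), 2))
```

In practice $n$ would be rounded to a whole number of logs.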
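Exercise: With certain populations, it is known that the observations $Y_i$ are all zero on a portion $QN$ of the $N$ units $(0 < Q < 1)$. Sometimes, with varying expenditure of effort, these units can be found and listed, so that they need not be sampled. If $\sigma^2$ is the variance of $Y_i$ in the original population and $\sigma_0^2$ is the variance when all zeros are excluded, then show that
$$\sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2}\,\bar Y^2, \text{ where } P = 1 - Q, \text{ and } \bar Y \text{ is the mean value of } Y_i \text{ for the whole population.}$$
Solution: Given $Y_1, Y_2, \ldots, Y_{NP}, Y_{NP+1}, \ldots, Y_N$ (the first $NP$ units are not zero, and the remaining $NQ$ units are all zero). Thus, $\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i$ is the population mean, $\bar Y_{NP} = \frac{1}{NP}\sum_{i=1}^{NP} Y_i$, and $\bar Y_{NQ} = \frac{1}{NQ}\sum_{i=1}^{NQ} Y_i = 0$. Also, $\sum_{i=1}^{N} Y_i = \sum_{i=1}^{NP} Y_i$ and $\sum_{i=1}^{N} Y_i^2 = \sum_{i=1}^{NP} Y_i^2$, so that $N\bar Y = NP\,\bar Y_{NP}$, or $\bar Y_{NP} = \frac{1}{P}\bar Y$. By definition,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar Y)^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar Y^2, \quad\text{or}\quad N\sigma^2 = \sum_{i=1}^{N} Y_i^2 - N\bar Y^2 .$$
Similarly,
$$NP\,\sigma_0^2 = \sum_{i=1}^{NP} Y_i^2 - NP\,\bar Y_{NP}^2 .$$
Thus,
$$N(\sigma^2 - P\sigma_0^2) = NP\,\bar Y_{NP}^2 - N\bar Y^2 = NP\,\frac{1}{P^2}\,\bar Y^2 - N\bar Y^2 = N\left(\frac{1}{P} - 1\right)\bar Y^2 = N\,\frac{Q}{P}\,\bar Y^2 .$$
Therefore,
$$P\sigma_0^2 = \sigma^2 - \frac{Q}{P}\,\bar Y^2 \quad\text{or}\quad \sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2}\,\bar Y^2 .$$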
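The identity can be checked numerically. The sketch below uses a hypothetical population of three non-zero values padded with two zeros.

```python
def check_zero_identity(nonzero_values, n_zero):
    """Return (sigma0^2 computed directly, sigma^2/P - (Q/P^2) * Ybar^2)."""
    values = list(nonzero_values) + [0.0] * n_zero
    N = len(values)
    P = len(nonzero_values) / N
    Q = 1 - P
    Ybar = sum(values) / N
    sigma2 = sum((y - Ybar) ** 2 for y in values) / N
    mean_nonzero = sum(nonzero_values) / len(nonzero_values)
    sigma0_2 = sum((y - mean_nonzero) ** 2 for y in nonzero_values) / len(nonzero_values)
    rhs = sigma2 / P - Q / P ** 2 * Ybar ** 2
    return sigma0_2, rhs

direct, from_formula = check_zero_identity([2.0, 5.0, 9.0], n_zero=2)  # hypothetical data
print(direct, from_formula)
```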
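Exercise: From a random sample of $n$ units, a random sub-sample of $n_1$ units is drawn without replacement and added to the original sample. Show that the mean based on $(n + n_1)$ units is an unbiased estimator of the population mean, and that the ratio of its variance to that of the mean of the original $n$ units is approximately $\dfrac{1 + 3n_1/n}{(1 + n_1/n)^2}$, assuming that the population size is large.
Solution: Let the sample means based on $n$, $n_1$, and $n + n_1$ elements be denoted by $\bar y_n$, $\bar y_{n_1}$, and $\bar y_{n+n_1}$ respectively, defined as $\bar y_n = \frac{1}{n}\sum_{i=1}^{n} y_i$, $\bar y_{n_1} = \frac{1}{n_1}\sum_{i=1}^{n_1} y_i$, and $\bar y_{n+n_1} = \dfrac{n\bar y_n + n_1\bar y_{n_1}}{n + n_1}$. We have to show $E(\bar y_{n+n_1}) = \bar Y$. In this case the expectation is taken in two stages,
i) when the sample of size $n$ is fixed
ii) over all samples of size $n$
$$E(\bar y_{n+n_1}) = \frac{1}{n + n_1}\,E(n\bar y_n + n_1\bar y_{n_1}) = \frac{1}{n + n_1}\,E[\,n\bar y_n + n_1 E(\bar y_{n_1} \mid n)\,]$$
$$= \frac{1}{n + n_1}\,E(n\bar y_n + n_1\bar y_n), \text{ since } n_1 \text{ is a sub-sample of the sample of size } n,$$
$$= \frac{1}{n + n_1}\,(n\bar Y + n_1\bar Y) = \bar Y .$$
To obtain the variance
$$V(\bar y_{n+n_1}) = E(\bar y_{n+n_1} - \bar Y)^2 = E\left[\frac{n\bar y_n + n_1\bar y_{n_1}}{n + n_1} - \bar Y\right]^2$$
$$= \frac{1}{(n + n_1)^2}\,E[\,n\bar y_n + n_1\bar y_{n_1} - (n + n_1)\bar Y\,]^2$$
$$= \frac{1}{(n + n_1)^2}\,E[\,n\bar y_n - n\bar Y + n_1\bar y_{n_1} - n_1\bar Y\,]^2$$
$$= \frac{1}{(n + n_1)^2}\,E[\,n(\bar y_n - \bar Y) + n_1\bar y_{n_1} - n_1\bar y_n + n_1\bar y_n - n_1\bar Y\,]^2$$
$$= \frac{1}{(n + n_1)^2}\,E[\,(n + n_1)(\bar y_n - \bar Y) + n_1(\bar y_{n_1} - \bar y_n)\,]^2$$
$$= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2 E(\bar y_n - \bar Y)^2 + n_1^2 E(\bar y_{n_1} - \bar y_n)^2\right],$$
as the cross-product term vanishes because $E(\bar y_{n_1} - \bar y_n \mid n) = 0$.
$$= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2 V(\bar y_n) + n_1^2 E\{E[(\bar y_{n_1} - \bar y_n)^2 \mid n]\}\right]$$
$$= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2 V(\bar y_n) + n_1^2 E\left\{\left(\frac{1}{n_1} - \frac{1}{n}\right)s_n^2\right\}\right]$$
$$= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2 V(\bar y_n) + n_1^2\,\frac{n - n_1}{n\,n_1}\,S^2\right]$$
$$= \frac{1}{(n + n_1)^2}\left[(n + n_1)^2 V(\bar y_n) + \frac{n_1(n - n_1)}{n}\,S^2\right] = V(\bar y_n) + \frac{n_1(n - n_1)}{n(n + n_1)^2}\,S^2 .$$
Therefore, taking $V(\bar y_n) = S^2/n$ for a large population,
$$\frac{V(\bar y_{n+n_1})}{V(\bar y_n)} = 1 + \frac{n_1(n - n_1)}{n(n + n_1)^2\,V(\bar y_n)}\,S^2 = 1 + \frac{n_1(n - n_1)}{n(n + n_1)^2\,S^2/n}\,S^2$$
$$= \frac{n^2 + 3n_1 n}{(n + n_1)^2} = \frac{1 + (3n_1/n)}{(1 + n_1/n)^2} .$$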
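Exercise: A simple random sample of size $n = n_1 + n_2$ with mean $\bar y$ is drawn from a finite population, and a simple random subsample of size $n_1$ is drawn from it with mean $\bar y_1$. Show that
$$V(\bar y_1 - \bar y) = V\left[\frac{n_2(\bar y_1 - \bar y_2)}{n}\right] = \frac{n_2^2}{n^2}\,V(\bar y_1 - \bar y_2) = \frac{n_2^2}{n^2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S^2$$
$$= \frac{n_2^2}{n^2}\,\frac{n_1 + n_2}{n_1 n_2}\,S^2 = \frac{n_2}{n_1 n}\,S^2 = \frac{n - n_1}{n_1 n}\,S^2 = \left(\frac{1}{n_1} - \frac{1}{n}\right)S^2 .$$
iii) $Cov(\bar y, \bar y_1 - \bar y) = E[\bar y(\bar y_1 - \bar y)] - E(\bar y)\,E(\bar y_1 - \bar y)$
$$= E(\bar y\,\bar y_1 - \bar y^2) - \bar Y \times 0 = E(\bar y\,\bar y_1) - E(\bar y^2) \qquad (1)$$
Consider
$$E(\bar y\,\bar y_1) = E\left[\frac{n_1\bar y_1 + n_2\bar y_2}{n}\,\bar y_1\right] = E\left[\frac{n_1}{n}\,\bar y_1^2 + \frac{n_2}{n}\,\bar y_1\bar y_2\right]$$
$$= \frac{n_1}{n}\,E(\bar y_1^2) + \frac{n_2}{n}\,E(\bar y_1)\,E(\bar y_2)$$
$$= \frac{n_1}{n}\,[\,V(\bar y_1) + \bar Y^2\,] + \frac{n_2}{n}\,\bar Y^2 = \frac{n_1}{n}\left[\frac{S^2}{n_1} + \bar Y^2\right] + \frac{n_2}{n}\,\bar Y^2$$
$$= \frac{S^2}{n} + \frac{n_1}{n}\,\bar Y^2 + \frac{n_2}{n}\,\bar Y^2 = \frac{S^2}{n} + \bar Y^2 \qquad (2)$$
Now
$$V(\bar y) = E(\bar y^2) - \bar Y^2 \quad\text{or}\quad E(\bar y^2) = V(\bar y) + \bar Y^2 = \frac{S^2}{n} + \bar Y^2 \qquad (3)$$
In view of equations (1), (2), and (3), we get
$$Cov(\bar y, \bar y_1 - \bar y) = \frac{S^2}{n} + \bar Y^2 - \frac{S^2}{n} - \bar Y^2 = 0 .$$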
Exercise: A population has three units U1 ,U 2 and U 3 with variates Y1 ,Y2 and Y3
respectively. It is required to estimate the population total Y by selecting a sample of two
units. Let the sampling and estimation procedures be as follows:
Sample (s)      P(s)    Estimator t        Estimator t'
(U1, U2)        1/2     $Y_1 + 2Y_2$       $Y_1 + 2Y_2 + Y_1^2$
(U1, U3)        1/2     $Y_1 + 2Y_3$       $Y_1 + 2Y_3 - Y_1^2$
Prove that both $t$ and $t'$ are unbiased for $Y$ and find their variances. Comment on the estimators.
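Solution: By definition
$$E(t) = \sum_i t_i\,p(t_i) = \frac{1}{2}\,(Y_1 + 2Y_2 + Y_1 + 2Y_3) = Y_1 + Y_2 + Y_3 = Y .$$
This shows that the estimator $t$ is unbiased for $Y$.
$$E(t^2) = \frac{1}{2}\,[(Y_1 + 2Y_2)^2 + (Y_1 + 2Y_3)^2] = \frac{1}{2}\,(Y_1^2 + 4Y_2^2 + 4Y_1Y_2 + Y_1^2 + 4Y_3^2 + 4Y_1Y_3)$$
$$= Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1Y_2 + 2Y_1Y_3 .$$
Therefore,
$$V(t) = E(t^2) - [E(t)]^2 = Y_1^2 + 2Y_2^2 + 2Y_3^2 + 2Y_1Y_2 + 2Y_1Y_3 - (Y_1 + Y_2 + Y_3)^2 = (Y_2 - Y_3)^2 .$$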
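A small sketch that evaluates $E(t)$ and $V(t)$ for the estimator $t$ over its two equally likely samples. The numerical values given to $Y_1, Y_2, Y_3$ are hypothetical.

```python
def estimator_t_moments(Y1, Y2, Y3):
    """E(t) and V(t) for t = Y1 + 2*Y2 on (U1, U2) and t = Y1 + 2*Y3 on (U1, U3)."""
    outcomes = [(0.5, Y1 + 2 * Y2), (0.5, Y1 + 2 * Y3)]   # (probability, value of t)
    e_t = sum(p * t for p, t in outcomes)
    v_t = sum(p * (t - e_t) ** 2 for p, t in outcomes)
    return e_t, v_t

e_t, v_t = estimator_t_moments(3, 5, 10)   # hypothetical variate values
print(e_t, v_t)
```

For these values the expectation equals the total 3 + 5 + 10 and the variance equals $(Y_2 - Y_3)^2$, which is what expanding $E(t^2) - [E(t)]^2$ gives.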
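$$E(t') = \sum_i t'_i\,p(t'_i) = \frac{1}{2}\,(Y_1 + 2Y_2 + Y_1^2 + Y_1 + 2Y_3 - Y_1^2) = Y, \text{ hence } t' \text{ is unbiased for } Y .$$
$$E(t'^2) = \frac{1}{2}\,[(Y_1 + 2Y_2 + Y_1^2)^2 + (Y_1 + 2Y_3 - Y_1^2)^2]$$
$$= \frac{1}{2}\,(Y_1^4 + 2Y_1^3 + Y_1^2 + 4Y_1^2Y_2 + 4Y_1Y_2 + 4Y_2^2 + Y_1^4 - 2Y_1^3 + Y_1^2 - 4Y_1^2Y_3 + 4Y_1Y_3 + 4Y_3^2)$$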
UNIT-II
The precision of an estimator of the population parameters (mean or total etc.) depends on the
size of the sample and the variability or heterogeneity among the units of the population. If
the population is very heterogeneous and considerations of cost limit the size of the sample, it
may be found impossible to get a sufficiently precise estimate by taking a simple random
sample from the entire population. For this, one possible way to estimate the population mean
or total with greater precision is to divide the population in several groups (sub-population or
classes, these sub-populations are non-overlapping) each of which is more homogenous than
the entire population and draw a random sample of predetermined size from each one of the
groups. The groups, into which the population is divided, are called strata or each group is
called stratum and the whole procedure of dividing the population into the strata and then
drawing a random sample from each one of the strata is called stratified random sampling.
For example, to estimate the average income per household, it may be appropriate to group the households into two or more groups (strata) according to the rent paid by the households. The households in any stratum so formed are likely to be more homogeneous with respect to income than the whole population. Thus, the estimated income per household based on a stratified sample is likely to be more precise than that based on a simple random sample of the same size drawn from the whole population.
Principal reasons for stratification
i) To gain in precision, divide a heterogeneous population into strata in such a way that each stratum is internally homogeneous.
ii) To accommodate administrative convenience (cost consideration), fieldwork is organized by strata, which usually results in a saving in cost and effort.
iii) To obtain separate estimates for strata.
iv) We can accommodate different sampling plans in different strata.
v) We can have data of known precision for certain subdivisions, treating each subdivision as a population in its own right.
Notations
Let the population, consisting of $N$ units, first be divided into $k$ strata (sub-populations) of sizes $N_1, N_2, \ldots, N_k$. These sub-populations are non-overlapping, such that $N_1 + N_2 + \cdots + N_k = N$. A sample is drawn (by the method of srs) from each stratum (group or sub-population) independently, the sample size within the $i$-th stratum being $n_i$ $(i = 1, 2, \ldots, k)$, such that $n_1 + n_2 + \cdots + n_k = n$. The following symbols refer to stratum $i$.
$N_i$, total number of units.
$n_i$, number of units in the sample.
$f_i = \dfrac{n_i}{N_i}$, sampling fraction in the stratum.
$W_i = \dfrac{N_i}{N}$, stratum weight.
$y_{ij}$, value of the characteristic under study for the $j$-th unit in the $i$-th stratum, $j = 1, 2, \ldots, N_i$.
$\bar Y_i = \dfrac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}$, mean based on $N_i$ units (stratum mean).
$\bar y_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, mean based on $n_i$ units (sample mean).
$\sigma_i^2 = \dfrac{1}{N_i}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2$, variance based on $N_i$ units (stratum variance).
$S_i^2 = \dfrac{1}{N_i - 1}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2$, mean square based on $N_i$ units (stratum mean square).
$s_i^2 = \dfrac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar y_i)^2$, sample mean square based on $n_i$ units.
$Y = \sum_{i=1}^{k}\sum_{j=1}^{N_i} y_{ij} = \sum_{i=1}^{k} N_i\bar Y_i$, population total.
$\bar Y = \dfrac{Y}{N} = \dfrac{1}{N}\sum_{i=1}^{k} N_i\bar Y_i = \sum_{i=1}^{k} W_i\bar Y_i$, overall population mean.
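Theorem: For stratified random sampling without replacement, if in every stratum the sample estimate $\bar y_i$ is unbiased for $\bar Y_i$, and samples are drawn independently in different strata, then $\bar y_{st} = \sum_{i=1}^{k} W_i\bar y_i$ is an unbiased estimate of the overall population mean $\bar Y$ and its variance is
$$V(\bar y_{st}) = \sum_{i=1}^{k} W_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_i^2 .$$
Proof: Since sampling within each stratum is simple random sampling, $E(\bar y_i) = \bar Y_i$, and it follows that
$$E(\bar y_{st}) = E\left(\sum_{i=1}^{k} W_i\bar y_i\right) = \sum_{i=1}^{k} W_i E(\bar y_i) = \sum_{i=1}^{k} W_i\bar Y_i = \bar Y .$$
To obtain the variance, we have
$$V(\bar y_{st}) = E[\bar y_{st} - E(\bar y_{st})]^2 = E\left[\sum_{i=1}^{k} W_i\bar y_i - E\left(\sum_{i=1}^{k} W_i\bar y_i\right)\right]^2 = E\left[\sum_{i=1}^{k} W_i\{\bar y_i - E(\bar y_i)\}\right]^2$$
$$= E\left[\sum_{i=1}^{k} W_i^2\{\bar y_i - E(\bar y_i)\}^2\right] + E\left[\sum_{i \ne i'}^{k} W_i W_{i'}\{\bar y_i - E(\bar y_i)\}\{\bar y_{i'} - E(\bar y_{i'})\}\right]$$
$$= \sum_{i=1}^{k} W_i^2 V(\bar y_i) + \sum_{i \ne i'}^{k} W_i W_{i'}\,Cov(\bar y_i, \bar y_{i'}) .$$
Since samples are drawn independently in different strata, all covariance terms vanish, and
$$V(\bar y_{st}) = \sum_{i=1}^{k} W_i^2 V(\bar y_i) = \sum_{i=1}^{k} W_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_i^2, \text{ as sampling is srswor within each stratum.}$$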
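Alternative expressions of $V(\bar y_{st})$
$$\text{i)}\quad V(\bar y_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)W_i^2 S_i^2 = \sum_{i=1}^{k}\frac{N_i - n_i}{N_i}\,\frac{N_i^2}{N^2}\,S_i^2/n_i = \frac{1}{N^2}\sum_{i=1}^{k} N_i(N_i - n_i)\,S_i^2/n_i .$$
$$\text{ii)}\quad V(\bar y_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} N_i(N_i - n_i)\,S_i^2/n_i = \frac{1}{N^2}\sum_{i=1}^{k} N_i^2\left(1 - \frac{n_i}{N_i}\right)S_i^2/n_i = \sum_{i=1}^{k} W_i^2\,\frac{(1 - f_i)\,S_i^2}{n_i} .$$
Corollary: $\hat Y_{st} = N\bar y_{st}$ is an unbiased estimate of the population total $Y$ with variance
$$V(\hat Y_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)N_i^2 S_i^2 .$$
Proof: By definition
$$E(\hat Y_{st}) = N\,E(\bar y_{st}) = N\bar Y = Y, \text{ and}$$
$$V(\hat Y_{st}) = N^2\,V(\bar y_{st}) = N^2\sum_{i=1}^{k} W_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_i^2 = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)N_i^2 S_i^2$$
$$= \sum_{i=1}^{k} N_i(N_i - n_i)\,S_i^2/n_i = \sum_{i=1}^{k} N_i^2(1 - f_i)\,S_i^2/n_i .$$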
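Remarks
a) If the $N_i$ are large compared with the $n_i$ (that is, if the sampling fractions $f_i = \dfrac{n_i}{N_i}$ are negligible in all strata), then,
$$\text{i)}\quad V(\bar y_{st}) = \sum_{i=1}^{k} W_i^2 S_i^2/n_i = \frac{1}{N^2}\sum_{i=1}^{k} N_i^2 S_i^2/n_i .$$
$$\text{ii)}\quad V(\hat Y_{st}) = \sum_{i=1}^{k} N_i^2 S_i^2/n_i .$$
b) If in every stratum $\dfrac{n_i}{n} = \dfrac{N_i}{N}$, i.e. $n_i = n\dfrac{N_i}{N} = nW_i$, the variance of $\bar y_{st}$ reduces to
$$V(\bar y_{st}) = \sum_{i=1}^{k}\frac{N_i - n_i}{N_i}\,W_i^2 S_i^2/n_i = \sum_{i=1}^{k}\frac{N_i - nW_i}{N_i}\,W_i S_i^2/n = \frac{1 - f}{n}\sum_{i=1}^{k} W_i S_i^2 .$$
c) If in every stratum $\dfrac{n_i}{n} = \dfrac{N_i}{N}$, and the variances in all strata have the same value $S^2$, then the result reduces to
$$V(\bar y_{st}) = \frac{1 - f}{n}\sum_{i=1}^{k} W_i S^2 = \frac{1 - f}{n}\,S^2, \text{ since } \sum_{i=1}^{k} W_i = 1 .$$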
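Estimation of variance
If a simple random sample is taken within each stratum, then an unbiased estimator of $S_i^2$ is $s_i^2 = \dfrac{1}{n_i - 1}\sum_{j=1}^{n_i}(y_{ij} - \bar y_i)^2$, and an unbiased estimator of the variance of $\bar y_{st}$ is
$$\hat V(\bar y_{st}) = v(\bar y_{st}) = \sum_{i=1}^{k} W_i^2\left(\frac{1}{n_i} - \frac{1}{N_i}\right)s_i^2 = \frac{1}{N^2}\sum_{i=1}^{k} N_i(N_i - n_i)\,s_i^2/n_i = \sum_{i=1}^{k} W_i^2(1 - f_i)\,s_i^2/n_i .$$
Alternative form for computing purposes
$$\hat V(\bar y_{st}) = \sum_{i=1}^{k}\frac{W_i^2 s_i^2}{n_i} - \sum_{i=1}^{k}\frac{W_i^2 s_i^2}{N_i} = \sum_{i=1}^{k}\frac{W_i^2 s_i^2}{n_i} - \sum_{i=1}^{k}\frac{W_i s_i^2}{N} .$$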
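A minimal sketch of the estimator and its variance estimator, assuming srswor within each stratum. The function name and the two-stratum sample data are hypothetical.

```python
def stratified_mean_and_variance(samples, stratum_sizes):
    """Return (y_st, v(y_st)) from per-stratum samples and stratum sizes N_i.

    y_st    = sum_i W_i * ybar_i
    v(y_st) = sum_i W_i^2 * (1/n_i - 1/N_i) * s_i^2
    """
    N = sum(stratum_sizes)
    y_st, v = 0.0, 0.0
    for ys, Ni in zip(samples, stratum_sizes):
        ni = len(ys)
        Wi = Ni / N
        ybar = sum(ys) / ni
        s2 = sum((y - ybar) ** 2 for y in ys) / (ni - 1)
        y_st += Wi * ybar
        v += Wi ** 2 * (1 / ni - 1 / Ni) * s2
    return y_st, v

# Hypothetical samples of two units from each of two strata of three units.
y_st, v = stratified_mean_and_variance([[0, 1], [4, 11]], [3, 3])
print(y_st, v)
```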
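Theorem: If stratified random sampling is with replacement, then $\bar y_{st} = \sum_{i=1}^{k} W_i\bar y_i$ is an unbiased estimate of the population mean $\bar Y$ and its variance is $V(\bar y_{st}) = \sum_{i=1}^{k} W_i^2 S_i^2/n_i$.
Note: If the variances in all strata have the same value, $S^2$ (say), then
$$V(\bar y_{st})_{prop} = \frac{1 - f}{n}\,S^2, \text{ as } \sum_{i=1}^{k} W_i = 1 .$$
$$V(\bar y_{st})_{prop} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i S_i^2 = \frac{N - n}{nN}\sum_{i=1}^{k}\frac{N_i}{N}\,S_i^2 = \frac{N - n}{nN^2}\sum_{i=1}^{k} N_i S_i^2 .$$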
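Optimum allocation: In this method of allocation the sample sizes $n_i$ in the respective strata are determined with a view to minimizing $V(\bar y_{st})$ for a specified cost of conducting the sample survey, or to minimizing the cost for a specified value of $V(\bar y_{st})$. The simplest cost function is of the form
$$\text{Cost} = C = c_0 + \sum_{i=1}^{k} c_i n_i,$$
where the overhead cost $c_0$ is constant and $c_i$ is the average cost of surveying one unit in the $i$-th stratum, so that
$$C - c_0 = \sum_{i=1}^{k} n_i c_i = C' \text{ (say)} \qquad (2.1)$$
and
$$V(\bar y_{st}) = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)W_i^2 S_i^2 = \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{N_i},$$
so that
$$V(\bar y_{st}) + \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{N_i} = \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{n_i} = V' \text{ (say)} \qquad (2.2)$$
where $C'$ and $V'$ are functions of the $n_i$. Choosing the $n_i$ to minimize $V$ for fixed $C$, or $C$ for fixed $V$, are both equivalent to minimizing the product
$$V'C' = \left(\sum_{i=1}^{k}\frac{W_i^2 S_i^2}{n_i}\right)\left(\sum_{i=1}^{k} n_i c_i\right).$$
It may be minimized by use of the Cauchy-Schwarz inequality, i.e. if $a_i, b_i$, $i = 1, 2, \ldots, k$, are two sets of $k$ positive numbers, then
$$\left(\sum_{i=1}^{k} a_i^2\right)\left(\sum_{i=1}^{k} b_i^2\right) \ge \left(\sum_{i=1}^{k} a_i b_i\right)^2,$$
and equality holds if and only if $\dfrac{b_i}{a_i}$ is constant for all $i$.
Taking $a_i = W_i S_i/\sqrt{n_i} > 0$ and $b_i = \sqrt{n_i c_i} > 0$, we get
$$V'C' = \left(\sum_{i=1}^{k}(W_i S_i/\sqrt{n_i})^2\right)\left(\sum_{i=1}^{k}(\sqrt{n_i c_i})^2\right) \ge \left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)^2 .$$
Thus, no choice of $n_i$ can make $V'C'$ smaller than $\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)^2$. This minimum value occurs when $\dfrac{b_i}{a_i} = $ constant, say $\lambda$.
$$\frac{b_i}{a_i} = \frac{\sqrt{n_i c_i}\,\sqrt{n_i}}{W_i S_i} = \frac{n_i\sqrt{c_i}}{W_i S_i} = \lambda \quad\text{or}\quad n_i = \lambda\,\frac{W_i S_i}{\sqrt{c_i}} \qquad (2.3)$$
$n_i \propto W_i S_i/\sqrt{c_i}$; this allocation is known as optimum allocation. Summing (2.3) over all strata to eliminate $\lambda$,
$$n_i = n\,\frac{W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}} = n\,\frac{N_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i/\sqrt{c_i}} . \qquad (2.4)$$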
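Alternative method
To determine $n_i$ such that $V(\bar y_{st})$ is minimum and the cost $C$ is fixed, consider the function
$$\phi = \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{N_i} + \lambda\left(c_0 + \sum_{i=1}^{k} c_i n_i - C\right), \text{ where } \lambda \text{ is some unknown constant.}$$
Using the calculus method of Lagrange multipliers, we select $n_i$ and the constant $\lambda$ to minimize $\phi$. Differentiating $\phi$ with respect to $n_i$ and equating to zero, we have
$$\frac{\partial\phi}{\partial n_i} = 0 = -\frac{W_i^2 S_i^2}{n_i^2} + \lambda c_i \quad\text{or}\quad n_i = \frac{1}{\sqrt{\lambda}}\,\frac{W_i S_i}{\sqrt{c_i}} \qquad (2.3a)$$
$$n_i \propto W_i S_i/\sqrt{c_i} \quad\text{or}\quad n_i \propto N_i S_i/\sqrt{c_i} .$$
$$n_i = n\,\frac{W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}} = n\,\frac{N_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i/\sqrt{c_i}} \qquad (2.4a)$$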
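The optimum sample sizes within strata depend on the total sample size $n$. The solution for the value of $n$ depends on whether the sample is chosen to meet a specified total cost $C$ or to give a specified variance $V$ for $\bar y_{st}$.
i) If cost is fixed, substitute the optimum values of $n_i$ in the cost function, equation (2.1), and solve for $n$ as
$$C - c_0 = \sum_{i=1}^{k} c_i n_i = n\sum_{i=1}^{k} c_i\,\frac{W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}} = n\,\frac{\sum_{i=1}^{k} W_i S_i\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}}$$
$$n = \frac{C - c_0}{\sum_{i=1}^{k} W_i S_i\sqrt{c_i}}\,\sum_{i=1}^{k} W_i S_i/\sqrt{c_i} .$$
Hence,
$$n_i = \frac{C - c_0}{\sum_{i=1}^{k} W_i S_i\sqrt{c_i}}\,\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}\;\frac{W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}} = \frac{(C - c_0)\,W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i\sqrt{c_i}} .$$
Substituting these $n_i$ in the variance,
$$V(\bar y_{st})_{opt} = \sum_{i=1}^{k}\left[\frac{1}{(C - c_0)}\,\frac{\sqrt{c_i}\sum_{i=1}^{k} W_i S_i\sqrt{c_i}}{W_i S_i} - \frac{1}{N_i}\right]W_i^2 S_i^2$$
$$= \frac{1}{C - c_0}\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right) - \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{N_i}$$
$$= \frac{1}{C - c_0}\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 .$$
ii) If the variance $V$ is fixed, substitute the optimum values of $n_i$ from (2.4) in the variance:
$$V(\bar y_{st}) + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 = \sum_{i=1}^{k}\frac{W_i^2 S_i^2}{n_i} = \frac{1}{n}\sum_{i=1}^{k}\frac{W_i^2 S_i^2}{W_i S_i/\sqrt{c_i}}\,\sum_{i=1}^{k} W_i S_i/\sqrt{c_i} = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)\left(\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}\right).$$
Thus,
$$n = \frac{1}{V + \dfrac{1}{N}\sum_{i=1}^{k} W_i S_i^2}\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)\left(\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}\right), \text{ and hence,}$$
$$n_i = \frac{1}{V + \dfrac{1}{N}\sum_{i=1}^{k} W_i S_i^2}\,(W_i S_i/\sqrt{c_i})\sum_{i=1}^{k} W_i S_i\sqrt{c_i} .$$
The corresponding cost is
$$C - c_0 = \sum_{i=1}^{k} c_i n_i = \sum_{i=1}^{k} c_i\,\frac{(W_i S_i/\sqrt{c_i})\sum_{i=1}^{k} W_i S_i\sqrt{c_i}}{V + \dfrac{1}{N}\sum_{i=1}^{k} W_i S_i^2} = \frac{\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)}{V + \dfrac{1}{N}\sum_{i=1}^{k} W_i S_i^2}$$
$$= \frac{1}{V + \dfrac{1}{N}\sum_{i=1}^{k} W_i S_i^2}\left(\sum_{i=1}^{k} W_i S_i\sqrt{c_i}\right)^2 .$$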
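The fixed-cost allocation $n_i = (C - c_0)\,(W_i S_i/\sqrt{c_i}) / \sum W_i S_i\sqrt{c_i}$ can be sketched as follows. The stratum weights, standard deviations and costs are hypothetical, and the resulting $n_i$ are left unrounded.

```python
import math

def optimum_allocation_fixed_cost(W, S, c, C, c0=0.0):
    """n_i = (C - c0) * (W_i*S_i/sqrt(c_i)) / sum_j(W_j*S_j*sqrt(c_j))."""
    numerators = [w * s / math.sqrt(ci) for w, s, ci in zip(W, S, c)]
    denominator = sum(w * s * math.sqrt(ci) for w, s, ci in zip(W, S, c))
    return [(C - c0) * x / denominator for x in numerators]

W = [0.4, 0.6]        # hypothetical stratum weights
S = [10.0, 20.0]      # hypothetical stratum standard deviations
c = [4.0, 9.0]        # hypothetical unit costs
n_i = optimum_allocation_fixed_cost(W, S, c, C=1000.0, c0=100.0)
spent = sum(ci * ni for ci, ni in zip(c, n_i))
print([round(x, 2) for x in n_i], round(spent, 2))
```

The allocation exhausts the variable budget $C - c_0$ exactly, and strata that are larger, more variable or cheaper to survey receive more units.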
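Remark
An important special case arises if $c_i = c$, that is, if the cost per unit is the same in all strata. The cost becomes $C = c_0 + c\sum_{i=1}^{k} n_i = c_0 + cn$, and optimum allocation for fixed cost reduces to optimum allocation for fixed sample size. The result in this case is as follows:
$$n_i = n\,\frac{W_i S_i}{\sum_{i=1}^{k} W_i S_i} = n\,\frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}, \quad\text{i.e.}\quad n_i \propto W_i S_i \text{ or } n_i \propto N_i S_i .$$
This is called Neyman allocation, and $V(\bar y_{st})$ under optimum allocation for fixed $n$, or Neyman allocation, is
$$V(\bar y_{st})_{opt} = \sum_{i=1}^{k}\left[\frac{1}{n}\,\frac{\sum_{i=1}^{k} W_i S_i}{W_i S_i} - \frac{1}{N_i}\right]W_i^2 S_i^2 = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)\left(\sum_{i=1}^{k} W_i S_i\right) - \frac{1}{N}\sum_{i=1}^{k}\frac{N}{N_i}\,W_i^2 S_i^2$$
$$= \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 .$$
Note: If $N$ is large, $V(\bar y_{st})_{opt}$ reduces to $V(\bar y_{st})_{opt} = \dfrac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2$.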
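Relative precision of stratified with simple random sampling
Here we make a comparative study of the usual estimators under simple random sampling without stratification and under stratified random sampling with various schemes of allocation, i.e. proportional and optimum allocation. This comparison shows how the gain due to stratification is achieved.
Consider the variances of these estimators of the population mean, which are as follows.
$$V_{ran} = \frac{1 - f}{n}\,S^2$$
$$V_{prop} = \frac{1 - f}{n}\sum_{i=1}^{k} W_i S_i^2 = \frac{1}{n}\sum_{i=1}^{k} W_i S_i^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 .$$
$$V_{opt} = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 .$$
Now
$$(N - 1)S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(y_{ij} - \bar Y)^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i + \bar Y_i - \bar Y)^2$$
$$= \sum_{i=1}^{k}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{N_i}(\bar Y_i - \bar Y)^2 + 2\sum_{i=1}^{k}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)(\bar Y_i - \bar Y)$$
$$= \sum_{i=1}^{k}(N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar Y_i - \bar Y)^2 + 2\sum_{i=1}^{k}(\bar Y_i - \bar Y)\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)$$
$$= \sum_{i=1}^{k}(N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar Y_i - \bar Y)^2, \text{ as the sum of the deviations from their mean is zero,}$$
$$\text{or}\quad S^2 = \sum_{i=1}^{k}\frac{N_i - 1}{N - 1}\,S_i^2 + \sum_{i=1}^{k}\frac{N_i}{N - 1}\,(\bar Y_i - \bar Y)^2 .$$
For large $N$, $\dfrac{1}{N} \to 0$, so that
$$\frac{N_i - 1}{N - 1} = \frac{(N_i/N) - (1/N)}{1 - (1/N)} \cong W_i \quad\text{and}\quad \frac{N_i}{N - 1} = \frac{(N_i/N)}{1 - (1/N)} \cong W_i,$$
so that
$$S^2 \cong \sum_{i=1}^{k} W_i S_i^2 + \sum_{i=1}^{k} W_i(\bar Y_i - \bar Y)^2 .$$
Hence,
$$V_{ran} = \frac{1 - f}{n}\,S^2 = \frac{1 - f}{n}\sum_{i=1}^{k} W_i S_i^2 + \frac{1 - f}{n}\sum_{i=1}^{k} W_i(\bar Y_i - \bar Y)^2$$
$$= V_{prop} + \frac{1 - f}{n}\sum_{i=1}^{k} W_i(\bar Y_i - \bar Y)^2 = V_{prop} + \text{positive quantity},$$
so that
$$V_{ran} \ge V_{prop} . \qquad (2.5)$$
Further, consider
$$V_{prop} - V_{opt} = \frac{1}{n}\sum_{i=1}^{k} W_i S_i^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2$$
$$= \frac{1}{n}\left[\sum_{i=1}^{k} W_i S_i^2 - \left(\sum_{i=1}^{k} W_i S_i\right)^2\right] = \frac{1}{n}\left[\sum_{i=1}^{k} W_i S_i^2 + \left(\sum_{i=1}^{k} W_i S_i\right)^2 - 2\left(\sum_{i=1}^{k} W_i S_i\right)^2\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^{k} W_i S_i^2 + \left(\sum_{j=1}^{k} W_j S_j\right)^2\sum_{i=1}^{k} W_i - 2\sum_{i=1}^{k} W_i S_i\sum_{j=1}^{k} W_j S_j\right], \text{ as } \sum_{i=1}^{k} W_i = 1,$$
$$= \frac{1}{n}\sum_{i=1}^{k} W_i\left[S_i^2 + \left(\sum_{j=1}^{k} W_j S_j\right)^2 - 2S_i\sum_{j=1}^{k} W_j S_j\right]$$
$$= \frac{1}{n}\sum_{i=1}^{k} W_i\left(S_i - \sum_{j=1}^{k} W_j S_j\right)^2 = \text{positive quantity}.$$
$$V_{prop} = V_{opt} + \frac{1}{n}\sum_{i=1}^{k} W_i\left(S_i - \sum_{j=1}^{k} W_j S_j\right)^2 .$$
Thus,
$$V_{prop} \ge V_{opt} . \qquad (2.6)$$
From equations (2.5) and (2.6), we get
$$V_{ran} \ge V_{prop} \ge V_{opt} .$$
Also,
$$V_{ran} = V_{opt} + \frac{1}{n}\sum_{i=1}^{k} W_i\left(S_i - \sum_{j=1}^{k} W_j S_j\right)^2 + \frac{1 - f}{n}\sum_{i=1}^{k} W_i(\bar Y_i - \bar Y)^2 .$$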
Remark
In comparing the precision of stratified with un-stratified random sampling, it was assumed
that the population values of stratum means and variances were known.
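Estimation of the gain in precision due to stratification
It is sometimes of interest to examine, from a survey, whether the mode of stratification has been effective in estimating the population mean with increased precision relative to simple random sampling without replacement. The data available from the sample are the values $N_i$, $n_i$, $\bar y_i$, and $s_i^2$. An unbiased estimator of the variance of $\bar y_{st}$ is given by
$$\hat V(\bar y_{st}) = \sum_{i=1}^{k} W_i^2 s_i^2/n_i - \sum_{i=1}^{k} W_i s_i^2/N .$$
The problem is to compare this variance with an unbiased estimate of $V(\bar y_{sr})$ based on the given stratified sample. For estimation of $V(\bar y_{sr})$, note that
$$V(\bar y_{sr}) = \left(\frac{1}{n} - \frac{1}{N}\right)S^2 = \frac{N - n}{nN}\,S^2 .$$
We shall first estimate $S^2$, when $\bar y_i$ and $s_i^2$ are available for all the strata. Consider the relation
$$(N - 1)S^2 = \sum_{i=1}^{k}(N_i - 1)S_i^2 + \sum_{i=1}^{k} N_i(\bar Y_i - \bar Y)^2 = \sum_{i=1}^{k}(N_i - 1)S_i^2 + N\sum_{i=1}^{k} W_i(\bar Y_i - \bar Y)^2$$
$$= \sum_{i=1}^{k}(N_i - 1)S_i^2 + N\left[\sum_{i=1}^{k} W_i\bar Y_i^2 - \bar Y^2\right].$$
Since
$$V(\bar y_i) = E(\bar y_i - \bar Y_i)^2, \quad \bar Y_i^2 = E(\bar y_i^2) - V(\bar y_i), \quad\text{an unbiased estimate of } \bar Y_i^2 \text{ is } \bar y_i^2 - \left(\frac{1}{n_i} - \frac{1}{N_i}\right)s_i^2,$$
and similarly for $\bar Y^2$. Thus,
$$(N - 1)\hat S^2 = \sum_{i=1}^{k}(N_i - 1)s_i^2 + N\left[\sum_{i=1}^{k} W_i\left\{\bar y_i^2 - \left(\frac{1}{n_i} - \frac{1}{N_i}\right)s_i^2\right\} - \left\{\bar y_{st}^2 - \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)W_i^2 s_i^2\right\}\right]$$
$$= \sum_{i=1}^{k}(N_i - 1)s_i^2 + N\left[\sum_{i=1}^{k} W_i\bar y_i^2 - \bar y_{st}^2 - \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)W_i s_i^2 + \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)W_i^2 s_i^2\right]$$
$$= \sum_{i=1}^{k}(N_i - 1)s_i^2 + N\left[\sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\left(\frac{1}{n_i} - \frac{1}{N_i}\right)s_i^2\right]$$
$$= N\left[\frac{1}{N}\sum_{i=1}^{k}(N_i - 1)s_i^2 + \sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\left(\frac{1}{n_i} - \frac{1}{N_i}\right)s_i^2\right].$$
Therefore,
$$\hat V(\bar y_{sr}) = \frac{N - n}{nN}\,\frac{N}{N - 1}\left[\frac{1}{N}\sum_{i=1}^{k} N_i s_i^2 - \frac{1}{N}\sum_{i=1}^{k} s_i^2 + \sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\,s_i^2/n_i + \sum_{i=1}^{k} W_i s_i^2/N_i - \sum_{i=1}^{k} W_i^2 s_i^2/N_i\right].$$
Put $N_i = NW_i$:
$$\hat V(\bar y_{sr}) = \frac{N - n}{n(N - 1)}\left[\frac{1}{N}\sum_{i=1}^{k} NW_i s_i^2 - \frac{1}{N}\sum_{i=1}^{k} s_i^2 + \sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\,s_i^2/n_i + \sum_{i=1}^{k} W_i s_i^2/NW_i - \sum_{i=1}^{k} W_i^2 s_i^2/NW_i\right]$$
$$= \frac{N - n}{n(N - 1)}\left[\sum_{i=1}^{k} W_i s_i^2 + \sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\,s_i^2/n_i - \frac{1}{N}\sum_{i=1}^{k} W_i s_i^2\right]$$
$$= \frac{N - n}{n(N - 1)}\left(1 - \frac{1}{N}\right)\sum_{i=1}^{k} W_i s_i^2 + \frac{N - n}{n(N - 1)}\left[\sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\,s_i^2/n_i\right]$$
$$= \frac{N - n}{nN}\sum_{i=1}^{k} W_i s_i^2 + \frac{N - n}{n(N - 1)}\left[\sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\,s_i^2/n_i\right].$$
The estimate of the relative gain in precision due to stratification is thus obtained by
$$\frac{\hat V(\bar y_{sr}) - \hat V(\bar y_{st})}{\hat V(\bar y_{st})} .$$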
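Alternative result
$$\hat V(\bar y_{sr}) = \frac{N - n}{n(N - 1)}\left[\frac{1}{N}\sum_{i=1}^{k} NW_i s_i^2 - \frac{1}{N}\sum_{i=1}^{k} s_i^2 + \sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i(1 - W_i)\,s_i^2/n_i + \sum_{i=1}^{k} W_i s_i^2/NW_i - \sum_{i=1}^{k} W_i^2 s_i^2/NW_i\right]$$
$$= \frac{N - n}{n(N - 1)}\left[\sum_{i=1}^{k} W_i s_i^2 + \sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 - \sum_{i=1}^{k} W_i s_i^2/n_i + \sum_{i=1}^{k} W_i^2 s_i^2/n_i - \frac{1}{N}\sum_{i=1}^{k} W_i s_i^2\right]$$
$$= \frac{N - n}{n(N - 1)}\left[\sum_{i=1}^{k} W_i(\bar y_i - \bar y_{st})^2 + \sum_{i=1}^{k} W_i s_i^2\left(1 - \frac{1}{n_i} + \frac{W_i}{n_i} - \frac{1}{N}\right)\right].$$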
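The sketch below codes the two closed forms for $\hat V(\bar y_{sr})$ (the split form with factors $(N-n)/nN$ and $(N-n)/n(N-1)$, and the single-bracket alternative) together with the relative gain, and confirms numerically that the two forms agree. All stratum summaries used are hypothetical.

```python
def vhat_srs_split(N, W, n_i, ybar_i, s2_i):
    """(N-n)/(nN) * sum(W s^2) + (N-n)/(n(N-1)) * [sum(W (ybar - y_st)^2) - sum(W (1-W) s^2 / n_i)]."""
    n = sum(n_i)
    y_st = sum(w * y for w, y in zip(W, ybar_i))
    within = sum(w * s for w, s in zip(W, s2_i))
    between = sum(w * (y - y_st) ** 2 for w, y in zip(W, ybar_i))
    corr = sum(w * (1 - w) * s / m for w, s, m in zip(W, s2_i, n_i))
    return (N - n) / (n * N) * within + (N - n) / (n * (N - 1)) * (between - corr)

def vhat_srs_alternative(N, W, n_i, ybar_i, s2_i):
    """(N-n)/(n(N-1)) * [sum(W (ybar - y_st)^2) + sum(W s^2 (1 - 1/n_i + W/n_i - 1/N))]."""
    n = sum(n_i)
    y_st = sum(w * y for w, y in zip(W, ybar_i))
    between = sum(w * (y - y_st) ** 2 for w, y in zip(W, ybar_i))
    bracket = sum(w * s * (1 - 1 / m + w / m - 1 / N) for w, s, m in zip(W, s2_i, n_i))
    return (N - n) / (n * (N - 1)) * (between + bracket)

def vhat_stratified(N, W, n_i, s2_i):
    """sum(W^2 s^2 / n_i) - sum(W s^2) / N."""
    return sum(w * w * s / m for w, s, m in zip(W, s2_i, n_i)) - sum(w * s for w, s in zip(W, s2_i)) / N

# Hypothetical survey summaries for three strata.
N, W = 1000, [0.5, 0.3, 0.2]
n_i, ybar_i, s2_i = [50, 30, 20], [10.0, 20.0, 40.0], [4.0, 9.0, 25.0]
a = vhat_srs_split(N, W, n_i, ybar_i, s2_i)
b = vhat_srs_alternative(N, W, n_i, ybar_i, s2_i)
gain = (a - vhat_stratified(N, W, n_i, s2_i)) / vhat_stratified(N, W, n_i, s2_i)
print(a, b, gain)
```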
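Exercise: In a population with $N = 6$ and $k = 2$ the values of $y_{ij}$ are 0, 1, 2 in stratum 1 and 4, 6, 11 in stratum 2. A sample with $n = 4$ is to be taken.
i) Show that the optimum $n_i$ under Neyman allocation are $n_1 = 1$ and $n_2 = 3$.
ii) Compute the estimate $\bar y_{st}$ for every possible sample under optimum allocation and proportional allocation. Show that the estimates are unbiased. Hence find $V(\bar y_{st})$ directly under optimum and proportional allocation, and verify that $V(\bar y_{st})$ under optimum allocation agrees with the formula
$$V(\bar y_{st}) = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)W_i^2 S_i^2$$
and that $V(\bar y_{st})$ under proportional allocation agrees with the formula
$$V(\bar y_{st}) = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i S_i^2 .$$
Solution: i) Under Neyman allocation
$$n_i = n\,\frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}, \text{ where } S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i}(y_{ij} - \bar Y_i)^2,$$
so that,
$$S_1^2 = \frac{1}{N_1 - 1}\sum_{j=1}^{3}(y_{1j} - \bar Y_1)^2 = 1, \quad\text{and}\quad S_2^2 = \frac{1}{N_2 - 1}\sum_{j=1}^{3}(y_{2j} - \bar Y_2)^2 = 13 .$$
Therefore,
$$n_1 = n\,\frac{N_1 S_1}{\sum_i N_i S_i} \approx 1, \quad\text{and}\quad n_2 = n\,\frac{N_2 S_2}{\sum_i N_i S_i} \approx 3 .$$
ii) Under optimum allocation ($n_1 = 1$, $n_2 = 3$):
Samples                  Means
I        II              y1     y2     y_st
0        (4, 6, 11)      0      7      3.5
1        (4, 6, 11)      1      7      4.0
2        (4, 6, 11)      2      7      4.5
Under proportional allocation ($n_1 = 2$, $n_2 = 2$):
Samples                  Means
I        II              y1     y2     y_st
(0, 1)   (4, 6)          0.5    5.0    2.75
(0, 1)   (4, 11)         0.5    7.5    4.00
(0, 1)   (6, 11)         0.5    8.5    4.50
(0, 2)   (4, 6)          1.0    5.0    3.00
(0, 2)   (4, 11)         1.0    7.5    4.25
(0, 2)   (6, 11)         1.0    8.5    4.75
(1, 2)   (4, 6)          1.5    5.0    3.25
(1, 2)   (4, 11)         1.5    7.5    4.50
(1, 2)   (6, 11)         1.5    8.5    5.00
$$E(\bar y_{st}) = \frac{1}{9}\,(2.75 + 4.00 + 4.50 + 3.00 + 4.25 + 4.75 + 3.25 + 4.50 + 5.00) = 4 = \bar Y$$
Therefore, $\bar y_{st}$ is an unbiased estimate of $\bar Y$ under proportional allocation.
$$V(\bar y_{st}) = \frac{1}{9}\,[(2.75 - 4)^2 + (4.00 - 4)^2 + \cdots + (5.00 - 4)^2] = 0.583 .$$
By formula
$$V(\bar y_{st}) = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} W_i S_i^2 = 0.583 .$$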
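The enumeration in this exercise can be reproduced mechanically. The sketch below lists every possible stratified sample for a given allocation and returns the mean and variance of $\bar y_{st}$ over those samples. The helper name `enumerate_stratified` is an assumption made for illustration.

```python
from itertools import combinations, product

def enumerate_stratified(strata, allocation):
    """Mean and variance of y_st over all equally likely stratified srswor samples."""
    N = sum(len(s) for s in strata)
    weights = [len(s) / N for s in strata]
    per_stratum = [list(combinations(s, n)) for s, n in zip(strata, allocation)]
    estimates = [
        sum(w * sum(sample) / len(sample) for w, sample in zip(weights, combo))
        for combo in product(*per_stratum)
    ]
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return mean, variance

strata = [[0, 1, 2], [4, 6, 11]]
mean_prop, var_prop = enumerate_stratified(strata, (2, 2))   # proportional allocation
mean_opt, var_opt = enumerate_stratified(strata, (1, 3))     # Neyman allocation
print(mean_prop, round(var_prop, 3), mean_opt, round(var_opt, 3))
```

Both allocations give an unbiased estimate of the population mean 4, and the Neyman allocation has the smaller variance.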
Exercise: The households in a town are to be sampled in order to estimate the average amount of assets per household. The households are stratified into a high-rent and a low-rent stratum. A house in the high-rent stratum is thought to have about nine times as much assets as one in the low-rent stratum, and $S_i$ is expected to be proportional to the square root of the stratum mean. There are 4000 households in the high-rent stratum and 20,000 in the low-rent stratum.
i) Distribute a sample of 1000 households between the two strata.
ii) If the object is to estimate the difference between assets per household in the two strata, obtain the optimum sample sizes to be distributed in the two strata such that $n_1 + n_2 = 1000$.
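Solution:
Given $N_1 = 4000$, $N_2 = 20{,}000$, so that $W_1 = \frac{1}{6}$ and $W_2 = \frac{5}{6}$.
Also,
$$\bar Y_1 = 9\,\bar Y_2, \quad S_1 \propto \sqrt{\bar Y_1}, \ \text{i.e. } S_1 = A\sqrt{\bar Y_1} = 3A\sqrt{\bar Y_2}, \quad \text{and} \quad S_2 \propto \sqrt{\bar Y_2}, \ \text{i.e. } S_2 = A\sqrt{\bar Y_2},$$
where $A$ is the constant of proportionality.
i) Since the total sample size is fixed, i.e. $n = 1000$, the optimum value (under Neyman allocation) is
$$n_i = n\,\frac{W_i S_i}{\sum_{i=1}^{k} W_i S_i},$$
so that
$$n_1 = n\,\frac{W_1 S_1}{W_1 S_1+W_2 S_2} = 1000 \times \frac{\frac{1}{6}\,(3A\sqrt{\bar Y_2})}{\frac{1}{6}\,(3A\sqrt{\bar Y_2})+\frac{5}{6}\,(A\sqrt{\bar Y_2})} = 375, \quad \text{and} \quad n_2 = 625 .$$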
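ii) The variance of the estimated difference $\bar y_1 - \bar y_2$ between the two stratum means is
$$V(\bar y_1-\bar y_2) = \left(\frac{1}{n_1}-\frac{1}{N_1}\right)S_1^2+\left(\frac{1}{n_2}-\frac{1}{N_2}\right)S_2^2 = \frac{S_1^2}{n_1}+\frac{S_2^2}{n_2} + \text{terms independent of } n_1 \text{ and } n_2 .$$
Now our problem is to find $n_1$ and $n_2$ such that the variance of the estimate is a minimum subject to the condition $n_1 + n_2 = 1000$.
To determine the optimum value of $n_i$, consider the function
$$\phi = \frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}+\lambda\,(n_1+n_2-1000), \qquad (1)$$
where $\lambda$ is some unknown constant. Using the calculus method of Lagrange multipliers, we select $n_i$ and the constant $\lambda$ to minimize $\phi$.
$$\frac{\partial\phi}{\partial n_1} = 0 = -\frac{S_1^2}{n_1^2}+\lambda \quad \Rightarrow \quad \lambda = \frac{S_1^2}{n_1^2} \qquad (2)$$
$$\frac{\partial\phi}{\partial n_2} = 0 = -\frac{S_2^2}{n_2^2}+\lambda \quad \Rightarrow \quad \lambda = \frac{S_2^2}{n_2^2} \qquad (3)$$
From equations (2) and (3),
$$\frac{S_1^2}{n_1^2} = \frac{S_2^2}{n_2^2} \quad \Rightarrow \quad \frac{S_1^2}{S_2^2} = \frac{n_1^2}{n_2^2} \quad \text{and} \quad \frac{S_1}{S_2} = \frac{n_1}{n_2} .$$
Since
$$\frac{S_1}{S_2} = \frac{3A\sqrt{\bar Y_2}}{A\sqrt{\bar Y_2}} = 3 \quad \Rightarrow \quad S_1 = 3S_2 ,$$
we have
$$\frac{n_1}{n_2} = \frac{3S_2}{S_2} = 3 \quad \Rightarrow \quad n_1 = 3n_2 .$$
Therefore,
$$3n_2+n_2 = 1000 \quad \Rightarrow \quad n_2 = 250 \quad \text{and} \quad n_1 = 750 .$$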
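Both allocations in this exercise reduce to sharing $n$ in proportion to a set of weights. A short Python sketch follows, assuming (as in the solution) that only the ratio $S_1/S_2 = 3$ matters, so the common factor $A\sqrt{\bar Y_2}$ is set to 1; the helper name `allocate` is illustrative.

```python
from math import sqrt

N1, N2, n = 4000, 20000, 1000
mean_ratio = 9.0                    # stratum 1 mean is about nine times stratum 2 mean
S1, S2 = sqrt(mean_ratio), 1.0      # S_i proportional to sqrt(mean); common factor cancels

def allocate(weights, total):
    """Split `total` in proportion to the given non-negative weights."""
    s = sum(weights)
    return [total * w / s for w in weights]

# (i) Neyman allocation for the overall mean: n_i proportional to N_i * S_i
n_mean = allocate([N1 * S1, N2 * S2], n)
# (ii) estimating the difference of stratum means: n_i proportional to S_i
n_diff = allocate([S1, S2], n)
print(n_mean, n_diff)
```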
Exercise: A sampler has two strata with relative sizes $W_1$, $W_2$. He believes that $S_1$, $S_2$ can be taken as equal but thinks that $c_2$ may be between $2c_1$ and $4c_1$. He would prefer to use proportional allocation but does not wish to incur a substantial increase in variance compared with optimum allocation. For a given cost $C = c_1 n_1 + c_2 n_2$, ignoring the fpc, show that
$$\frac{V(\bar y_{st})_{prop}}{V(\bar y_{st})_{opt}} = \frac{W_1 c_1+W_2 c_2}{(W_1\sqrt{c_1}+W_2\sqrt{c_2})^2} .$$
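Solution: Ignoring the fpc,
$$V(\bar y_{st})_{prop} = \frac{1}{n}\sum_{i=1}^{k} W_i S_i^2 = \frac{1}{n}(W_1 S_1^2+W_2 S_2^2) = \frac{1}{n}S^2, \quad \text{as } S_1 = S_2 = S \text{ (say), and } W_1+W_2 = 1 .$$
Under proportional allocation, $n_1 = nW_1$ and $n_2 = nW_2$, so that $C = nW_1c_1+nW_2c_2 = n\,(W_1c_1+W_2c_2)$. Hence
$$n = \frac{C}{W_1c_1+W_2c_2}, \quad \text{and} \quad V(\bar y_{st})_{prop} = \frac{1}{C}(W_1c_1+W_2c_2)\,S^2 .$$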
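Taking $W_1 = W_2 = 0.5$, the relative increase in variance from using proportional allocation, $RI = V(\bar y_{st})_{prop}/V(\bar y_{st})_{opt} - 1$, is obtained as follows.
i) When $\dfrac{c_2}{c_1} = 2$, or $c_2 = 2c_1$, then
$$RI = \frac{c_1+2c_1}{0.5\,(\sqrt{c_1}+\sqrt{2c_1})^2}-1 = \frac{3c_1}{0.5\,c_1(1+\sqrt{2})^2}-1 = 0.029437 .$$
ii) When $\dfrac{c_2}{c_1} = 4$, or $c_2 = 4c_1$, then
$$RI = \frac{c_1+4c_1}{0.5\,(\sqrt{c_1}+2\sqrt{c_1})^2}-1 = \frac{5c_1}{0.5\,c_1(1+2)^2}-1 = 0.11111 .$$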
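The two relative increases can be reproduced directly from the ratio stated in the exercise. This is a small illustrative sketch; the function name and the choice $c_1 = 1$ are assumptions made only for the example, since the ratio depends on $c_2/c_1$ alone.

```python
from math import sqrt

def relative_increase(W1, W2, c1, c2):
    """V_prop / V_opt - 1 for two strata with equal S_i, fixed cost, fpc ignored."""
    ratio = (W1 * c1 + W2 * c2) / (W1 * sqrt(c1) + W2 * sqrt(c2)) ** 2
    return ratio - 1.0

for cost_ratio in (2, 4):
    # c1 is set to 1, so c2 equals the cost ratio c2/c1
    print(cost_ratio, round(relative_increase(0.5, 0.5, 1.0, cost_ratio), 6))
```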
Exercise: A sampler proposes to take a stratified random sample. He expects that his field costs will be of the form $\sum c_i n_i$. His advance estimates of relevant quantities for two strata are as follows:

Stratum    $W_i$    $S_i$    $c_i$
1          0.4      10       4
2          0.6      20       9

i) Find the values of $\dfrac{n_1}{n}$ and $\dfrac{n_2}{n}$ that minimize the total field cost for a given value of $V(\bar y_{st})$.
ii) Find the sample size required, under this optimum allocation, to make $V(\bar y_{st}) = 1$, if the fpc is ignored.
iii) Obtain the total field cost.
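Solution:
i) The optimum values of $n_i$ for a given variance, when the cost is a minimum, are given by
$$n_i = n\,\frac{W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}}, \quad \text{so that} \quad \frac{n_i}{n} = \frac{W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}} .$$
Then
$$\frac{n_1}{n} = \frac{W_1S_1/\sqrt{c_1}}{W_1S_1/\sqrt{c_1}+W_2S_2/\sqrt{c_2}} = \frac{1}{3}$$
and
$$\frac{n_2}{n} = \frac{W_2S_2/\sqrt{c_2}}{W_1S_1/\sqrt{c_1}+W_2S_2/\sqrt{c_2}} = \frac{2}{3} .$$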
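ii) For a specified variance $V$, the sample size required under this allocation is
$$n = \frac{\left(\sum_i W_i S_i/\sqrt{c_i}\right)\left(\sum_i W_i S_i\sqrt{c_i}\right)}{V+\dfrac{1}{N}\sum_i W_i S_i^2},$$
and, if the fpc is ignored and $V = 1$,
$$n = \left(\sum_i W_i S_i/\sqrt{c_i}\right)\left(\sum_i W_i S_i\sqrt{c_i}\right).$$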
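The three parts of this exercise can be evaluated numerically. The sketch below assumes the linear field cost $\sum c_i n_i$ and the formulas $n_i \propto W_i S_i/\sqrt{c_i}$ and $n = (\sum W_iS_i/\sqrt{c_i})(\sum W_iS_i\sqrt{c_i})/V$ with the fpc ignored; variable names are illustrative.

```python
from math import sqrt

W = [0.4, 0.6]
S = [10.0, 20.0]
c = [4.0, 9.0]
target_variance = 1.0

a = [w * s / sqrt(ci) for w, s, ci in zip(W, S, c)]   # W_i S_i / sqrt(c_i)
b = [w * s * sqrt(ci) for w, s, ci in zip(W, S, c)]   # W_i S_i * sqrt(c_i)

fractions = [x / sum(a) for x in a]                   # (i) n_i / n
n_total = sum(a) * sum(b) / target_variance           # (ii) fpc ignored
n_strata = [n_total * f for f in fractions]
field_cost = sum(ci * ni for ci, ni in zip(c, n_strata))   # (iii)

# Check: the allocation attains the target variance sum(W_i^2 S_i^2 / n_i)
achieved = sum((w * s) ** 2 / ni for w, s, ni in zip(W, S, n_strata))
print(fractions, n_total, n_strata, field_cost, achieved)
```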
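With equal sample sizes $n_1 = n_2 = n$ in the two strata (total sample size $2n$) and the fpc ignored,
$$V(\bar y_{st}) = \frac{W_1^2S_1^2}{n}+\frac{W_2^2S_2^2}{n} = \frac{1}{n}(W_1^2S_1^2+W_2^2S_2^2),$$
and the variance under Neyman allocation (for the same total sample size $2n$) is
$$V(\bar y_{st})_{opt} = \frac{1}{2n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 = \frac{1}{2n}(W_1S_1+W_2S_2)^2 .$$
Under Neyman allocation,
$$\left(\frac{n_1}{n_2}\right)_{opt} = \frac{W_1S_1}{W_2S_2} = r \quad \text{(given)}.$$
Therefore,
$$\frac{V(\bar y_{st})-V(\bar y_{st})_{opt}}{V(\bar y_{st})_{opt}} = \frac{\dfrac{1}{n}(W_1^2S_1^2+W_2^2S_2^2)-\dfrac{1}{2n}(W_1S_1+W_2S_2)^2}{\dfrac{1}{2n}(W_1S_1+W_2S_2)^2} = \frac{(W_1S_1-W_2S_2)^2}{(W_1S_1+W_2S_2)^2}$$
$$= \frac{\left(\dfrac{W_1S_1}{W_2S_2}-1\right)^2}{\left(\dfrac{W_1S_1}{W_2S_2}+1\right)^2} = \frac{(r-1)^2}{(r+1)^2} = \left(\frac{r-1}{r+1}\right)^2 .$$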
Exercise: If the cost function is of the form $C = c_0+\sum_{i=1}^{k} c_i\sqrt{n_i}$, where $c_0$ and $c_i$ are known numbers, then
i) Show that in order to minimize $V(\bar y_{st})$ for fixed total cost, $n_i$ must be proportional to $\left(\dfrac{W_i^2S_i^2}{c_i}\right)^{2/3}$.
ii) Find the $n_i$ for a sample of size 1000 under the following conditions:

Stratum    $W_i$    $S_i$    $c_i$
1          0.4      4        1
2          0.3      5        2
3          0.2      6        4
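Solution:
i) We have
$$V(\bar y_{st}) = \sum_{i=1}^{k}\frac{W_i^2S_i^2}{n_i}-\sum_{i=1}^{k}\frac{W_i^2S_i^2}{N_i} .$$
To determine $n_i$ such that $V(\bar y_{st})$ is a minimum while the cost $C = c_0+\sum_{i=1}^{k} c_i\sqrt{n_i}$ is fixed (given), we consider the function
$$\phi = \sum_{i=1}^{k}\frac{W_i^2S_i^2}{n_i}-\sum_{i=1}^{k}\frac{W_i^2S_i^2}{N_i}+\lambda\left(c_0+\sum_{i=1}^{k} c_i\sqrt{n_i}-C\right), \qquad (1)$$
where $\lambda$ is some unknown constant. Using the calculus method of Lagrange multipliers, we select $n_i$ and the constant $\lambda$ to minimize $\phi$.
Differentiating equation (1) with respect to $n_i$, we have
$$\frac{\partial\phi}{\partial n_i} = 0 = -\frac{W_i^2S_i^2}{n_i^2}+\frac{\lambda}{2}\,c_i\,(n_i)^{-1/2} \quad \Rightarrow \quad \frac{W_i^2S_i^2}{n_i^2} = \frac{\lambda}{2}\,c_i\,(n_i)^{-1/2},$$
or
$$(n_i)^{3/2} = \frac{2}{\lambda}\,\frac{W_i^2S_i^2}{c_i}, \quad \text{or} \quad n_i = \left(\frac{2}{\lambda}\right)^{2/3}\left(\frac{W_i^2S_i^2}{c_i}\right)^{2/3},$$
and hence
$$n_i \propto \left(\frac{W_i^2S_i^2}{c_i}\right)^{2/3}, \quad \text{since } \left(\frac{2}{\lambda}\right)^{2/3} \text{ is constant.}$$
ii) We have
$$n_i = \left(\frac{2}{\lambda}\right)^{2/3}\left(\frac{W_i^2S_i^2}{c_i}\right)^{2/3}. \qquad (2)$$
Taking the summation over all strata, we get
$$n = \sum_{i=1}^{k} n_i = \left(\frac{2}{\lambda}\right)^{2/3}\sum_{i=1}^{k}\left(\frac{W_i^2S_i^2}{c_i}\right)^{2/3} \quad \Rightarrow \quad \left(\frac{2}{\lambda}\right)^{2/3} = \frac{n}{\sum_{i=1}^{k}\left(\dfrac{W_i^2S_i^2}{c_i}\right)^{2/3}} . \qquad (3)$$
Substituting equation (3) in equation (2), we get
$$n_i = \frac{n}{\sum_{i=1}^{k}\left(\dfrac{W_i^2S_i^2}{c_i}\right)^{2/3}}\left(\frac{W_i^2S_i^2}{c_i}\right)^{2/3}.$$
Therefore,
$$n_1 = \frac{1000}{(2.56)^{2/3}+(1.125)^{2/3}+(0.36)^{2/3}}\,(2.56)^{2/3} = 541, \quad n_2 = 313, \quad \text{and} \quad n_3 = 146 .$$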
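The allocation in part ii) is easy to recompute. A minimal sketch, using the values in the table of the exercise (the variable names are illustrative):

```python
W = [0.4, 0.3, 0.2]
S = [4.0, 5.0, 6.0]
c = [1.0, 2.0, 4.0]
n = 1000

# (W_i^2 S_i^2 / c_i)^(2/3) for each stratum: the bases are 2.56, 1.125, 0.36
scores = [((w * s) ** 2 / ci) ** (2 / 3) for w, s, ci in zip(W, S, c)]
allocation = [n * x / sum(scores) for x in scores]
rounded = [round(x) for x in allocation]
print(rounded)
```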
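$$\bar Y = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} y_{ij} = \frac{1}{N}\sum_{i=1}^{k} N_i P_i = \sum_{i=1}^{k} W_i P_i = P, \ \text{overall population proportion.}$$
$$\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} = p_i, \ \text{sample proportion based on } n_i \text{ units.}$$
$$\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i}(y_{ij}-P_i)^2 = P_i-P_i^2 = P_iQ_i, \ \text{stratum variance of proportion based on } N_i \text{ units.}$$
$$S_i^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(y_{ij}-P_i)^2 = \frac{N_i}{N_i-1}\,P_iQ_i, \ \text{stratum mean square of proportion based on } N_i \text{ units.}$$
$$s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(y_{ij}-p_i)^2 = \frac{n_i}{n_i-1}\,p_iq_i, \ \text{sample mean square of proportion based on } n_i \text{ units.}$$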
Theorem: In stratified random sampling, wor, an unbiased estimate of the overall population proportion is given by $p_{st} = \sum_{i=1}^{k} W_i p_i$, with variance
$$V(p_{st}) = \sum_{i=1}^{k} W_i^2\,\frac{N_i-n_i}{N_i-1}\,\frac{P_iQ_i}{n_i},$$
where $p_i$ is the sample estimate of the proportion $P_i$ in the $i$th stratum.
Proof: Since sampling within each stratum is simple random sampling, $E(p_i) = P_i$, and it follows that
$$E(p_{st}) = \sum_{i=1}^{k} W_i E(p_i) = \sum_{i=1}^{k} W_i P_i = P .$$
To obtain the variance, we have
$$V(p_{st}) = E[p_{st}-E(p_{st})]^2 = \sum_{i=1}^{k} W_i^2\,V(p_i) = \sum_{i=1}^{k} W_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)\frac{N_i}{N_i-1}\,P_iQ_i,$$
as sampling is srswor within each stratum. Hence
$$V(p_{st}) = \sum_{i=1}^{k} W_i^2\,\frac{N_i-n_i}{n_iN_i}\,\frac{N_i}{N_i-1}\,P_iQ_i = \sum_{i=1}^{k} W_i^2\,\frac{N_i-n_i}{N_i-1}\,\frac{P_iQ_i}{n_i} .$$
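Proof:
$$E[\hat V(p_{st})] = E\left[\frac{1}{N}\sum_{i=1}^{k}(N_i-n_i)\,W_i\,\frac{p_iq_i}{n_i-1}\right] = E\left[\frac{1}{N}\sum_{i=1}^{k}(N_i-n_i)\,\frac{W_i}{n_i}\,\frac{n_ip_iq_i}{n_i-1}\right]$$
$$= \frac{1}{N}\sum_{i=1}^{k}(N_i-n_i)\,\frac{W_i}{n_i}\,E\left(\frac{n_ip_iq_i}{n_i-1}\right) = \frac{1}{N}\sum_{i=1}^{k}(N_i-n_i)\,\frac{W_i}{n_i}\,\frac{N_iP_iQ_i}{N_i-1}, \ \text{since } E(s_i^2) = S_i^2 \text{ with srswor,}$$
$$= \sum_{i=1}^{k}\frac{N_i-n_i}{N_i-1}\,W_i^2\,\frac{P_iQ_i}{n_i} = V(p_{st}) .$$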
UNIT-III
CLUSTER SAMPLING
In random sampling, it is presumed that the population has been divided into a finite number of distinct and identifiable units called the sampling units. The smallest units into which the population can be divided are called the elements of the population, and a group of such elements is known as a cluster. After the population has been divided into specified clusters (as a simple rule, the number of elements in a cluster should be small and the number of clusters should be large), the required number of clusters is selected with either equal or unequal probabilities of selection. Such a procedure, in which the sampling unit is a cluster, is called cluster sampling. If the entire area containing the population under study is subdivided into smaller area segments, and each element in the population is associated with one and only one such area segment, the procedure is alternatively called area sampling. There are two main reasons for using a cluster as the sampling unit.
i) Usually a complete list of the population units is not available, and therefore the use of an individual unit as the sampling unit is not feasible.
ii) Even when a complete list of the population units is available, using a cluster as the sampling unit can reduce the cost of sampling considerably.
For instance, in a population survey it may be cheaper to collect data from all persons in a sample of households than from a sample of the same number of persons selected directly from all the persons. Similarly, it would be operationally more convenient to survey all households situated in a sample of areas such as villages than to survey a sample of the same number of households selected at random from a list of all households. Another example of the utility of cluster sampling is provided by crop surveys, where locating a randomly selected farm or plot requires a considerable part of the total time taken for the survey, but once the plot is located, the time taken for identifying and surveying a few neighbouring plots will generally be only marginal.
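$$\bar y_{i.} = \frac{1}{M}\sum_{j=1}^{M} y_{ij}, \ \text{mean per element of the } i\text{th cluster.}$$
$$\bar Y_N = \frac{1}{N}\sum_{i=1}^{N}\bar y_{i.}, \ \text{mean of cluster means in the population of } N \text{ clusters.}$$
$$\bar Y = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}, \ \text{mean per element in the population.}$$
$$\bar y_n = \frac{1}{n}\sum_{i=1}^{n}\bar y_{i.}, \ \text{mean of cluster means in a sample of } n \text{ clusters.}$$
$$\bar y = \frac{1}{nM}\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}, \ \text{mean per element in the sample.}$$
$$S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar y_{i.})^2, \ \text{mean square between elements within the } i\text{th cluster.}$$
$$S_w^2 = \frac{1}{N}\sum_{i=1}^{N} S_i^2, \ \text{mean square within clusters.}$$
$$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2, \ \text{mean square between cluster means in the population.}$$
$$S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2, \ \text{mean square between elements in the population.}$$
$$\rho = \frac{E\,(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{E\,(y_{ij}-\bar Y)^2} = \frac{\dfrac{1}{NM(M-1)}\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{\dfrac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2} = \frac{\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{(M-1)(NM-1)\,S^2},$$
the intracluster correlation coefficient between elements within clusters.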
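Theorem: If a simple random sample, wor, of $n$ clusters, each having $M$ elements, is drawn from a population of $N$ clusters, the sample mean $\bar y_n$ is an unbiased estimator of the population mean $\bar Y$, and its variance is
$$V(\bar y_n) = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 = \frac{1-f}{n}\,S_b^2 .$$
Proof: We have
$$E(\bar y_n) = E\left(\frac{1}{n}\sum_{i=1}^{n}\bar y_{i.}\right) = \frac{1}{n}\sum_{i=1}^{n}E(\bar y_{i.}) = \frac{1}{N}\sum_{i=1}^{N}\bar y_{i.} = \bar Y_N = \bar Y .$$
To obtain the variance, we have
$$V(\bar y_n) = E(\bar y_n-\bar Y)^2 = \frac{1}{n^2}\left[\sum_{i=1}^{n}E(\bar y_{i.}-\bar Y_N)^2+\sum_{i\ne i'}^{n}E(\bar y_{i.}-\bar Y_N)(\bar y_{i'.}-\bar Y_N)\right].$$
Consider
$$E(\bar y_{i.}-\bar Y_N)^2 = \frac{1}{N}\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2 = \frac{N-1}{N}\,S_b^2 \qquad (3.1)$$
and
$$E(\bar y_{i.}-\bar Y_N)(\bar y_{i'.}-\bar Y_N) = \frac{1}{N(N-1)}\sum_{i\ne i'}^{N}(\bar y_{i.}-\bar Y_N)(\bar y_{i'.}-\bar Y_N)$$
$$= \frac{1}{N(N-1)}\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)\left[\sum_{i'=1}^{N}(\bar y_{i'.}-\bar Y_N)-(\bar y_{i.}-\bar Y_N)\right]$$
$$= \frac{1}{N(N-1)}\left[\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)\sum_{i'=1}^{N}(\bar y_{i'.}-\bar Y_N)-\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2\right]$$
$$= -\frac{1}{N(N-1)}\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2 = -\frac{1}{N}\,S_b^2 . \qquad (3.2)$$
Substituting (3.1) and (3.2),
$$V(\bar y_n) = \frac{1}{n^2}\left[\sum_{i=1}^{n}\frac{N-1}{N}\,S_b^2-\sum_{i\ne i'}^{n}\frac{1}{N}\,S_b^2\right] = \frac{1}{n^2}\left[\frac{n(N-1)}{N}\,S_b^2-\frac{n(n-1)}{N}\,S_b^2\right] = \frac{N-n}{nN}\,S_b^2 = \frac{1-f}{n}\,S_b^2 .$$
Note: For large $N$, $V(\bar y_n) \approx \dfrac{1}{n}\,S_b^2$.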
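Alternative expression of $V(\bar y_n)$ in terms of the correlation coefficient
Consider the intracluster correlation coefficient between elements within clusters, defined as
$$\rho = \frac{E\,(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{E\,(y_{ij}-\bar Y)^2} = \frac{\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{(M-1)(NM-1)\,S^2},$$
so that
$$\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y) = (M-1)(NM-1)\,\rho\,S^2 .$$
By definition,
$$V(\bar y_n) = \frac{1-f}{n}\,S_b^2 = \frac{1-f}{n(N-1)}\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2 . \qquad (3.3)$$
Consider
$$\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2 = \sum_{i=1}^{N}\left[\frac{1}{M}\sum_{j=1}^{M}y_{ij}-\frac{M}{M}\,\bar Y_N\right]^2 = \frac{1}{M^2}\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(y_{ij}-\bar Y_N)\right]^2$$
$$= \frac{1}{M^2}\left[\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2+\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)\right], \ \text{as } \bar Y_N = \bar Y \qquad (3.4)$$
$$= \frac{1}{M^2}\left[(NM-1)\,S^2+(M-1)(NM-1)\,\rho\,S^2\right] = \frac{(NM-1)\,S^2}{M^2}\left[1+(M-1)\rho\right]. \qquad (3.5)$$
Substituting the value from equation (3.5) in equation (3.3), we get
$$V(\bar y_n) = \frac{1-f}{n}\,\frac{(NM-1)\,S^2}{M^2(N-1)}\left[1+(M-1)\rho\right].$$
Note: For large $N$, $\dfrac{1}{N}\to 0$, so that $(1-f)\approx 1$ and $\dfrac{NM-1}{M^2(N-1)} = \dfrac{N(M-1/N)}{NM^2(1-1/N)} \approx \dfrac{1}{M}$.
Hence,
$$V(\bar y_n) \approx \frac{S^2}{nM}\left[1+(M-1)\rho\right].$$
Corollary: $\hat Y = NM\,\bar y_n$ is an unbiased estimate of the population total $Y$, and its variance is
$$V(\hat Y) = N^2M^2\,\frac{1-f}{n}\,S_b^2 = N^2\,\frac{1-f}{n}\,\frac{(NM-1)\,S^2}{N-1}\left[1+(M-1)\rho\right] \approx N^2M\,\frac{1-f}{n}\,S^2\left[1+(M-1)\rho\right], \ \text{for large } N .$$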
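Equation (3.5) can be checked on a small example. The sketch below uses a hypothetical population of $N = 4$ clusters of $M = 2$ elements (the data are made up for illustration); it computes $\rho$ from its definition and confirms that $S_b^2$ obtained directly equals $(NM-1)S^2[1+(M-1)\rho]/\{M^2(N-1)\}$.

```python
from statistics import mean

clusters = [[1, 3], [2, 6], [5, 7], [4, 12]]   # hypothetical population
N, M = len(clusters), len(clusters[0])
grand_mean = mean(y for c in clusters for y in c)

# S^2: mean square between elements in the population
S2 = sum((y - grand_mean) ** 2 for c in clusters for y in c) / (N * M - 1)

# Sum of cross products over all ordered pairs j != k within each cluster
cross = sum((c[j] - grand_mean) * (c[k] - grand_mean)
            for c in clusters for j in range(M) for k in range(M) if j != k)
rho = cross / ((M - 1) * (N * M - 1) * S2)

# S_b^2 computed directly and through equation (3.5)
S_b2_direct = sum((mean(c) - grand_mean) ** 2 for c in clusters) / (N - 1)
S_b2_via_rho = (N * M - 1) * S2 * (1 + (M - 1) * rho) / (M ** 2 * (N - 1))
print(rho, S_b2_direct, S_b2_via_rho)
```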
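Estimation of variance $V(\bar y_n)$
Define
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar y_{i.}-\bar y_n)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}\bar y_{i.}^2-n\,\bar y_n^2\right],$$
then
$$E(s_b^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(\bar y_{i.}^2)-n\,E(\bar y_n^2)\right].$$
Note that $V(\bar y_{i.}) = E(\bar y_{i.}^2)-\bar Y_N^2$, so that by (3.1)
$$E(\bar y_{i.}^2) = \frac{N-1}{N}\,S_b^2+\bar Y_N^2, \qquad (3.6)$$
and $V(\bar y_n) = E(\bar y_n^2)-\bar Y_N^2$, so that
$$E(\bar y_n^2) = \frac{N-n}{nN}\,S_b^2+\bar Y_N^2 . \qquad (3.7)$$
Therefore,
$$E(s_b^2) = \frac{1}{n-1}\left[n\,\frac{N-1}{N}\,S_b^2-n\,\frac{N-n}{nN}\,S_b^2\right] = \frac{1}{n-1}\left[\frac{nN-n-N+n}{N}\right]S_b^2 = S_b^2 .$$
This shows that $s_b^2$ is an unbiased estimate of $S_b^2$. Hence $v(\bar y_n) = \dfrac{1-f}{n}\,s_b^2$ is an unbiased estimator of $V(\bar y_n) = \dfrac{1-f}{n}\,S_b^2$.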
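As a numerical check of the variance formula and of the unbiasedness of $s_b^2$, the following sketch enumerates every simple random sample (wor) of $n = 2$ clusters from a hypothetical population of $N = 4$ equal-sized clusters. The data and variable names are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean, variance

clusters = [[1, 3], [2, 6], [5, 7], [4, 12]]   # hypothetical population
N, n = len(clusters), 2
cluster_means = [mean(c) for c in clusters]
S_b2 = variance(cluster_means)                 # divisor N - 1

samples = list(combinations(cluster_means, n)) # all equally likely samples
estimates = [mean(s) for s in samples]         # sample mean of cluster means
expected_estimate = mean(estimates)
direct_variance = mean((e - expected_estimate) ** 2 for e in estimates)
formula_variance = (1 - n / N) / n * S_b2      # (1 - f) / n * S_b^2

# Average of s_b^2 over all samples should reproduce S_b^2
expected_s_b2 = mean(variance(s) for s in samples)
print(expected_estimate, direct_variance, formula_variance, expected_s_b2)
```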
Relative efficiency ($RE$) of cluster sampling
In sampling $nM$ elements from the population by simple random sampling, wor, the variance of the sample mean $\bar y$ is given by
$$V(\bar y_{sr}) = \frac{NM-nM}{NM}\,\frac{S^2}{nM} = \frac{1-f}{nM}\,S^2, \quad \text{and} \quad V(\bar y_n) = \frac{1-f}{n}\,S_b^2 .$$
Thus, the relative efficiency of cluster sampling compared with simple random sampling is given by
$$RE = \frac{V(\bar y_{sr})}{V(\bar y_n)} = \frac{S^2}{M\,S_b^2} .$$
This shows that the efficiency of cluster sampling increases as the mean square between cluster means $S_b^2$ decreases.
Note: For large $N$, the relative efficiency of cluster sampling in terms of the intracluster correlation coefficient $\rho$ is given by
$$RE = \frac{V(\bar y_{sr})}{V(\bar y_n)} = \frac{1}{1+(M-1)\rho} .$$
It can be seen that the relative efficiency depends on the value of $\rho$: if
i) $\rho = 0$, then $V(\bar y_{sr}) = V(\bar y_n)$, i.e. both methods are equally precise.
ii) $\rho > 0$, then $V(\bar y_{sr}) < V(\bar y_n)$, i.e. simple random sampling is more precise.
iii) $\rho < 0$, then $V(\bar y_{sr}) > V(\bar y_n)$, i.e. cluster sampling is more precise.
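$$\text{Est.}(RE) = \frac{\text{Est. } S^2}{M\;\text{Est. } S_b^2};$$
here $s^2$ will not be an unbiased estimate of $S^2$, i.e. $E(s^2) \ne S^2$, because a sample of $nM$ elements is not taken randomly from the population of $NM$ elements. To find an unbiased estimate of $S^2$, consider
$$(NM-1)\,S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar y_{i.}+\bar y_{i.}-\bar Y)^2$$
$$= \sum_{i=1}^{N}\sum_{j=1}^{M}\left[(y_{ij}-\bar y_{i.})^2+(\bar y_{i.}-\bar Y)^2+2\,(y_{ij}-\bar y_{i.})(\bar y_{i.}-\bar Y)\right]$$
$$= \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar y_{i.})^2+M\sum_{i=1}^{N}(\bar y_{i.}-\bar Y)^2+0 = \sum_{i=1}^{N}(M-1)\,S_i^2+M(N-1)\,S_b^2$$
$$= N(M-1)\,S_w^2+M(N-1)\,S_b^2 . \qquad (3.8)$$
Define
$$s_w^2 = \frac{1}{n(M-1)}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar y_{i.})^2, \quad \text{and} \quad s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar y_{i.}-\bar y_n)^2 .$$
Consider
$$s_w^2 = \frac{1}{n(M-1)}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar y_{i.})^2 = \frac{1}{n(M-1)}\left[\sum_{i=1}^{n}\sum_{j=1}^{M}y_{ij}^2-M\sum_{i=1}^{n}\bar y_{i.}^2\right],$$
so that
$$E(s_w^2) = \frac{1}{n(M-1)}\left[\sum_{i=1}^{n}\sum_{j=1}^{M}E(y_{ij}^2)-M\sum_{i=1}^{n}E(\bar y_{i.}^2)\right].$$
Note that $V(y_{ij}) = E(y_{ij}^2)-\bar Y_N^2$, so that
$$E(y_{ij}^2) = \frac{NM-1}{NM}\,S^2+\bar Y_N^2 . \quad \text{Similarly,} \quad E(\bar y_{i.}^2) = \frac{N-1}{N}\,S_b^2+\bar Y_N^2 .$$
Therefore,
$$E(s_w^2) = \frac{1}{n(M-1)}\left[\sum_{i=1}^{n}\sum_{j=1}^{M}\left(\frac{NM-1}{NM}\,S^2+\bar Y_N^2\right)-M\sum_{i=1}^{n}\left(\frac{N-1}{N}\,S_b^2+\bar Y_N^2\right)\right]$$
$$= \frac{1}{n(M-1)}\left[nM\,\frac{NM-1}{NM}\,S^2+nM\,\bar Y_N^2-nM\,\frac{N-1}{N}\,S_b^2-nM\,\bar Y_N^2\right]$$
$$= \frac{1}{N(M-1)}\left[(NM-1)\,S^2-M(N-1)\,S_b^2\right] = \frac{1}{N(M-1)}\left[N(M-1)\,S_w^2\right] = S_w^2,$$
by using the relation given in equation (3.8). Thus $s_w^2$ and $s_b^2$ are unbiased estimates of $S_w^2$ and $S_b^2$, so that, by (3.8), an unbiased estimate of $S^2$ is $\dfrac{1}{NM-1}\left[N(M-1)\,s_w^2+M(N-1)\,s_b^2\right]$, and, for large $N$,
$$\text{Est.}(RE) = \frac{\dfrac{1}{NM}\left[N(M-1)\,s_w^2+NM(1-1/N)\,s_b^2\right]}{M\,s_b^2} \approx \frac{(M-1)\,s_w^2+M\,s_b^2}{M^2\,s_b^2} .$$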
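Estimation of $\rho$
For large $N$, $RE = \dfrac{1}{1+(M-1)\rho} = E$ (say), so that
$$\hat E+(M-1)\,\hat E\,\hat\rho = 1, \quad \text{where} \quad \hat E = \frac{(M-1)\,s_w^2+M\,s_b^2}{M^2\,s_b^2},$$
or
$$\hat\rho = \frac{1-\hat E}{(M-1)\,\hat E} = \frac{1-\dfrac{1}{M^2s_b^2}\left[(M-1)\,s_w^2+M\,s_b^2\right]}{\dfrac{M-1}{M^2s_b^2}\left[(M-1)\,s_w^2+M\,s_b^2\right]} = \frac{M^2s_b^2-(M-1)\,s_w^2-M\,s_b^2}{(M-1)\left[(M-1)\,s_w^2+M\,s_b^2\right]}$$
$$= \frac{M(M-1)\,s_b^2-(M-1)\,s_w^2}{(M-1)\left[(M-1)\,s_w^2+M\,s_b^2\right]} = \frac{M\,s_b^2-s_w^2}{(M-1)\,s_w^2+M\,s_b^2} .$$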
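Alternative method
We have
$$\rho = \frac{\dfrac{1}{M-1}\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y)}{(NM-1)\,S^2}, \quad \text{and} \quad (NM-1)\,S^2 = N(M-1)\,S_w^2+M(N-1)\,S_b^2 .$$
Note that, from equation (3.4),
$$M^2\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2+\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y),$$
or
$$\sum_{i=1}^{N}\sum_{j\ne k}^{M}(y_{ij}-\bar Y)(y_{ik}-\bar Y) = M^2\sum_{i=1}^{N}(\bar y_{i.}-\bar Y)^2-\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar Y)^2$$
$$= M^2(N-1)\,S_b^2-(NM-1)\,S^2 = M^2(N-1)\,S_b^2-N(M-1)\,S_w^2-M(N-1)\,S_b^2$$
$$= M(N-1)(M-1)\,S_b^2-N(M-1)\,S_w^2 .$$
Hence,
$$\rho = \frac{M(N-1)\,S_b^2-N\,S_w^2}{M(N-1)\,S_b^2+N(M-1)\,S_w^2} .$$
Therefore,
$$\hat\rho = \frac{M(N-1)\,s_b^2-N\,s_w^2}{M(N-1)\,s_b^2+N(M-1)\,s_w^2}, \quad \text{and for large } N, \quad \hat\rho = \frac{M\,s_b^2-s_w^2}{M\,s_b^2+(M-1)\,s_w^2} .$$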
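The large-$N$ estimates of $\rho$ and of the relative efficiency can be computed from a sample of clusters as sketched below. The function name and the sample data are hypothetical; the function assumes equal cluster sizes and at least two sampled clusters.

```python
from statistics import mean, variance

def estimate_rho_and_re(sample_clusters):
    """Large-N estimates of the intracluster correlation and relative efficiency."""
    M = len(sample_clusters[0])
    # s_w^2: average of the within-cluster mean squares (divisor M - 1 in each)
    s_w2 = mean(variance(c) for c in sample_clusters)
    # s_b^2: mean square between sampled cluster means (divisor n - 1)
    s_b2 = variance([mean(c) for c in sample_clusters])
    rho_hat = (M * s_b2 - s_w2) / (M * s_b2 + (M - 1) * s_w2)
    re_hat = ((M - 1) * s_w2 + M * s_b2) / (M ** 2 * s_b2)
    return rho_hat, re_hat

rho_hat, re_hat = estimate_rho_and_re([[1, 3], [2, 6], [5, 7]])
print(rho_hat, re_hat)
```

The two estimates are tied together by $\hat E = 1/\{1+(M-1)\hat\rho\}$, which gives a quick consistency check on any implementation.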
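With $S_w^2 = aM^g$ substituted in relation (3.8),
$$S_b^2 = \frac{(NM-1)\,S^2-N(M-1)\,aM^g}{M(N-1)} \approx S^2-(M-1)\,aM^{g-1}, \ \text{for large } N .$$
Thus, the variance $V(\bar y_n)$ for large $N$ reduces to
$$V(\bar y_n) = \frac{1}{n}\left[S^2-(M-1)\,aM^{g-1}\right].$$
The problem is to determine $n$ and $M$ such that, for specified cost, the variance of $\bar y_n$ is a minimum. Using calculus methods we form
$$\phi = V(\bar y_n)+\lambda\,(c_1nM+c_2\sqrt{n}-C),$$
where $\lambda$ is an unknown constant. Differentiating with respect to $n$ and $M$ respectively, and equating the results to zero, we obtain
$$\frac{\partial\phi}{\partial n} = 0 = -\frac{1}{n^2}\left[S^2-(M-1)\,aM^{g-1}\right]+\lambda\left(c_1M+\frac{c_2}{2\sqrt{n}}\right),$$
so that
$$\frac{1}{n}\,V(\bar y_n) = \lambda\left(c_1M+\frac{c_2}{2\sqrt{n}}\right), \qquad (3.9)$$
and
$$\frac{\partial\phi}{\partial M} = 0 = \frac{\partial V(\bar y_n)}{\partial M}+\lambda\,c_1n,$$
so that
$$\frac{\partial V(\bar y_n)}{\partial M} = -\lambda\,c_1n . \qquad (3.10)$$
On eliminating $\lambda$ from equations (3.9) and (3.10), we have
$$\frac{1}{V(\bar y_n)}\,\frac{\partial V(\bar y_n)}{\partial M} = -\frac{c_1}{c_1M+\dfrac{c_2}{2\sqrt{n}}}, \quad \text{or} \quad \frac{M}{V(\bar y_n)}\,\frac{\partial V(\bar y_n)}{\partial M} = -\frac{1}{1+\dfrac{c_2}{2c_1M\sqrt{n}}} .$$
Solving the cost equation $c_1nM+c_2\sqrt{n}-C = 0$ as a quadratic in $\sqrt{n}$ gives
$$\sqrt{n} = \frac{-c_2+\sqrt{c_2^2+4c_1MC}}{2c_1M}, \quad \text{or} \quad 2c_1M\sqrt{n} = c_2\left[\left(1+\frac{4c_1MC}{c_2^2}\right)^{1/2}-1\right].$$
Hence,
$$\frac{M}{V(\bar y_n)}\,\frac{\partial V(\bar y_n)}{\partial M} = \left(1+\frac{4c_1MC}{c_2^2}\right)^{-1/2}-1 . \qquad (3.11)$$
Now, evaluating the LHS of equation (3.11), we have
$$\frac{M}{V(\bar y_n)}\,\frac{\partial V(\bar y_n)}{\partial M} = \frac{M}{n\,V(\bar y_n)}\,\frac{\partial}{\partial M}\left[S^2-(M-1)\,aM^{g-1}\right] = -\frac{1}{n\,V(\bar y_n)}\left[agM^g-a(g-1)M^{g-1}\right].$$
Therefore,
$$\frac{aM^{g-1}\left[gM-(g-1)\right]}{n\,V(\bar y_n)} = 1-\left(1+\frac{4c_1MC}{c_2^2}\right)^{-1/2}. \qquad (3.12)$$
Here $n\,V(\bar y_n) = S^2-(M-1)\,aM^{g-1}$ does not involve $n$, so (3.12) is an equation in $M$ alone. It is difficult to get an explicit expression for $M$. However, $M$ can be obtained by the iterative method (trial and error method). On substituting the value of $M$ thus obtained in the expression for $\sqrt{n}$ above, we can obtain the optimum value of $n$.
It is evident from equation (3.12) that the optimum size of the unit becomes smaller when
i) $c_1$ increases, i.e. the time of measurement increases.
ii) $c_2$ decreases, i.e. travel becomes cheaper.
iii) the total cost of the survey $C$ increases.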
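The trial-and-error search for the optimum cluster size can be sketched as a direct grid search: for each candidate $M$, the number of clusters $n$ affordable under the cost $C = c_1nM + c_2\sqrt{n}$ is obtained from the quadratic in $\sqrt{n}$, and the large-$N$ variance $[S^2-(M-1)aM^{g-1}]/n$ is evaluated. This replaces solving (3.12) analytically with a simple numerical minimisation over integer $M$. All parameter values below are hypothetical and chosen only so that the variance stays positive over the search range.

```python
from math import sqrt

def clusters_affordable(M, c1, c2, C):
    """Solve c1*n*M + c2*sqrt(n) = C for n (positive root of the quadratic in sqrt(n))."""
    root_n = (-c2 + sqrt(c2 ** 2 + 4 * c1 * M * C)) / (2 * c1 * M)
    return root_n ** 2

def variance_of_mean(M, n, S2, a, g):
    """Large-N variance of the cluster-sample mean with S_w^2 = a * M**g."""
    return (S2 - (M - 1) * a * M ** (g - 1)) / n

def best_cluster_size(S2, a, g, c1, c2, C, max_M=200):
    """Integer M in 1..max_M giving the smallest variance for the fixed cost."""
    def objective(M):
        return variance_of_mean(M, clusters_affordable(M, c1, c2, C), S2, a, g)
    return min(range(1, max_M + 1), key=objective)

# Hypothetical inputs: S2, a, g describe the population; c1, c2, C the costs
params = dict(S2=31.7, a=2.0, g=0.3, c1=0.1, c2=100.0, C=1000.0)
M_opt = best_cluster_size(**params)
n_opt = clusters_affordable(M_opt, params["c1"], params["c2"], params["C"])
print(M_opt, n_opt)
```

In practice $n$ would be rounded to an integer, and the search range for $M$ restricted to sizes for which the assumed law $S_w^2 = aM^g$ is believed to hold.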
Cluster sampling for proportion
Suppose it is desired to estimate the proportion $P$ of elements belonging to a specified category $A$ when the population consists of $N$ clusters, each of size $M$, and a random sample, wor, of $n$ clusters is selected. Defining $y_{ij}$ as 1 if the $j$th element of the $i$th cluster belongs to the class $A$ and 0 otherwise, it is easy to note that $a_i = \sum_{j=1}^{M} y_{ij}$ gives the total number of elements in the $i$th cluster that belong to class $A$, and $p_i = \dfrac{a_i}{M}$ is the proportion in the $i$th cluster. Hence the proportion $P$ is
$$P = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}y_{ij} = \frac{1}{NM}\sum_{i=1}^{N}a_i = \frac{1}{N}\sum_{i=1}^{N}p_i .$$
An unbiased estimate of $P$ is
$$\hat P = \frac{1}{n}\sum_{i=1}^{n}p_i = \bar p,$$
and
$$V(\bar p) = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{N-1}\sum_{i=1}^{N}(p_i-P)^2 \approx \frac{N-n}{N^2n}\sum_{i=1}^{N}(p_i-P)^2, \ \text{for large } N .$$
As an estimate of $V(\bar p)$ we may use
$$\hat V(\bar p) = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{n-1}\sum_{i=1}^{n}(p_i-\bar p)^2 .$$
Alternatively, if we take a simple random sample, wor, of $nM$ elements from the population of size $NM$, the variance of $p$ is
$$V(p) = \frac{NM-nM}{NM-1}\,\frac{PQ}{nM} \approx \left(1-\frac{n}{N}\right)\frac{PQ}{nM}, \ \text{for large } N .$$
NM 1 nM N nM
M
1 i
y i.
M i j 1
y ij , mean per element of the i th cluster.
1 N
YN yi. , mean of the cluster means in the population of N clusters.
N i 1
1 n
y n y i. , mean of the cluster means in the sample of n clusters.
n i 1
N Mi
1 1 N
Y
N yij M M i yi. , mean per element in the population.
0 i 1
M i i 1 j 1
i 1
52
1 N M
M
N i 1
M i 0 , mean of cluster size.
N
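Three estimators of the population mean $\bar Y$ that are in common use may be considered.
1st estimate: It is defined by the sample mean of the cluster means as $\bar y_I = \dfrac{1}{n}\sum_{i=1}^{n}\bar y_{i.} = \bar y_n$.
By definition,
$$E(\bar y_I) = E\left(\frac{1}{n}\sum_{i=1}^{n}\bar y_{i.}\right) = \frac{1}{N}\sum_{i=1}^{N}\bar y_{i.} = \bar Y_N \ne \bar Y, \ \text{as the sampling is sr.}$$
Thus, $\bar y_I$ is a biased estimator of the population mean $\bar Y$.
The bias of the estimator is given as
$$B = E(\bar y_I)-\bar Y = \frac{1}{N}\sum_{i=1}^{N}\bar y_{i.}-\frac{1}{M_0}\sum_{i=1}^{N}M_i\,\bar y_{i.} = \frac{1}{N}\sum_{i=1}^{N}\bar y_{i.}-\frac{1}{N\bar M}\sum_{i=1}^{N}M_i\,\bar y_{i.}$$
$$= \frac{1}{N\bar M}\left[\bar M\sum_{i=1}^{N}\bar y_{i.}-\sum_{i=1}^{N}M_i\,\bar y_{i.}\right] = -\frac{1}{N\bar M}\sum_{i=1}^{N}(M_i-\bar M)\,\bar y_{i.}$$
$$= -\frac{1}{N\bar M}\sum_{i=1}^{N}(M_i-\bar M)(\bar y_{i.}-\bar Y_N+\bar Y_N)$$
$$= -\frac{1}{N\bar M}\sum_{i=1}^{N}(M_i-\bar M)(\bar y_{i.}-\bar Y_N)-\frac{1}{N\bar M}\sum_{i=1}^{N}(M_i-\bar M)\,\bar Y_N$$
$$= -\frac{1}{\bar M}\,\mathrm{Cov}(\bar y_{i.}, M_i), \ \text{since } \sum_{i=1}^{N}(M_i-\bar M) = 0 .$$
This shows that the bias is expected to be small when $M_i$ and $\bar y_{i.}$ are not highly correlated. In such a case, it is advisable to use this estimator.
Its variance is given by
$$V(\bar y_I) = E(\bar y_I-\bar Y_N)^2 = \frac{1-f}{n}\,S_b^2, \quad \text{where} \quad S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar y_{i.}-\bar Y_N)^2,$$
and it is estimated by
$$v(\bar y_I) = \frac{1-f}{n}\,s_b^2, \quad \text{where} \quad s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar y_{i.}-\bar y_I)^2 .$$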
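2nd estimate: It is defined as $\bar y_{II} = \dfrac{1}{n\bar M}\sum_{i=1}^{n}M_i\,\bar y_{i.}$.
By definition,
$$E(\bar y_{II}) = \frac{1}{n\bar M}\sum_{i=1}^{n}E(M_i\,\bar y_{i.}) = \frac{1}{\bar M}\,\frac{1}{N}\sum_{i=1}^{N}M_i\,\bar y_{i.} = \frac{1}{N\bar M}\sum_{i=1}^{N}M_i\,\bar y_{i.} = \bar Y, \ \text{as sampling is srswor.}$$
$$V(\bar y_{II}) = V\left(\frac{1}{n\bar M}\sum_{i=1}^{n}M_i\,\bar y_{i.}\right) = V\left(\frac{1}{n}\sum_{i=1}^{n}\frac{M_i\,\bar y_{i.}}{\bar M}\right).$$
Define a variate
$$u_i = \frac{M_i\,\bar y_{i.}}{\bar M}, \quad i = 1, 2, \ldots, N .$$
Let $\bar u$ and $\bar U$ be the sample and population means of the variable $u$, respectively, where
$$\bar u = \frac{1}{n}\sum_{i=1}^{n}\frac{M_i\,\bar y_{i.}}{\bar M} = \bar y_{II}, \quad \text{and} \quad \bar U = \frac{1}{N}\sum_{i=1}^{N}\frac{M_i\,\bar y_{i.}}{\bar M} = \frac{1}{M_0}\sum_{i=1}^{N}M_i\,\bar y_{i.} = \bar Y .$$
Therefore,
$$V(\bar y_{II}) = V(\bar u) = \frac{1-f}{n}\,S_b'^2, \ \text{as clusters are randomly drawn wor,}$$
where
$$S_b'^2 = \frac{1}{N-1}\sum_{i=1}^{N}(u_i-\bar U)^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{M_i\,\bar y_{i.}}{\bar M}-\bar Y\right)^2,$$
and an unbiased estimator of $V(\bar y_{II})$ is
$$v(\bar y_{II}) = \frac{1-f}{n}\,s_u^2, \quad \text{where} \quad s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{M_i\,\bar y_{i.}}{\bar M}-\bar y_{II}\right)^2 .$$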
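3rd estimator: It is defined as $\bar y_{III} = \dfrac{\sum_{i=1}^{n}M_i\,\bar y_{i.}}{\sum_{i=1}^{n}M_i}$. This estimate is a ratio estimate of the form $\hat R = \dfrac{\sum_i y_i}{\sum_i x_i}$, and its (approximate) variance is given by replacing $x_i$ by $M_i$ and $y_i$ by $M_i\,\bar y_{i.}$ in the variance of the ratio estimator, where
$$V(\hat R) = \frac{1-f}{n(N-1)\,\bar X^2}\sum_{i=1}^{N}(y_i-R\,x_i)^2, \quad \text{and} \quad \bar X^2 = \left(\frac{1}{N}\sum_{i=1}^{N}M_i\right)^2 = \bar M^2 .$$
Hence,
$$V(\bar y_{III}) = \frac{1-f}{n(N-1)\,\bar M^2}\sum_{i=1}^{N}\left(M_i\,\bar y_{i.}-\frac{\sum_{i=1}^{N}M_i\,\bar y_{i.}}{\sum_{i=1}^{N}M_i}\,M_i\right)^2 = \frac{1-f}{n(N-1)\,\bar M^2}\sum_{i=1}^{N}(M_i\,\bar y_{i.}-\bar Y M_i)^2$$
$$= \frac{1-f}{n(N-1)}\sum_{i=1}^{N}\left(\frac{M_i}{\bar M}\right)^2(\bar y_{i.}-\bar Y)^2 = \frac{1-f}{n}\,S_b''^2, \quad \text{where} \quad S_b''^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{M_i}{\bar M}\right)^2(\bar y_{i.}-\bar Y)^2 .$$
An estimate of $V(\bar y_{III})$, approximately unbiased since $\bar y_{III}$ is a ratio estimate, is given by
$$v(\bar y_{III}) = \frac{1-f}{n}\,s_b''^2, \quad \text{where} \quad s_b''^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{M_i}{\bar M}\right)^2(\bar y_{i.}-\bar y_{III})^2 .$$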
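The behaviour of the three estimators can be compared by enumerating every sample of $n = 2$ clusters (srswor) from a small hypothetical population with unequal cluster sizes. The data and names below are made up for illustration. The sketch checks that $\bar y_I$ has bias $-\mathrm{Cov}(\bar y_{i.}, M_i)/\bar M$ and that $\bar y_{II}$ is unbiased, and reports the expected value of the ratio-type estimator $\bar y_{III}$.

```python
from itertools import combinations
from statistics import mean

clusters = [[2, 4], [1, 2, 3], [5, 7, 9, 11], [6]]   # hypothetical, unequal sizes
N, n = len(clusters), 2
sizes = [len(c) for c in clusters]                   # M_i
means = [mean(c) for c in clusters]                  # cluster means
M_bar = mean(sizes)
Y_bar = sum(map(sum, clusters)) / sum(sizes)         # mean per element

def y_I(idx):
    return mean(means[i] for i in idx)

def y_II(idx):
    return sum(sizes[i] * means[i] for i in idx) / (len(idx) * M_bar)

def y_III(idx):
    return sum(sizes[i] * means[i] for i in idx) / sum(sizes[i] for i in idx)

samples = list(combinations(range(N), n))            # all equally likely samples
expectation = {name: mean(f(s) for s in samples)
               for name, f in (("I", y_I), ("II", y_II), ("III", y_III))}

# Population covariance between cluster mean and cluster size (divisor N)
cov = mean((m - M_bar) * (y - mean(means)) for m, y in zip(sizes, means))
bias_I = expectation["I"] - Y_bar
print(expectation, bias_I, -cov / M_bar)
```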
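Cluster sampling with varying probabilities and with replacement
Theorem: If a sample of $n$ clusters is drawn with probabilities proportional to size, i.e. $p_i \propto M_i$ or $p_i = \dfrac{M_i}{M_0}$, and with replacement, then an unbiased estimate of $\bar Y$ is given by
$$\bar y_n = \frac{1}{n}\sum_{i=1}^{n}\bar y_{i.}, \quad \text{with variance} \quad V(\bar y_n) = \frac{1}{n}\sum_{i=1}^{N}\frac{M_i}{M_0}\,(\bar y_{i.}-\bar Y)^2 .$$
Proof: By definition,
$$E(\bar y_n) = E\left(\frac{1}{n}\sum_{i=1}^{n}\bar y_{i.}\right) = \frac{1}{n}\sum_{i=1}^{n}E(\bar y_{i.}) = \frac{1}{n}\sum_{i=1}^{n}\sum_{i=1}^{N}p_i\,\bar y_{i.} = \frac{1}{M_0}\sum_{i=1}^{N}M_i\,\bar y_{i.} = \bar Y .$$
To obtain the variance, we have
$$V(\bar y_n) = E[\bar y_n-E(\bar y_n)]^2 = E(\bar y_n^2)-\bar Y^2 . \qquad (3.13)$$
Consider
$$E(\bar y_n^2) = E\left(\frac{1}{n}\sum_{i=1}^{n}\bar y_{i.}\right)^2 = \frac{1}{n^2}\left[\sum_{i=1}^{n}E(\bar y_{i.}^2)+\sum_{i\ne i'}^{n}E(\bar y_{i.})\,E(\bar y_{i'.})\right]$$
$$= \frac{1}{n^2}\left[n\sum_{i=1}^{N}\frac{M_i}{M_0}\,\bar y_{i.}^2+n(n-1)\,\bar Y^2\right],$$
since the $i$th cluster is drawn with probability $\dfrac{M_i}{M_0}$ and the clusters are sampled wr, i.e. $E(\bar y_{i.}) = \bar Y = E(\bar y_{i'.})$. Thus
$$E(\bar y_n^2) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{M_i}{M_0}\,\bar y_{i.}^2+(n-1)\,\bar Y^2\right]. \qquad (3.14)$$
In view of equations (3.14) and (3.13), we get
$$V(\bar y_n) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{M_i}{M_0}\,\bar y_{i.}^2+(n-1)\,\bar Y^2\right]-\bar Y^2 = \frac{1}{n}\sum_{i=1}^{N}\frac{M_i}{M_0}\,(\bar y_{i.}-\bar Y)^2 = \frac{1}{n}\,\sigma_b^2 \ \text{(say).}$$
Estimation of $V(\bar y_n)$
Define
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar y_{i.}-\bar y_n)^2,$$
then
$$E(s_b^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(\bar y_{i.}^2)-n\,E(\bar y_n^2)\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}\sum_{i=1}^{N}\frac{M_i}{M_0}\,\bar y_{i.}^2-n\,V(\bar y_n)-n\,\bar Y^2\right]$$
$$= \frac{1}{n-1}\left[n\sum_{i=1}^{N}\frac{M_i}{M_0}\,\bar y_{i.}^2-n\,\bar Y^2-n\,V(\bar y_n)\right] = \frac{1}{n-1}\left[n\sum_{i=1}^{N}\frac{M_i}{M_0}\,(\bar y_{i.}-\bar Y)^2-n\,\frac{\sigma_b^2}{n}\right]$$
$$= \frac{1}{n-1}\,(n\,\sigma_b^2-\sigma_b^2) = \sigma_b^2 .$$
This shows that $s_b^2$ is an unbiased estimate of $\sigma_b^2$. Therefore, $\hat V(\bar y_n) = \dfrac{1}{n}\,s_b^2$ is an unbiased estimate of $V(\bar y_n) = \sigma_b^2/n$.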
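The theorem and the unbiasedness of $s_b^2$ can be verified exactly on a small example by enumerating all ordered samples drawn with replacement and weighting each by its selection probability. The population below is hypothetical and the variable names are illustrative.

```python
from itertools import product
from math import prod
from statistics import mean, variance

clusters = [[2, 4], [1, 2, 3], [5, 7, 9, 11], [6]]   # hypothetical population
N, n = len(clusters), 2
sizes = [len(c) for c in clusters]
M0 = sum(sizes)
p = [m / M0 for m in sizes]                          # selection probabilities M_i / M_0
means = [mean(c) for c in clusters]

Y_bar = sum(pi * yi for pi, yi in zip(p, means))     # population mean per element
sigma_b2 = sum(pi * (yi - Y_bar) ** 2 for pi, yi in zip(p, means))

exp_mean = exp_square = exp_s_b2 = 0.0
for draw in product(range(N), repeat=n):             # ordered draws, with replacement
    prob = prod(p[i] for i in draw)
    ys = [means[i] for i in draw]
    estimate = mean(ys)
    exp_mean += prob * estimate
    exp_square += prob * estimate ** 2
    exp_s_b2 += prob * variance(ys)                  # s_b^2 with divisor n - 1

var_direct = exp_square - exp_mean ** 2
print(exp_mean, var_direct, sigma_b2 / n, exp_s_b2)
```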