
Unit-I Basic Concept of Sample Surveys

This document discusses sample surveys and key concepts related to sampling. It defines a sample survey as using a subset of a population to make inferences about the whole population. Sample surveys are commonly used by governments and businesses to gather information when examining the entire population is not feasible. The document outlines important sampling terms like population, sampling unit, sampling frame, sample, random sample, and estimator. It also distinguishes between sampling error and non-sampling error and how increasing sample size can decrease sampling error.
UNIT-I

BASIC CONCEPT OF SAMPLE SURVEYS

A sample survey is a method of drawing an inference about the characteristics of a population
or universe by observing a part of the population. For example, when one has to make an
inference about a large lot and it is not practicable to examine each individual member of the
lot, one takes the help of sample surveys; that is to say, one examines only a few members
of the lot and, on the basis of this sample information, makes decisions about the whole
lot. Thus, a person wanting to purchase a basket of oranges may examine a few oranges from
the basket and on that basis make his decision about the whole basket.
Such methods are extensively used by government bodies throughout the world for assessing
different characteristics of the national economy, as required for taking decisions regarding
the imposition of taxes, fixation of prices and minimum wages etc., and for planning and
projection of the future economic structure, for estimation of yield rates and acreages under
different crops, the number of unemployed persons in the labour force, construction of cost of
living indices for persons in different professions, and so on.
Sample survey techniques are extensively used in market research surveys for assessing the
preferential pattern of consumers for different types of products, the potential demand for a
new product which a company wishes to introduce, scope for any diversification in the
production schedule, and so on.
Thus, sampling may become unavoidable because we may have limited resources in terms of
money and / or man hours, or it may be preferred because of practical convenience.
Sampling is first broadly classified as Subjective and Objective.
Any type of sampling which depends upon the personal judgment or discretion of the sampler
himself is called Subjective. But the sampling method which is fixed by a sampling rule or is
independent of the sampler’s own judgment is Objective sampling.
Objective sampling is further classified as non-probabilistic, probabilistic, or mixed.

In non-probabilistic objective sampling, there is a fixed sampling rule but there is no
probability attached to the mode of selection, e.g. selecting every 5th individual from a list.
If, however, the selection of the first individual is made in such a manner that each of the first
10 gets an equal chance of being selected, it becomes a case of mixed sampling. If for each
individual there is a definite pre-assigned probability of being selected, the sampling is said
to be probabilistic.
Elementary unit or simply unit: It is an element or a group of elements, on which
observations can be made or from which the required statistical information can be
ascertained according to a well defined procedure, examples of unit are person, family,
household, farm, factory, tree, a period of time such as an hour, day etc.
Population: The collection of all units of a specified type in a given region at a particular
point or period of time is termed a population or universe. For example, a population of
persons, families, farms, cattle, houses or automobiles in a region, or a population of trees or
birds in a forest etc.
A population is said to be a finite population or an infinite population according as the
number of units in it is finite or infinite.
Sampling units: Elementary units or groups of such units, which, besides being clearly
defined, identifiable and observable, are convenient for purposes of sampling, are called
sampling units. For example, in a family budget enquiry, usually a family is considered as a
sampling unit, since it is found to be convenient for sampling and for ascertaining the required
information. In a crop survey, a farm or a group of farms owned or operated by a household
may be considered as the sampling unit.
Sampling frame: For using sampling methods in the collection of data, it is essential to have
a frame of all the sampling units belonging to the population to be studied with their proper
identification particulars and such a frame is called the sampling frame. This may be a list of
units with their identification particulars.
As the sampling frame forms the basic material from which a sample is drawn, it should be
ensured that the frame contains all the sampling units of the population under consideration
but excludes units of any other population.
Sample: A sample is a subset of a population selected to obtain information concerning the
characteristics of the population. In other words, one or more sampling units selected from a
population according to some specified procedure are said to constitute a sample.
Random sample: A random or probability sample is a sample drawn in such a manner that
each unit in the population has a predetermined probability of selection.
Estimator: An estimator is a statistic obtained by a specified procedure for estimating a
population parameter. The estimator is a random variable, as its value differs from sample to
sample and the samples are selected with specified probabilities.
The particular value, which the estimator takes for a given sample, is known as an estimate.
The difference between the estimator $t$ and the parameter $\theta$ is called the error.
An estimator $t$ is said to be an unbiased estimator of the parameter $\theta$ if $E(t) = \theta$,
otherwise biased. Thus the bias is given by
$$E(t - \theta) = B(t).$$
The mean of the squares of the errors taken from $\theta$ is called the mean square error (MSE).
Mathematically it is defined as
$$MSE(t) = E(t - \theta)^2.$$
The MSE may be considered to be a measure of the accuracy with which the estimator $t$
estimates the parameter $\theta$.
The expected value of the squared deviation of the estimator from its expected value is
termed the sampling variance. It is a measure of the divergence of the estimator from its
expected value and is given by
$$V(t) = E[t - E(t)]^2.$$
This measure of variability may be termed the precision of the estimator $t$.
The relation between MSE and sampling variance, or between accuracy and precision, can be
obtained as
$$MSE(t) = E(t - \theta)^2 = E[t - E(t) + E(t) - \theta]^2$$
$$= E[t - E(t)]^2 + [E(t) - \theta]^2 = V(t) + [B(t)]^2, \quad \text{since } E[t - E(t)] = 0.$$
This shows that the MSE of $t$ is the sum of the sampling variance and the square of the bias.
However, if $t$ is an unbiased estimator of $\theta$, the MSE and sampling variance are the same.
The square root of the sampling variance is termed the standard error of the estimator $t$.
The ratio of the standard error of the estimator to the expected value of the estimator is
known as the relative standard error or the coefficient of variation of the estimator.
Sample space: The collection of all possible samples (sequences or sets) is called the sample
space.
Sampling design: The combination of the sample space and the associated probability
measure is called a sampling design. For example, let $N = 4$, $n = 2$ and let the probability of
selection for the different samples be

Sample        (1, 2)   (1, 3)   (1, 4)   (2, 3)   (2, 4)   (3, 4)
Probability    1/6      1/6      1/6      1/6      1/6      1/6

The above table gives the sampling design.
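
The sampling design above, together with the relation MSE = variance + bias squared, can be checked by direct enumeration. The following is a minimal sketch, not part of the original notes: the population values and the deliberately biased estimator (the larger of the two sampled values) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population values keyed by unit label 1..N (not from the notes).
values = {1: 3, 2: 5, 3: 8, 4: 12}
N, n = len(values), 2

# Sampling design: every unordered sample of size n gets the same probability.
samples = list(combinations(sorted(values), n))
design = {s: Fraction(1, len(samples)) for s in samples}

def expectation(stat):
    """Expected value of a statistic over the sampling design."""
    return sum(p * stat(s) for s, p in design.items())

theta = Fraction(sum(values.values()), N)  # parameter: the population mean

def t(sample):
    """A deliberately biased estimator of the mean: the larger sampled value."""
    return Fraction(max(values[u] for u in sample))

e_t = expectation(t)
bias = e_t - theta
variance = expectation(lambda s: (t(s) - e_t) ** 2)
mse = expectation(lambda s: (t(s) - theta) ** 2)

print(design)               # six samples, each with probability 1/6
print(bias, variance, mse)  # mse equals variance + bias**2
```

Exact fractions are used so that the identity holds without rounding error.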


Sampling and complete enumeration
The total count of all units of the population for a certain characteristic is known as complete
enumeration, also termed a census survey. The money, manpower and time required for
carrying out complete enumeration will generally be large, and there are many situations with
limited means where complete enumeration will not be possible and where recourse to the selection
of a few units will be helpful. When only a part, called a sample, is selected from the
population and examined, it is called sample enumeration or a sample survey.
A sample survey will usually be less expensive than a census survey and the desired
information will be obtained in less time. This does not imply that economy is the only
consideration in conducting a sample survey. It is most important that a degree of accuracy of
results is also maintained. Occasionally, the technique of sample survey is applied to verify
the results obtained from census surveys. The main advantages or merits of a sample
survey over a census survey may be outlined as follows:
i) Reduced cost of survey,
ii) Greater speed of getting results,
iii) Greater accuracy of results,
iv) Greater scope, and
v) Adaptability
A sample survey has its own limitations, and the advantages of sampling over complete
enumeration can be derived only if
i) the units are drawn in a scientific manner,
ii) an appropriate sampling technique is used, and
iii) the number of units selected in the sample is adequate.
Sampling and non-sampling errors
The error that arises because only a sample (a part of the population) is used to estimate the
population parameters and draw inferences about the population is termed sampling error or
sampling fluctuation. Whatever the degree of care in selecting a sample,
there will always be a difference between the parameter and its corresponding estimate. This
error is inherent and unavoidable in every sampling scheme. A sample with the
smallest sampling error will always be considered a good representative of the population.
This error can be reduced by increasing the size of the sample (the number of units selected in
the sample). In fact, the sampling error is inversely proportional to the square root
of the sample size, and the relationship can be examined graphically as below:

[Figure: sampling error plotted against sample size]

When the sample survey becomes a census survey, the sampling error becomes zero.
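
As a rough numerical illustration of how the sampling error shrinks with sample size, the sketch below evaluates the standard error of the sample mean under sampling without replacement, using the variance formula $V(\bar{y}) = \frac{N-n}{nN} S^2$ stated later in these notes. The population size and mean square are hypothetical values chosen for illustration.

```python
import math

def standard_error(S2, n, N):
    """Standard error of the sample mean under srswor: sqrt((N - n) / (n N) * S2)."""
    return math.sqrt((N - n) / (n * N) * S2)

# Hypothetical population size and population mean square.
N, S2 = 1000, 25.0

for n in (10, 40, 160, 640, 1000):
    # Each fourfold increase in n roughly halves the error while n is small
    # relative to N; at n = N (a census) the error is exactly zero.
    print(n, round(standard_error(S2, n, N), 4))
```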
Non-sampling error
The non-sampling errors primarily arise at the following stages:
i) Failure to measure some of the units in the selected sample,
ii) Observational errors due to defective measurement technique,
iii) Errors introduced in editing, coding and tabulating the results.
Non-sampling errors are present in both the complete enumeration survey and the sample
survey. In practice, the census survey results may suffer from non-sampling errors although
these may be free from sampling error. The non-sampling error is likely to increase with
increase in sample size, while sampling error decreases with increase in sample size.

SIMPLE RANDOM SAMPLING

A procedure for selecting a sample of size n out of a finite population of size N in which
each of the possible distinct samples has an equal chance of being selected is called random
sampling or simple random sampling.
We may have two distinct types of simple random sampling as follows:
i) Simple random sampling with replacement (srswr ) .
ii) Simple random sampling without replacement (srswor ) .
Simple random sampling with replacement (srswr )
In sampling with replacement, a unit is selected from the population consisting of $N$ units, its
content is noted, and it is then returned to the population before the next draw is made; the
process is repeated $n$ times to give a sample of $n$ units. In this method, at each draw, each of
the $N$ units of the population gets the same probability $1/N$ of being selected. Here the same
unit of the population may occur more than once in the sample (the order in which the sample
units are obtained is regarded). There are $N^n$ samples, and each has an equal probability
$1/N^n$ of being selected.
Note: If the order in which the sample units are obtained is ignored (unordered), then the
number of possible samples is the number of combinations with repetition,
$${}^{N+n-1}C_n ,$$
for example 10 samples when $N = 4$ and $n = 2$.
Simple random sampling without replacement ( srswor )
Suppose the population consists of $N$ units. In simple random sampling without
replacement, a unit is selected, its content is noted, and the unit is not returned to the population
before the next draw is made. The process is repeated $n$ times to give a sample of $n$ units. In this
method, at the $r$-th drawing, each of the $N - r + 1$ remaining units of the population gets the same
probability $\frac{1}{N - r + 1}$ of being included in the sample. Here no unit of the population can
occur more than once in the sample (order is ignored). There are ${}^{N}C_n$ possible samples, and
each such sample has an equal probability $\frac{1}{{}^{N}C_n}$ of being selected.
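
The sample counts quoted for the two schemes can be confirmed by listing the samples. This is an illustrative sketch only; the choice $N = 4$, $n = 2$ is an assumption made to keep the lists short.

```python
from itertools import combinations, combinations_with_replacement, product
from math import comb

N, n = 4, 2
units = range(1, N + 1)

ordered_wr = list(product(units, repeat=n))                     # srswr, order regarded
unordered_wr = list(combinations_with_replacement(units, n))    # srswr, order ignored
unordered_wor = list(combinations(units, n))                    # srswor, order ignored

print(len(ordered_wr), N ** n)                  # both equal N^n
print(len(unordered_wr), comb(N + n - 1, n))    # combinations with repetition
print(len(unordered_wor), comb(N, n))           # both equal C(N, n)
```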

Theory of simple random sampling with replacement


$N$, population size.
$n$, sample size.
$Y_i$, value of the $i$-th unit of the population.
$y_i$, value of the $i$-th unit of the sample.
$Y = \sum_{i=1}^{N} Y_i$, population total.
$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$, population mean.
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$, sample mean.
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2$, population variance.
$S^2 = \frac{1}{N-1} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N-1} \left( \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 \right)$, population mean square.
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} y_i^2 - n \bar{y}^2 \right)$, sample mean square.

Theorem: In srswr, the sample mean $\bar{y}$ is an unbiased estimate of the population mean
$\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is $V(\bar{y}) = \frac{N-1}{nN} S^2 = \frac{\sigma^2}{n}$.
Corollary: $\hat{Y} = N \bar{y}$ is an unbiased estimate of the population total $Y$ with variance
$$V(\hat{Y}) = \frac{N^2 \sigma^2}{n} = \frac{N(N-1)}{n} S^2 .$$
Theorem: In srswr, the sample mean square $s^2$ is an unbiased estimate of the population
variance $\sigma^2$, i.e. $E(s^2) = \sigma^2$.
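
The srswr results can be verified exactly on a small population by enumerating all $N^n$ ordered samples. The sketch below is an illustration under assumed data: the population values are hypothetical, and exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

Y = [2, 4, 7, 11]          # hypothetical population values
N, n = len(Y), 2

Ybar = Fraction(sum(Y), N)
sigma2 = sum((y - Ybar) ** 2 for y in Y) / N          # population variance
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)        # population mean square

# All N^n ordered samples drawn with replacement, each with probability 1/N^n.
samples = list(product(Y, repeat=n))
p = Fraction(1, len(samples))

def mean(s):
    return Fraction(sum(s), len(s))

def mean_square(s):
    m = mean(s)
    return sum((y - m) ** 2 for y in s) / (len(s) - 1)

E_mean = sum(p * mean(s) for s in samples)
V_mean = sum(p * (mean(s) - Ybar) ** 2 for s in samples)
E_s2 = sum(p * mean_square(s) for s in samples)

print(E_mean == Ybar)                                   # unbiasedness of the sample mean
print(V_mean == sigma2 / n == Fraction(N - 1, n * N) * S2)
print(E_s2 == sigma2)                                   # s^2 is unbiased for sigma^2
```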

Theory of simple random sampling without replacement


Theorem: In srswor, the sample mean $\bar{y}$ is an unbiased estimate of the population mean
$\bar{Y}$, i.e. $E(\bar{y}) = \bar{Y}$, and its variance is $V(\bar{y}) = \left( \frac{N-n}{nN} \right) S^2$.
Corollary: $\hat{Y} = N \bar{y}$ is an unbiased estimate of the population total $Y$ with variance
$V(\hat{Y}) = N^2 (1 - f) S^2 / n$, where $f = n/N$ is the sampling fraction.
Theorem: In srswor, the sample mean square $s^2$ is an unbiased estimate of the population
mean square $S^2$, i.e. $E(s^2) = S^2$.
Property: $V(\bar{y})$ under srswor is less than $V(\bar{y})$ under srswr.
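
The srswor results, including the comparison with srswr, can be checked the same way by enumerating all ${}^{N}C_n$ samples. This is a sketch on hypothetical population values, not a proof.

```python
from fractions import Fraction
from itertools import combinations

Y = [2, 4, 7, 11, 16]      # hypothetical population values
N, n = len(Y), 3

Ybar = Fraction(sum(Y), N)
sigma2 = sum((y - Ybar) ** 2 for y in Y) / N
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)

# All C(N, n) unordered samples without replacement, equally likely.
samples = list(combinations(Y, n))
p = Fraction(1, len(samples))

def mean(s):
    return Fraction(sum(s), len(s))

def mean_square(s):
    m = mean(s)
    return sum((y - m) ** 2 for y in s) / (len(s) - 1)

E_mean = sum(p * mean(s) for s in samples)
V_wor = sum(p * (mean(s) - Ybar) ** 2 for s in samples)
E_s2 = sum(p * mean_square(s) for s in samples)
V_wr = sigma2 / n          # variance of the mean under srswr, for comparison

print(E_mean == Ybar)
print(V_wor == Fraction(N - n, n * N) * S2)
print(E_s2 == S2)
print(V_wor < V_wr)
```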
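Theorem: Let a srswor sample of size $n$ be drawn from a population of size $N$. Let
$T = \sum_{i=1}^{n} \alpha_i y_i$ be a class of linear estimators of $\bar{Y}$, where the $\alpha_i$'s are coefficients attached to
the sample values. Then
i) The class $T$ is a linear unbiased class of estimators if $\sum_{i=1}^{n} \alpha_i = 1$.
ii) The sample mean $\bar{y}$ is the best linear unbiased estimate.
Proof:
i) $E(T) = E\left( \sum_{i=1}^{n} \alpha_i y_i \right) = \sum_{i=1}^{n} \alpha_i E(y_i) = \sum_{i=1}^{n} \alpha_i \bar{Y} = \bar{Y}$, iff $\sum_{i=1}^{n} \alpha_i = 1$.
ii) Under $\sum_{i=1}^{n} \alpha_i = 1$,
$$V(T) = E\left( \sum_{i=1}^{n} \alpha_i y_i - \bar{Y} \right)^2 = E\left[ \left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 - 2 \bar{Y} \sum_{i=1}^{n} \alpha_i y_i + \bar{Y}^2 \right] = E\left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 - \bar{Y}^2 .$$
Consider
$$E\left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 = \sum_{i=1}^{n} \alpha_i^2 E(y_i^2) + \sum_{i \ne j}^{n} \alpha_i \alpha_j E(y_i y_j) \qquad (1.1)$$
Note that
$$V(y_i) = E(y_i^2) - \bar{Y}^2$$
$$\Rightarrow E(y_i^2) = \frac{1}{N} (N-1) S^2 + \bar{Y}^2, \quad \text{since } V(y_i) = \frac{N-1}{N} S^2 \text{ for each } i. \qquad (1.2)$$
Now, writing the sums over the population values,
$$E(y_i y_j) = \sum_{i \ne j}^{N} Y_i \Pr(i) \, Y_j \Pr(j \mid i) = \frac{1}{N} \, \frac{1}{N-1} \sum_{i \ne j}^{N} Y_i Y_j .$$
Note that
$$\left( \sum_{i=1}^{N} Y_i \right)^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j}^{N} Y_i Y_j = (N-1) S^2 + N \bar{Y}^2 + \sum_{i \ne j}^{N} Y_i Y_j$$
$$\Rightarrow \sum_{i \ne j}^{N} Y_i Y_j = N^2 \bar{Y}^2 - (N-1) S^2 - N \bar{Y}^2 .$$
Thus
$$E(y_i y_j) = \frac{1}{N} \, \frac{1}{N-1} \left[ N^2 \bar{Y}^2 - (N-1) S^2 - N \bar{Y}^2 \right] = \bar{Y}^2 - S^2 / N . \qquad (1.3)$$
In view of equations (1.2) and (1.3), and using $\sum_{i \ne j} \alpha_i \alpha_j = \left( \sum_i \alpha_i \right)^2 - \sum_i \alpha_i^2 = 1 - \sum_i \alpha_i^2$, equation (1.1) becomes
$$E\left( \sum_{i=1}^{n} \alpha_i y_i \right)^2 = \sum_{i=1}^{n} \alpha_i^2 \left[ \frac{1}{N} (N-1) S^2 + \bar{Y}^2 \right] + \sum_{i \ne j}^{n} \alpha_i \alpha_j \left( \bar{Y}^2 - \frac{S^2}{N} \right)$$
$$= S^2 \sum_{i=1}^{n} \alpha_i^2 - \frac{S^2}{N} \sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 \sum_{i=1}^{n} \alpha_i^2 + \left( 1 - \sum_{i=1}^{n} \alpha_i^2 \right) \left( \bar{Y}^2 - \frac{S^2}{N} \right)$$
$$= S^2 \sum_{i=1}^{n} \alpha_i^2 + \bar{Y}^2 - \frac{S^2}{N} .$$
Therefore,
$$V(T) = S^2 \sum_{i=1}^{n} \alpha_i^2 - \frac{S^2}{N} .$$
Since $\sum_{i=1}^{n} \alpha_i^2 = \sum_{i=1}^{n} \left( \alpha_i - \frac{1}{n} \right)^2 + \frac{1}{n}$ under the condition $\sum_{i=1}^{n} \alpha_i = 1$, we have
$$V(T) = S^2 \left[ \sum_{i=1}^{n} \left( \alpha_i - \frac{1}{n} \right)^2 + \left( \frac{1}{n} - \frac{1}{N} \right) \right] .$$
We note that $V(T)$ will be minimum if $\sum_{i=1}^{n} \left( \alpha_i - \frac{1}{n} \right)^2 = 0$, i.e. $\alpha_i = \frac{1}{n}$ for all
$i = 1, 2, \ldots, n$, and then $T = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$.
OR
To determine $\alpha_i$ such that $V(T)$ is minimum, consider the function
$$\phi = V(T) + \lambda \left( \sum_{i=1}^{n} \alpha_i - 1 \right), \quad \text{where } \lambda \text{ is some unknown constant.}$$
Using the calculus method of Lagrange multipliers, we select $\alpha_i$ and the constant $\lambda$ to
minimize $\phi$. Differentiating $\phi$ with respect to $\alpha_i$ and equating to zero, we have
$$\frac{\partial \phi}{\partial \alpha_i} = 0 = 2 S^2 \alpha_i + \lambda \quad \text{or} \quad \alpha_i = -\frac{\lambda}{2 S^2} \qquad (1.4)$$
Taking the summation on both sides of (1.4), we get
$$\sum_{i=1}^{n} \alpha_i = 1 = -\frac{n \lambda}{2 S^2} \quad \Rightarrow \quad \lambda = -\frac{2 S^2}{n} \qquad (1.5)$$
Thus, from equations (1.4) and (1.5), we have
$\alpha_i = \frac{1}{n}$, for all $i = 1, 2, \ldots, n$, and $T = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$.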
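
The variance formula $V(T) = S^2 \sum \alpha_i^2 - S^2/N$ and the optimality of equal weights can be checked numerically. The sketch below is an illustration under assumptions: the population values and the unequal weights are hypothetical, and because the weights attach to the order of the draws, the enumeration runs over ordered samples without replacement.

```python
from fractions import Fraction
from itertools import permutations

Y = [2, 4, 7, 11, 16]      # hypothetical population values
N, n = len(Y), 3
Ybar = Fraction(sum(Y), N)
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)

# Ordered srswor samples: alpha_i is attached to the i-th draw.
draws = list(permutations(Y, n))
p = Fraction(1, len(draws))

def variance_of_T(alpha):
    """Exact V(T) for T = sum(alpha_i * y_i), with weights summing to one."""
    assert sum(alpha) == 1
    T = [sum(a * y for a, y in zip(alpha, d)) for d in draws]
    return sum(p * (t - Ybar) ** 2 for t in T)

def formula(alpha):
    """Closed form derived above: S^2 * sum(alpha_i^2) - S^2 / N."""
    return S2 * sum(a * a for a in alpha) - S2 / N

unequal = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical weights
equal = [Fraction(1, n)] * n                                  # gives the sample mean

print(variance_of_T(unequal) == formula(unequal))
print(variance_of_T(equal) == Fraction(N - n, n * N) * S2)
print(variance_of_T(equal) < variance_of_T(unequal))
```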
Case I) Random sampling with replacement
On replacing $\bar{Y}$ by $P$, $Y$ by $NP$, $\bar{y}$ by $p = \frac{a}{n}$, $S^2$ by $\frac{NPQ}{N-1}$ and $\sigma^2$ by $PQ$ in the
expressions obtained for the expectation and variance of the estimates of the population mean and
population total, we find
i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of the
population proportion $P$, and $V(p) = V(\bar{y}) = \frac{\sigma^2}{n} = \frac{PQ}{n}$.
ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $Np = \hat{A}$ is an unbiased estimate of
$NP = A$, and
$$V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = \frac{N^2 \sigma^2}{n} = \frac{N^2 PQ}{n} .$$
Theorem: $\hat{V}(p) = v(p) = \frac{pq}{n-1}$ is an unbiased estimate of $V(p) = \frac{PQ}{n}$.

Case II) Random sampling without replacement


Results are:
i) $E(p) = E(\bar{y}) = \bar{Y} = P$. This shows that the sample proportion $p$ is an unbiased estimate of the
population proportion $P$, and
$$V(p) = V(\bar{y}) = \frac{N-n}{nN} S^2 = \left( \frac{N-n}{nN} \right) \frac{NPQ}{N-1} = \left( \frac{N-n}{N-1} \right) \frac{PQ}{n} .$$
ii) $E(\hat{A}) = E(Np) = N E(p) = NP = A$, which means that $Np$ is an unbiased estimate of $NP$, and
$$V(\hat{A}) = V(\hat{Y}) = N^2 V(\bar{y}) = N^2 \left( \frac{N-n}{nN} \right) S^2 = N^2 \left( \frac{N-n}{nN} \right) \frac{NPQ}{N-1} = N^2 \left( \frac{N-n}{N-1} \right) \frac{PQ}{n} .$$
Theorem: $\hat{V}(p) = v(p) = \left( \frac{N-n}{n-1} \right) \frac{pq}{N}$ is an unbiased estimate of $V(p) = \left( \frac{N-n}{N-1} \right) \frac{PQ}{n}$.
Corollary: $\hat{V}(\hat{A}) = \hat{V}(Np) = N^2 \hat{V}(p) = N \left( \frac{N-n}{n-1} \right) pq$ is an unbiased estimate of
$V(\hat{A}) = N^2 \left( \frac{N-n}{N-1} \right) \frac{PQ}{n}$.
Example: For a population of size $N = 430$, we know roughly that $\bar{Y} = 19$ and $S^2 = 85.6$.
With srs, what should be the size of the sample to estimate $\bar{Y}$ with a margin of error of 10% of $\bar{Y}$,
apart from a 1 in 20 chance?
Solution: The margin of error in the estimate $\bar{y}$ of $\bar{Y}$ is given, i.e.
$|\bar{y} - \bar{Y}| \ge 10\%$ of $\bar{Y} = \frac{19}{10} = 1.9$, so that
$$\Pr[\, |\bar{y} - \bar{Y}| \ge 1.9 \,] = \frac{1}{20} = 0.05, \quad \text{and} \quad n_0 = \frac{Z_{\alpha/2}^2 S^2}{d^2} = \frac{(1.96)^2 \times 85.6}{(1.9)^2} = 91.091678 .$$
Therefore,
$$n = \frac{n_0}{1 + \frac{n_0}{N}} = 75.168 \approx 75 .$$
Example: In a population of 676 petition sheets, how large must the sample be if the total
number of signatures is to be estimated with a margin of error of 1000, apart from a 1 in 20
chance? Assume the population mean square to be 229.
Solution: Let $Y$ be the number of signatures on all the sheets, and let $\hat{Y}$ be the estimate of $Y$.
The margin of error is specified in the estimate $\hat{Y}$ of $Y$ as
$|\hat{Y} - Y| \ge 1000$, so that $\Pr[\, |\hat{Y} - Y| \ge 1000 \,] = \frac{1}{20} = 0.05$.
We know that
$$n = \frac{n_0}{1 + \frac{n_0}{N}}, \quad \text{where here} \quad n_0 = \left( \frac{N Z_{\alpha/2}}{d} \right)^2 S^2 = \left( \frac{676 \times 1.96}{1000} \right)^2 \times 229 = 402.01385 ,$$
and hence
$n = 252.09 \approx 252$.
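
The arithmetic in the two sample-size examples above (for a mean and for a total) can be reproduced with a short script. This is a minimal sketch: the function names are illustrative and not from the notes, and the default $z = 1.96$ corresponds to the 1 in 20 chance used in both examples.

```python
def sample_size_mean(N, S2, d, z=1.96):
    """Return (n0, n) for estimating a population mean within margin d."""
    n0 = z ** 2 * S2 / d ** 2
    return n0, n0 / (1 + n0 / N)

def sample_size_total(N, S2, d, z=1.96):
    """Return (n0, n) for estimating a population total within margin d."""
    n0 = (N * z / d) ** 2 * S2
    return n0, n0 / (1 + n0 / N)

# First example: N = 430, S^2 = 85.6, d = 10% of 19 = 1.9.
n0_mean, n_mean = sample_size_mean(430, 85.6, 1.9)
print(round(n0_mean, 4), round(n_mean, 3))

# Petition sheets: N = 676, S^2 = 229, d = 1000 signatures.
n0_total, n_total = sample_size_total(676, 229, 1000)
print(round(n0_total, 4), round(n_total, 2))
```

In practice the computed $n$ is often rounded up rather than to the nearest integer, so that the stated margin is not exceeded.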
Estimation of sample size for proportion
a) When precision is specified in terms of margin of error: Suppose the size of the
population is $N$ and the population proportion is $P$. Let a srs of size $n$ be taken, let $p$ be
the corresponding sample proportion, and let $d$ be the margin of error in the estimate $p$ of $P$.
The margin of error can be specified in the form of a probability statement as
$$\Pr[\, |p - P| \ge d \,] = \alpha \quad \text{or} \quad \Pr[\, |p - P| \le d \,] = 1 - \alpha \qquad (1.6)$$
Assuming the sample proportion is approximately normally distributed, $p \sim N[P, V(p)]$, we have $Z = \frac{p - P}{\sqrt{V(p)}} \sim N(0, 1)$.
For the given value of $\alpha$ we can find a value $Z_{\alpha/2}$ of the standard normal variate from
the standard normal table by the following relation:
$$\Pr\left[ \frac{|p - P|}{\sqrt{V(p)}} \ge Z_{\alpha/2} \right] = \alpha \quad \text{or} \quad \Pr\left[\, |p - P| \ge \sqrt{V(p)} \, Z_{\alpha/2} \,\right] = \alpha \qquad (1.7)$$
Comparing equations (1.6) and (1.7), the relation which gives the value of $n$ with the
required precision of the estimate $p$ of $P$ is given by
$$d = Z_{\alpha/2} \sqrt{V(p)} \quad \text{or} \quad d^2 = Z_{\alpha/2}^2 V(p) = Z_{\alpha/2}^2 \left( \frac{N-n}{N-1} \right) \frac{PQ}{n}, \quad \text{as sampling is srswor.}$$
$$\Rightarrow 1 = \frac{Z_{\alpha/2}^2 PQ}{d^2} \left( \frac{N-n}{n(N-1)} \right) = \frac{N-n}{n(N-1)} \, n_0, \quad \text{where } n_0 = \frac{Z_{\alpha/2}^2 PQ}{d^2} = \frac{PQ}{V} \text{ and } V = \frac{d^2}{Z_{\alpha/2}^2} \qquad (1.8)$$
$$\text{or} \quad \frac{N-1}{n_0} = \frac{N-n}{n} = \frac{N}{n} - 1 \quad \Rightarrow \quad \frac{N}{n} = \frac{N-1}{n_0} + 1$$
$$\text{or} \quad n = \frac{N}{\frac{N-1}{n_0} + 1} = \frac{N n_0}{n_0 + (N-1)} = \frac{n_0}{1 + \frac{n_0 - 1}{N}} \approx \frac{n_0}{1 + \frac{n_0}{N}} \qquad (1.9)$$
If $N$ is sufficiently large, then $n \approx n_0$.

b) If precision is specified in terms of $V(p)$, i.e. $V(p) = V$ (given):
Substituting $V(p) = V$ in relation (1.8) we get $n_0 = \frac{PQ}{V}$, and hence $n$ can be obtained
by relation (1.9).
c) When precision is given in terms of the coefficient of variation of $p$:
Let
$$CV(p) = e = \frac{\sqrt{V(p)}}{P} \quad \Rightarrow \quad \frac{V(p)}{P^2} = e^2, \quad \text{or} \quad V(p) = e^2 P^2 \qquad (1.10)$$
Substituting equation (1.10) in relation (1.8), we get
$$n_0 = \frac{PQ}{e^2 P^2} = \frac{Q}{e^2 P} = \frac{1}{e^2} \left( \frac{1}{P} - 1 \right) ,$$
and hence $n$ is given by relation (1.9).
Example: In a population of 4000 people who were called for casting their votes, 50%
returned to the poll. Determine the sample size needed to estimate this proportion so that the margin
of error is 5% with a 95% confidence coefficient.
Solution: The margin of error in the estimate $p$ of $P$ is given by
$|p - P| \ge 0.05$, so that $\Pr[\, |p - P| \ge 0.05 \,] = 0.05$.
We know that
$$n_0 = \frac{Z_{\alpha/2}^2 PQ}{d^2} = \frac{(1.96)^2 \times 0.5 \times 0.5}{0.0025} = 384.16 \approx 384$$
and hence,
$$n = \frac{n_0}{1 + (n_0 / N)} = 350.498 \approx 351 .$$
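
The voting example can be reproduced numerically, and the approximate correction $n_0 / (1 + n_0/N)$ compared with the exact solution $n = N n_0 / (n_0 + N - 1)$ of the relation between $d^2$ and $V(p)$. The function name below is illustrative, not from the notes.

```python
import math

def sample_size_proportion(N, P, d, z=1.96):
    """Return (n0, exact n, approximate n) for estimating a proportion within d."""
    Q = 1 - P
    n0 = z ** 2 * P * Q / d ** 2
    n_exact = N * n0 / (n0 + N - 1)      # solves d^2 = z^2 (N-n)/(N-1) PQ/n
    n_approx = n0 / (1 + n0 / N)         # approximation used in the example
    return n0, n_exact, n_approx

n0, n_exact, n_approx = sample_size_proportion(N=4000, P=0.5, d=0.05)
print(round(n0, 2), round(n_exact, 3), round(n_approx, 3))
print(math.ceil(n_approx))   # rounded up to a whole number of respondents
```

The two versions differ by well under one unit here, which is why the simpler approximation is usually adequate when $N$ is large.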
Exercise: In a study of the possible use of sampling to cut down the work in taking
inventory in a stock room, a count is made of the value of the articles on each of 36 shelves
in the room. The values to the nearest dollar are as follows.
29, 38, 42, 44, 45, 47, 51, 53, 53, 54, 56, 56, 56, 58, 58, 59, 60, 60, 60, 60, 61, 61, 61, 62, 64,
65, 65, 67, 67, 68, 69, 71, 74, 77, 82, 85.
The estimate of total value made from a sample is to be correct within $200, apart from a 1 in
20 chance. An advisor suggests that a simple random sample of 12 shelves will meet the
requirements. Do you agree? Given: $\sum Y_i = 2138$ and $\sum Y_i^2 = 131\,682$.
Solution: It is given that
$\sum_i Y_i = 2138$, $\sum_i Y_i^2 = 131\,682$, and $N = 36$, so that
$$S^2 = \frac{1}{N-1} \left( \sum_i Y_i^2 - N \bar{Y}^2 \right) = \frac{1}{36 - 1} \left[ 131\,682 - 36 \left( \frac{2138}{36} \right)^2 \right] = 134.5$$
and
$|\hat{Y} - Y| \ge 200$, so that $\Pr[\, |\hat{Y} - Y| \ge 200 \,] = \frac{1}{20} = 0.05$.
We know that
$$n = \frac{n_0}{1 + \frac{n_0}{N}}, \quad \text{where here} \quad n_0 = \left( \frac{N Z_{\alpha/2}}{d} \right)^2 S^2 = \left( \frac{36 \times 1.96}{200} \right)^2 \times 134.5 = 16.7409 ,$$
and therefore,
$n = 11.42765 \approx 12$.
Since the required sample size is about 12, the advisor's suggestion of 12 shelves meets the requirement.
Exercise: The selling price of a lot of standing timber is $UW$, where $U$ is the price per unit
volume and $W$ is the volume of timber on the lot. The number $N$ of logs on the lot is
counted, and the average volume per log is estimated from a simple random sample of $n$
logs. The estimate is made and paid for by the seller and is provisionally accepted by the
buyer. Later, the buyer finds out the exact volume purchased, and the seller reimburses him if
he has paid for more than was delivered. If he has paid for less than was delivered, the buyer
does not mention the fact.
Construct the seller's loss function. Assuming that the cost of measuring $n$ logs is $cn$, find
the optimum value of $n$. The standard deviation of the volume per log may be denoted by $S$
and the fpc ignored.

Solution: Let $\hat{W}$ be the estimated total volume of the timber. The error in the estimate
is $\hat{W} - W$.
If $\hat{W} - W = z \ge 0$, the seller's loss is zero, i.e. $l(z) = 0$.
If $\hat{W} - W = z < 0$, the seller's loss is $-Uz$, i.e. $l(z) = -Uz$.
When the fpc is ignored, $V(\hat{W}) = N^2 S^2 / n$, so that
$$\hat{W} \sim N\left( W, \frac{N^2 S^2}{n} \right), \quad \text{or} \quad z = (\hat{W} - W) \sim N\left( 0, \frac{N^2 S^2}{n} \right),$$
and the density of $z$ is
$$f(z) = \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{z}{NS/\sqrt{n}} \right)^2 \right] = \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\left( -\frac{n z^2}{2 N^2 S^2} \right) .$$
Thus, the expected loss is
$$L(n) = \int_{-\infty}^{\infty} l(z) f(z) \, dz = \int_{-\infty}^{0} (-Uz) \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\left( -\frac{n z^2}{2 N^2 S^2} \right) dz$$
$$= \int_{0}^{\infty} Uz \, \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \exp\left( -\frac{n z^2}{2 N^2 S^2} \right) dz .$$
Put $\frac{n z^2}{2 N^2 S^2} = t$; then $\frac{2 n z}{2 N^2 S^2} \, dz = dt$, or $z \, dz = \frac{N^2 S^2}{n} \, dt$.
Therefore,
$$L(n) = \int_{0}^{\infty} \frac{U N^2 S^2}{n} \, \frac{1}{(NS/\sqrt{n}) \sqrt{2\pi}} \, e^{-t} \, dt = \frac{UNS}{\sqrt{2\pi} \sqrt{n}} \int_{0}^{\infty} e^{-t} \, dt = \frac{UNS}{\sqrt{2\pi n}}, \quad \text{as } \int_{0}^{\infty} e^{-t} \, dt = 1 .$$
To determine the value of $n$, consider the function
$$\phi(n) = L(n) + C(n) = cn + \frac{UNS}{\sqrt{2\pi}} \, n^{-1/2} .$$
Differentiating this function with respect to $n$ and equating to zero, we get
$$\frac{\partial \phi}{\partial n} = 0 = c - \frac{1}{2} \left( \frac{UNS}{\sqrt{2\pi}} \right) n^{-3/2} \quad \text{or} \quad n^{-3/2} = \frac{2 c \sqrt{2\pi}}{UNS}$$
$$\text{or} \quad n^{3/2} = \frac{UNS}{2 c \sqrt{2\pi}} \quad \text{or} \quad n = \left( \frac{UNS}{2 c \sqrt{2\pi}} \right)^{2/3} .$$
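
The closed-form optimum can be sanity-checked by evaluating the total of measuring cost and expected loss, $cn + UNS/\sqrt{2\pi n}$, around the computed value. The numbers for $U$, $N$, $S$ and $c$ below are hypothetical and serve only to illustrate the calculation.

```python
import math

def expected_loss(n, U, N, S):
    """Seller's expected loss L(n) = U N S / sqrt(2 pi n), fpc ignored."""
    return U * N * S / math.sqrt(2 * math.pi * n)

def total_cost(n, U, N, S, c):
    """Measuring cost c*n plus the expected loss."""
    return c * n + expected_loss(n, U, N, S)

def optimum_n(U, N, S, c):
    """Closed-form minimiser: (U N S / (2 c sqrt(2 pi))) ** (2/3)."""
    return (U * N * S / (2 * c * math.sqrt(2 * math.pi))) ** (2 / 3)

# Hypothetical inputs: price per unit volume, number of logs,
# standard deviation of volume per log, cost of measuring one log.
U, N, S, c = 10.0, 500, 0.8, 2.0

n_star = optimum_n(U, N, S, c)
print(round(n_star, 2))
for n in (0.5 * n_star, n_star, 2 * n_star):
    # The total cost is smallest at n_star and larger on either side.
    print(round(n, 1), round(total_cost(n, U, N, S, c), 2))
```

Since $n$ must be a whole number of logs, the two integers nearest the optimum would be compared in practice.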
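Exercise: With certain populations, it is known that the observations $Y_i$ are all zero on a
portion $QN$ of the $N$ units $(0 < Q < 1)$. Sometimes, with varying expenditure of effort, these
units can be found and listed, so that they need not be sampled. If $\sigma^2$ is the variance of $Y_i$ in
the original population and $\sigma_0^2$ is the variance when all zeros are excluded, then show that
$$\sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2} \bar{Y}^2 ,$$
where $P = 1 - Q$, and $\bar{Y}$ is the mean value of $Y_i$ for the whole population.
Solution: Let the units be $Y_1, Y_2, \ldots, Y_{NP}, Y_{NP+1}, \ldots, Y_N$ (the first $NP$ units are not zero, and the remaining $NQ$ units
are all zero). Thus, $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$ is the population mean, $\bar{Y}_{NP} = \frac{1}{NP} \sum_{i=1}^{NP} Y_i$, and
$\bar{Y}_{NQ} = \frac{1}{NQ} \sum_{i=NP+1}^{N} Y_i = 0$. Also, $\sum_{i=1}^{N} Y_i = \sum_{i=1}^{NP} Y_i$ and $\sum_{i=1}^{N} Y_i^2 = \sum_{i=1}^{NP} Y_i^2$, so that $N \bar{Y} = NP \, \bar{Y}_{NP}$,
or $\bar{Y}_{NP} = \frac{1}{P} \bar{Y}$. By definition,
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2 = \frac{1}{N} \sum_{i=1}^{N} Y_i^2 - \bar{Y}^2, \quad \text{or} \quad N \sigma^2 = \sum_{i=1}^{N} Y_i^2 - N \bar{Y}^2 .$$
Similarly, $NP \, \sigma_0^2 = \sum_{i=1}^{NP} Y_i^2 - NP \, \bar{Y}_{NP}^2$.
Thus,
$$N (\sigma^2 - P \sigma_0^2) = NP \, \bar{Y}_{NP}^2 - N \bar{Y}^2 = NP \, \frac{1}{P^2} \bar{Y}^2 - N \bar{Y}^2 = N \left( \frac{1}{P} - 1 \right) \bar{Y}^2 = N \left( \frac{Q}{P} \right) \bar{Y}^2 .$$
Therefore,
$$P \sigma_0^2 = \sigma^2 - \frac{Q}{P} \bar{Y}^2 \quad \text{or} \quad \sigma_0^2 = \frac{\sigma^2}{P} - \frac{Q}{P^2} \bar{Y}^2 .$$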
Exercise: From a random sample of $n$ units, a random sub-sample of $n_1$ units is drawn
without replacement and added to the original sample. Show that the mean based on $(n + n_1)$
units is an unbiased estimator of the population mean, and that the ratio of its variance to that of
the mean of the original $n$ units is approximately $\frac{1 + 3 n_1 / n}{(1 + n_1 / n)^2}$, assuming that the population
size is large.
Solution: Let the sample means based on $n$, $n_1$, and $n + n_1$ elements be denoted by $\bar{y}_n$, $\bar{y}_{n_1}$,
and $\bar{y}_{n+n_1}$ respectively, defined as $\bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} y_i$, $\bar{y}_{n_1} = \frac{1}{n_1} \sum_{i=1}^{n_1} y_i$, and
$\bar{y}_{n+n_1} = \frac{n \bar{y}_n + n_1 \bar{y}_{n_1}}{n + n_1}$. We have to show that $E(\bar{y}_{n+n_1}) = \bar{Y}$. In this case the expectation is taken
in two stages:
i) when the sample of size $n$ is fixed,
ii) over all samples of size $n$.
$$E(\bar{y}_{n+n_1}) = \frac{1}{n + n_1} E(n \bar{y}_n + n_1 \bar{y}_{n_1}) = \frac{1}{n + n_1} E[\, n \bar{y}_n + n_1 E(\bar{y}_{n_1} \mid n) \,]$$
$$= \frac{1}{n + n_1} E(n \bar{y}_n + n_1 \bar{y}_n), \quad \text{since the } n_1 \text{ units are a sub-sample of the sample of size } n,$$
$$= \frac{1}{n + n_1} (n \bar{Y} + n_1 \bar{Y}) = \bar{Y} .$$
To obtain the variance,
$$V(\bar{y}_{n+n_1}) = E(\bar{y}_{n+n_1} - \bar{Y})^2 = E\left[ \frac{n \bar{y}_n + n_1 \bar{y}_{n_1}}{n + n_1} - \bar{Y} \right]^2$$
$$= \frac{1}{(n + n_1)^2} E[\, n \bar{y}_n + n_1 \bar{y}_{n_1} - (n + n_1) \bar{Y} \,]^2$$
$$= \frac{1}{(n + n_1)^2} E[\, n \bar{y}_n - n \bar{Y} + n_1 \bar{y}_{n_1} - n_1 \bar{Y} \,]^2$$
$$= \frac{1}{(n + n_1)^2} E[\, n (\bar{y}_n - \bar{Y}) + n_1 \bar{y}_{n_1} - n_1 \bar{y}_n + n_1 \bar{y}_n - n_1 \bar{Y} \,]^2$$
$$= \frac{1}{(n + n_1)^2} E[\, (n + n_1)(\bar{y}_n - \bar{Y}) + n_1 (\bar{y}_{n_1} - \bar{y}_n) \,]^2$$
$$= \frac{1}{(n + n_1)^2} [\, (n + n_1)^2 E(\bar{y}_n - \bar{Y})^2 + n_1^2 E(\bar{y}_{n_1} - \bar{y}_n)^2 \,] ,$$
as the cross-product term vanishes because $E(\bar{y}_{n_1} - \bar{y}_n \mid n) = 0$.
$$= \frac{1}{(n + n_1)^2} [\, (n + n_1)^2 V(\bar{y}_n) + n_1^2 E\{ E[(\bar{y}_{n_1} - \bar{y}_n)^2 \mid n] \} \,]$$
$$= \frac{1}{(n + n_1)^2} \left[ (n + n_1)^2 V(\bar{y}_n) + n_1^2 E\left\{ \left( \frac{1}{n_1} - \frac{1}{n} \right) s_n^2 \right\} \right]$$
$$= \frac{1}{(n + n_1)^2} \left[ (n + n_1)^2 V(\bar{y}_n) + n_1^2 \left( \frac{n - n_1}{n \, n_1} \right) S^2 \right]$$
$$= \frac{1}{(n + n_1)^2} \left[ (n + n_1)^2 V(\bar{y}_n) + \frac{n_1 (n - n_1)}{n} S^2 \right] = V(\bar{y}_n) + \frac{n_1 (n - n_1)}{n (n + n_1)^2} S^2 .$$
Therefore, taking $V(\bar{y}_n) = S^2 / n$ for a large population,
$$\frac{V(\bar{y}_{n+n_1})}{V(\bar{y}_n)} = 1 + \frac{n_1 (n - n_1)}{n (n + n_1)^2 \, V(\bar{y}_n)} S^2 = 1 + \frac{n_1 (n - n_1)}{n (n + n_1)^2 \, S^2 / n} S^2$$
$$= \frac{(n + n_1)^2 + n_1 (n - n_1)}{(n + n_1)^2} = \frac{n^2 + n_1^2 + 2 n_1 n + n_1 n - n_1^2}{(n + n_1)^2}$$
$$= \frac{n^2 + 3 n_1 n}{(n + n_1)^2} = \frac{1 + (3 n_1 / n)}{(1 + n_1 / n)^2} .$$

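Exercise: A simple random sample of size $n = n_1 + n_2$ with mean $\bar{y}$ is drawn from a finite
population, and a simple random subsample of size $n_1$ is drawn from it with mean $\bar{y}_1$. Show
that
i) $V(\bar{y}_1 - \bar{y}_2) = S^2 [(1/n_1) + (1/n_2)]$, where $\bar{y}_2$ is the mean of the remaining $n_2$ units in the
sample,
ii) $V(\bar{y}_1 - \bar{y}) = S^2 [(1/n_1) - (1/n)]$,
iii) $Cov(\bar{y}, \bar{y}_1 - \bar{y}) = 0$.
Repeated sampling implies repetition of the drawing of both the sample and subsample.
Solution:
i) In repeated sampling the given procedure is equivalent to drawing subsamples of sizes $n_1$
and $n_2$ independently, thus
$$V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2), \quad \text{since } Cov(\bar{y}_1, \bar{y}_2) = 0,$$
$$= S^2 [(1/n_1) + (1/n_2)], \quad \text{ignoring the fpc.}$$
ii) $\bar{y} = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n_1 + n_2} \Rightarrow \bar{y}_1 - \bar{y} = \bar{y}_1 - \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n_1 + n_2}$,
or $\bar{y}_1 - \bar{y} = \frac{n_1 \bar{y}_1 + n_2 \bar{y}_1 - n_1 \bar{y}_1 - n_2 \bar{y}_2}{n_1 + n_2} = \frac{n_2 (\bar{y}_1 - \bar{y}_2)}{n}$.
Therefore,
$$V(\bar{y}_1 - \bar{y}) = V\left[ \frac{n_2 (\bar{y}_1 - \bar{y}_2)}{n} \right] = \frac{n_2^2}{n^2} V(\bar{y}_1 - \bar{y}_2) = \frac{n_2^2}{n^2} \left( \frac{1}{n_1} + \frac{1}{n_2} \right) S^2$$
$$= \frac{n_2^2}{n^2} \left( \frac{n_1 + n_2}{n_1 n_2} \right) S^2 = \frac{n_2}{n_1 n} S^2 = \frac{n - n_1}{n_1 n} S^2 = \left( \frac{1}{n_1} - \frac{1}{n} \right) S^2 .$$
iii) $Cov(\bar{y}, \bar{y}_1 - \bar{y}) = E[\bar{y} (\bar{y}_1 - \bar{y})] - E(\bar{y}) E(\bar{y}_1 - \bar{y})$
$$= E(\bar{y} \bar{y}_1 - \bar{y}^2) - \bar{Y} \times 0 = E(\bar{y} \bar{y}_1) - E(\bar{y}^2) \qquad (1)$$
Consider
$$E(\bar{y} \bar{y}_1) = E\left[ \frac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n} \, \bar{y}_1 \right] = E\left[ \frac{n_1}{n} \bar{y}_1^2 + \frac{n_2}{n} \bar{y}_1 \bar{y}_2 \right]$$
$$= \frac{n_1}{n} E(\bar{y}_1^2) + \frac{n_2}{n} E(\bar{y}_1) E(\bar{y}_2)$$
$$= \frac{n_1}{n} [\, V(\bar{y}_1) + \bar{Y}^2 \,] + \frac{n_2}{n} \bar{Y}^2 = \frac{n_1}{n} \left( \frac{S^2}{n_1} + \bar{Y}^2 \right) + \frac{n_2}{n} \bar{Y}^2$$
$$= \frac{S^2}{n} + \frac{n_1}{n} \bar{Y}^2 + \frac{n_2}{n} \bar{Y}^2 = \frac{S^2}{n} + \bar{Y}^2 \qquad (2)$$
Now
$$V(\bar{y}) = E(\bar{y}^2) - \bar{Y}^2 \quad \text{or} \quad E(\bar{y}^2) = V(\bar{y}) + \bar{Y}^2 = \frac{S^2}{n} + \bar{Y}^2 \qquad (3)$$
In view of equations (1), (2), and (3), we get
$$Cov(\bar{y}, \bar{y}_1 - \bar{y}) = \left( \frac{S^2}{n} + \bar{Y}^2 \right) - \left( \frac{S^2}{n} + \bar{Y}^2 \right) = 0 .$$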
Exercise: A population has three units $U_1$, $U_2$ and $U_3$ with variates $Y_1$, $Y_2$ and $Y_3$
respectively. It is required to estimate the population total $Y$ by selecting a sample of two
units. Let the sampling and estimation procedures be as follows:

Sample (s)      P(s)    Estimator t       Estimator t'
(U1, U2)        1/2     Y1 + 2 Y2         Y1 + 2 Y2 + Y1^2
(U1, U3)        1/2     Y1 + 2 Y3         Y1 + 2 Y3 - Y1^2

Prove that both $t$ and $t'$ are unbiased for $Y$ and find their variances. Comment on the
estimators.
Solution: By definition
$$E(t) = \sum_i t_i \, p(t_i) = \frac{1}{2} (Y_1 + 2 Y_2 + Y_1 + 2 Y_3) = Y .$$
This shows that the estimator $t$ is unbiased for $Y$.
$$E(t^2) = \frac{1}{2} [\, (Y_1 + 2 Y_2)^2 + (Y_1 + 2 Y_3)^2 \,] = \frac{1}{2} (Y_1^2 + 4 Y_2^2 + 4 Y_1 Y_2 + Y_1^2 + 4 Y_3^2 + 4 Y_1 Y_3)$$
$$= Y_1^2 + 2 Y_2^2 + 2 Y_3^2 + 2 Y_1 Y_2 + 2 Y_1 Y_3 .$$
Therefore,
$$V(t) = E(t^2) - [E(t)]^2 = Y_1^2 + 2 Y_2^2 + 2 Y_3^2 + 2 Y_1 Y_2 + 2 Y_1 Y_3 - (Y_1 + Y_2 + Y_3)^2$$
$$= Y_2^2 + Y_3^2 - 2 Y_2 Y_3 = (Y_2 - Y_3)^2 .$$
Similarly,
$$E(t') = \sum_i t'_i \, p(t'_i) = \frac{1}{2} (Y_1 + 2 Y_2 + Y_1^2 + Y_1 + 2 Y_3 - Y_1^2) = Y ,$$
hence $t'$ is unbiased for $Y$.
$$E(t'^2) = \frac{1}{2} [\, (Y_1 + 2 Y_2 + Y_1^2)^2 + (Y_1 + 2 Y_3 - Y_1^2)^2 \,]$$
$$= \frac{1}{2} (Y_1^4 + 2 Y_1^3 + Y_1^2 + 4 Y_1^2 Y_2 + 4 Y_1 Y_2 + 4 Y_2^2 + Y_1^4 - 2 Y_1^3 + Y_1^2 - 4 Y_1^2 Y_3 + 4 Y_1 Y_3 + 4 Y_3^2)$$
$$= Y_1^4 + Y_1^2 + 2 Y_1^2 Y_2 + 2 Y_1 Y_2 + 2 Y_2^2 - 2 Y_1^2 Y_3 + 2 Y_1 Y_3 + 2 Y_3^2 .$$
Therefore,
$$V(t') = E(t'^2) - [E(t')]^2$$
$$= Y_1^4 + Y_1^2 + 2 Y_1^2 Y_2 + 2 Y_1 Y_2 + 2 Y_2^2 - 2 Y_1^2 Y_3 + 2 Y_1 Y_3 + 2 Y_3^2 - (Y_1 + Y_2 + Y_3)^2$$
$$= (Y_2 - Y_3)^2 + Y_1^2 (Y_1^2 + 2 Y_2 - 2 Y_3)$$
$$= V(t) + Y_1^2 (Y_1^2 + 2 Y_2 - 2 Y_3) .$$
We conclude that both the linear estimator $t$ and the quadratic estimator $t'$ are unbiased;
which estimator has the smaller variance depends on the variate values.
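
The dependence of the comparison on the variate values can be seen with two small numerical cases. This is a sketch under a stated assumption: $t'$ is taken to add $Y_1^2$ for the sample $(U_1, U_2)$ and subtract it for $(U_1, U_3)$, which is one sign convention that makes $t'$ unbiased. The variate values are hypothetical.

```python
from fractions import Fraction

def moments(Y1, Y2, Y3):
    """Return ((E(t), V(t)), (E(t'), V(t'))) for the two-sample design, P(s) = 1/2 each."""
    half = Fraction(1, 2)
    t = [Y1 + 2 * Y2, Y1 + 2 * Y3]
    # Assumed sign convention: + Y1^2 for (U1, U2), - Y1^2 for (U1, U3).
    t_prime = [Y1 + 2 * Y2 + Y1 ** 2, Y1 + 2 * Y3 - Y1 ** 2]

    def mean_var(estimates):
        m = sum(half * e for e in estimates)
        v = sum(half * (e - m) ** 2 for e in estimates)
        return m, v

    return mean_var(t), mean_var(t_prime)

# Case 1: Y = (1, 2, 5), total 8. Here t' has the smaller variance.
print(moments(1, 2, 5))
# Case 2: Y = (2, 5, 1), total 8. Here t has the smaller variance.
print(moments(2, 5, 1))
```

Both cases agree with the closed forms $V(t) = (Y_2 - Y_3)^2$ and $V(t') = V(t) + Y_1^2 (Y_1^2 + 2 Y_2 - 2 Y_3)$ under the assumed convention.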

UNIT-II

STRATIFIED RANDOM SAMPLING

The precision of an estimator of the population parameters (mean or total etc.) depends on the
size of the sample and the variability or heterogeneity among the units of the population. If
the population is very heterogeneous and considerations of cost limit the size of the sample, it
may be found impossible to get a sufficiently precise estimate by taking a simple random
sample from the entire population. One possible way to estimate the population mean
or total with greater precision is to divide the population into several non-overlapping groups
(sub-populations or classes), each of which is more homogeneous than
the entire population, and to draw a random sample of predetermined size from each one of the
groups. The groups into which the population is divided are called strata (each group is
called a stratum), and the whole procedure of dividing the population into strata and then
drawing a random sample from each one of the strata is called stratified random sampling.
For example, to estimate the average income per household, it may be appropriate to group
the households into two or more groups (strata) according to the rent paid by the households.
The households in any stratum so form are likely to be more homogeneous with respect to
income as compared to the whole population. Thus, the estimated income per household
based on a stratified sample is likely to be more precise than that based on a simple random
sample of the same size drawn from the whole population.
Principal reasons for stratification
• To gain precision: a heterogeneous population is divided into strata in such a way that
each stratum is internally homogeneous.
• To accommodate administrative convenience (cost considerations): fieldwork is organized
by strata, which usually saves cost and effort.
• To obtain separate estimates for the strata.
• To accommodate different sampling plans in different strata.
• To obtain data of known precision for certain subdivisions, treating each subdivision as
a population in its own right.
Notations
Let the population of $N$ units be divided into $k$ strata (sub-populations) of sizes
$N_1, N_2, \ldots, N_k$. These sub-populations are non-overlapping, so that
$N_1 + N_2 + \cdots + N_k = N$. A sample is drawn (by the method of srs) from each stratum
independently, the sample size within the $i$-th stratum being $n_i$ $(i = 1, 2, \ldots, k)$,
such that $n_1 + n_2 + \cdots + n_k = n$. The following symbols refer to stratum $i$.
$N_i$, total number of units.
$n_i$, number of units in sample.
$f_i = \dfrac{n_i}{N_i}$, sampling fraction in the stratum.
$W_i = \dfrac{N_i}{N}$, stratum weight.
$y_{ij}$, value of the characteristic under study for the $j$-th unit in the $i$-th stratum,
$j = 1, 2, \ldots, N_i$.
$\bar{Y}_i = \dfrac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}$, mean based on $N_i$ units (stratum mean).
$\bar{y}_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$, mean based on $n_i$ units (sample mean).
$\sigma_i^2 = \dfrac{1}{N_i}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2$, variance based on $N_i$ units (stratum variance).
$S_i^2 = \dfrac{1}{N_i - 1}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2$, mean square based on $N_i$ units (stratum mean square).
$s_i^2 = \dfrac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$, sample mean square based on $n_i$ units.
$Y = \sum_{i=1}^{k}\sum_{j=1}^{N_i} y_{ij} = \sum_{i=1}^{k} N_i \bar{Y}_i$, population total.
$\bar{Y} = \dfrac{Y}{N} = \dfrac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \sum_{i=1}^{k} W_i \bar{Y}_i$, overall population mean.
Theorem: For stratified random sampling, wor, if in every stratum the sample estimate
$\bar{y}_i$ is an unbiased estimate of $\bar{Y}_i$, and samples are drawn independently in
different strata, then $\bar{y}_{st} = \sum_{i=1}^{k} W_i \bar{y}_i$ is an unbiased estimate of
the overall population mean $\bar{Y}$ and its variance is
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2.
\]

Proof: Since sampling within each stratum is simple random sampling, i.e.
$E(\bar{y}_i) = \bar{Y}_i$, it follows that
\[
E(\bar{y}_{st}) = E\left(\sum_{i=1}^{k} W_i \bar{y}_i\right) = \sum_{i=1}^{k} W_i E(\bar{y}_i) = \sum_{i=1}^{k} W_i \bar{Y}_i = \bar{Y}.
\]
To obtain the variance, we have
\[
V(\bar{y}_{st}) = E[\bar{y}_{st} - E(\bar{y}_{st})]^2
= E\left[\sum_{i=1}^{k} W_i \bar{y}_i - E\left(\sum_{i=1}^{k} W_i \bar{y}_i\right)\right]^2
= E\left[\sum_{i=1}^{k} W_i \{\bar{y}_i - E(\bar{y}_i)\}\right]^2
\]
\[
= E\left[\sum_{i=1}^{k} W_i^2 \{\bar{y}_i - E(\bar{y}_i)\}^2\right]
+ E\left[\sum_{i \neq i'} W_i W_{i'} \{\bar{y}_i - E(\bar{y}_i)\}\{\bar{y}_{i'} - E(\bar{y}_{i'})\}\right]
\]
\[
= \sum_{i=1}^{k} W_i^2 V(\bar{y}_i) + \sum_{i \neq i'} W_i W_{i'} \operatorname{Cov}(\bar{y}_i, \bar{y}_{i'}).
\]
Since samples are drawn independently in different strata, all covariance terms vanish, and
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 V(\bar{y}_i) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2,
\]
as sampling is srswor within each stratum.
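The variance formula of the theorem translates directly into a short computation. The following is a minimal Python sketch; the function name `var_stratified_mean` and its list-based interface are assumptions made for illustration, and the stratum mean squares $S_i^2$ are taken as known.

```python
def var_stratified_mean(N_i, n_i, S2_i):
    """V(ybar_st) = sum_i (1/n_i - 1/N_i) W_i^2 S_i^2 for srswor within strata."""
    N = sum(N_i)
    return sum((1 / n - 1 / Nh) * (Nh / N) ** 2 * S2
               for Nh, n, S2 in zip(N_i, n_i, S2_i))

# Illustrative values only: two strata of 3 units each, S_1^2 = 1, S_2^2 = 13.
v_mean = var_stratified_mean([3, 3], [2, 2], [1, 13])

# A complete enumeration of every stratum (n_i = N_i) leaves no sampling variance.
v_census = var_stratified_mean([3, 3], [3, 3], [1, 13])
```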

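Alternative expressions of $V(\bar{y}_{st})$
\[
\text{i)}\quad V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2
= \sum_{i=1}^{k} \left(\frac{N_i - n_i}{N_i}\right) \frac{N_i^2}{N^2}\, \frac{S_i^2}{n_i}
= \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\, \frac{S_i^2}{n_i}.
\]
\[
\text{ii)}\quad V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\, \frac{S_i^2}{n_i}
= \frac{1}{N^2}\sum_{i=1}^{k} N_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{S_i^2}{n_i}
= \sum_{i=1}^{k} W_i^2\, \frac{(1 - f_i)\, S_i^2}{n_i}.
\]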

Corollary: $\hat{Y}_{st} = N \bar{y}_{st}$ is an unbiased estimate of the population total $Y$ with variance
\[
V(\hat{Y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) N_i^2 S_i^2.
\]
Proof: By definition
\[
E(\hat{Y}_{st}) = N E(\bar{y}_{st}) = N\bar{Y} = Y, \text{ and}
\]
\[
V(\hat{Y}_{st}) = N^2 V(\bar{y}_{st}) = N^2 \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2
= \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) N_i^2 S_i^2
\]
\[
= \sum_{i=1}^{k} N_i (N_i - n_i)\, \frac{S_i^2}{n_i} = \sum_{i=1}^{k} N_i^2 (1 - f_i)\, \frac{S_i^2}{n_i}.
\]
Remarks
a) If the $N_i$ are large compared to the $n_i$ (i.e. the sampling fractions $f_i = n_i/N_i$
are negligible in all strata), then
\[
\text{i)}\quad V(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 S_i^2 / n_i = \frac{1}{N^2}\sum_{i=1}^{k} N_i^2 S_i^2 / n_i.
\]
\[
\text{ii)}\quad V(\hat{Y}_{st}) = \sum_{i=1}^{k} N_i^2 S_i^2 / n_i.
\]
b) If in every stratum $\dfrac{n_i}{n} = \dfrac{N_i}{N}$, i.e. $n_i = n\,\dfrac{N_i}{N} = nW_i$,
the variance of $\bar{y}_{st}$ reduces to
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{N_i - n_i}{N_i}\right) W_i^2 S_i^2 / n_i
= \sum_{i=1}^{k} \left(\frac{N_i - nW_i}{N_i}\right) W_i S_i^2 / n
= \frac{1 - f}{n}\sum_{i=1}^{k} W_i S_i^2, \quad f = \frac{n}{N}.
\]
c) If in every stratum $\dfrac{n_i}{n} = \dfrac{N_i}{N}$ and the mean squares $S_i^2$ have the
same value $S^2$ in all strata, then the result reduces to
\[
V(\bar{y}_{st}) = \frac{1 - f}{n}\sum_{i=1}^{k} W_i S^2 = \frac{1 - f}{n}\, S^2, \quad\text{since } \sum_{i=1}^{k} W_i = 1.
\]
Estimation of variance

If a simple random sample is taken within each stratum, then an unbiased estimator of $S_i^2$ is
$s_i^2 = \dfrac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$, and an unbiased estimator of
the variance of $\bar{y}_{st}$ is
\[
\hat{V}(\bar{y}_{st}) = v(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 s_i^2
= \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\, s_i^2 / n_i
= \sum_{i=1}^{k} W_i^2 (1 - f_i)\, s_i^2 / n_i.
\]
Alternative form for computing purposes
\[
\hat{V}(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{N_i}
= \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i s_i^2}{N}.
\]
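Theorem: If stratified random sampling is with replacement, then
$\bar{y}_{st} = \sum_{i=1}^{k} W_i \bar{y}_i$ is an unbiased estimate of the population mean
$\bar{Y}$ and its variance is
$V(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 \sigma_i^2 / n_i \cong \sum_{i=1}^{k} W_i^2 S_i^2 / n_i$,
the approximation holding when the $N_i$ are large.

Proof: As in stratified random sampling, wor, $E(\bar{y}_{st}) = \bar{Y}$, and
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 V(\bar{y}_i) = \sum_{i=1}^{k} W_i^2 \sigma_i^2 / n_i
= \sum_{i=1}^{k} W_i^2 \left(\frac{N_i - 1}{N_i}\right) S_i^2 / n_i
\cong \sum_{i=1}^{k} W_i^2 S_i^2 / n_i.
\]
Corollary: $\hat{Y}_{st} = N \bar{y}_{st} = N \sum_{i=1}^{k} W_i \bar{y}_i = \sum_{i=1}^{k} N_i \bar{y}_i$
is an unbiased estimate of the population total $Y$ and its variance is
\[
V(\hat{Y}_{st}) = V(N \bar{y}_{st}) = N^2 V(\bar{y}_{st}) \cong N^2 \sum_{i=1}^{k} W_i^2 S_i^2 / n_i = \sum_{i=1}^{k} N_i^2 S_i^2 / n_i.
\]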

Choice of sample size in different strata


There are three methods of allocation of sample sizes to different strata in a stratified
sampling procedure. These are
i) Equal allocation.
ii) Proportional allocation.
iii) Optimum allocation.
Equal allocation: In this method, the total sample size $n$ is divided equally among all
the strata, i.e. for the $i$-th stratum $n_i = n/k$. In practice, this method is not used
except when the strata sizes are almost equal.
Proportional allocation: This procedure of allocation is very common in practice
because of its simplicity. When no information other than $N_i$, the total number of units in
the $i$-th stratum, is available, the allocation of a given sample of size $n$ to different
strata is done in proportion to their sizes, i.e. in the $i$-th stratum $n_i \propto N_i$ or
$n_i = \lambda N_i$, where $\lambda$ is the constant of proportionality, and
\[
\sum_{i=1}^{k} n_i = \lambda \sum_{i=1}^{k} N_i, \quad\text{or}\quad \lambda = \frac{n}{N}, \quad\Rightarrow\quad n_i = \frac{n}{N}\, N_i = nW_i.
\]

$V(\bar{y}_{st})$ under proportional allocation
\[
V(\bar{y}_{st})_{prop} = \sum_{i=1}^{k} \left(\frac{1}{nW_i} - \frac{1}{N_i}\right) W_i^2 S_i^2
= \sum_{i=1}^{k} \left(\frac{W_i}{nW_i} - \frac{W_i}{N_i}\right) W_i S_i^2
= \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{k} W_i S_i^2
= \frac{1 - f}{n}\sum_{i=1}^{k} W_i S_i^2.
\]
Note: If the mean squares in all strata have the same value, $S^2$ (say), then
\[
V(\bar{y}_{st})_{prop} = \frac{1 - f}{n}\, S^2, \quad\text{as } \sum_{i=1}^{k} W_i = 1.
\]
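As a quick illustration of proportional allocation and the resulting variance, the following minimal Python sketch computes $n_i = nW_i$ and $V(\bar{y}_{st})_{prop}$. The function names and the stratum sizes used are assumptions for illustration only; non-integer $n_i$ would still have to be rounded in practice.

```python
def proportional_allocation(N_i, n):
    """n_i = n * N_i / N = n * W_i."""
    N = sum(N_i)
    return [n * Nh / N for Nh in N_i]

def var_prop(N_i, S2_i, n):
    """V(ybar_st)_prop = (1 - f)/n * sum_i W_i S_i^2, with f = n/N."""
    N = sum(N_i)
    f = n / N
    return (1 - f) / n * sum(Nh / N * S2 for Nh, S2 in zip(N_i, S2_i))

# Hypothetical strata sizes and mean squares.
alloc = proportional_allocation([4000, 20000], 1000)
v = var_prop([3, 3], [1, 13], 4)
```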

Alternative expressions of $V(\bar{y}_{st})_{prop}$
\[
V(\bar{y}_{st})_{prop} = \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{k} W_i S_i^2
= \left(\frac{N - n}{nN}\right) \sum_{i=1}^{k} \frac{N_i}{N}\, S_i^2
= \frac{N - n}{nN^2}\sum_{i=1}^{k} N_i S_i^2.
\]
Optimum allocation: In this method of allocation the sample sizes $n_i$ in the respective
strata are determined so as to minimize $V(\bar{y}_{st})$ for a specified cost of conducting
the sample survey, or to minimize the cost for a specified value of $V(\bar{y}_{st})$. The
simplest cost function is of the form
\[
\text{Cost} = C = c_0 + \sum_{i=1}^{k} c_i n_i,
\]
where the overhead cost $c_0$ is constant and $c_i$ is the average cost of surveying one unit
in the $i$-th stratum. Then
\[
C - c_0 = \sum_{i=1}^{k} n_i c_i = C' \text{ (say)}, \tag{2.1}
\]
and
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2
= \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i},
\]
so that
\[
V(\bar{y}_{st}) + \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i} = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} = V' \text{ (say)}, \tag{2.2}
\]
where $C'$ and $V'$ are functions of the $n_i$. Choosing the $n_i$ to minimize $V$ for fixed
$C$, or $C$ for fixed $V$, are both equivalent to minimizing the product
\[
V' C' = \left(\sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i}\right) \left(\sum_{i=1}^{k} n_i c_i\right).
\]
It may be minimized by use of the Cauchy-Schwarz inequality, i.e. if $a_i, b_i$,
$i = 1, 2, \ldots, k$, are two sets of $k$ positive numbers, then
\[
\left(\sum_{i=1}^{k} a_i^2\right) \left(\sum_{i=1}^{k} b_i^2\right) \ge \left(\sum_{i=1}^{k} a_i b_i\right)^2,
\]
and equality holds if and only if $b_i / a_i$ is constant for all $i$.
Taking $a_i = W_i S_i / \sqrt{n_i} > 0$ and $b_i = \sqrt{n_i c_i} > 0$, we have
\[
V' C' = \sum_{i=1}^{k} \left(W_i S_i / \sqrt{n_i}\right)^2 \sum_{i=1}^{k} \left(\sqrt{n_i c_i}\right)^2
\ge \left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2.
\]
Thus, no choice of $n_i$ can make $V' C'$ smaller than
$\left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2$. This minimum value occurs when
$b_i / a_i$ is constant, say $\lambda$:
\[
\frac{b_i}{a_i} = \sqrt{n_i c_i}\, \frac{\sqrt{n_i}}{W_i S_i} = \frac{n_i \sqrt{c_i}}{W_i S_i} = \lambda
\quad\text{or}\quad n_i = \lambda\, \frac{W_i S_i}{\sqrt{c_i}}. \tag{2.3}
\]
Hence $n_i \propto W_i S_i / \sqrt{c_i}$; this allocation is known as optimum allocation.

Taking summation on both sides of equation (2.3), we get
\[
n = \lambda \sum_{i=1}^{k} \frac{W_i S_i}{\sqrt{c_i}} \quad\text{or}\quad
\lambda = \frac{n}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}},
\]
and hence,
\[
n_i = n\, \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
= n\, \frac{N_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}}. \tag{2.4}
\]
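Equation (2.4) can be sketched in a few lines of Python. The function name `optimum_allocation` and the numerical inputs are assumptions for illustration; the resulting $n_i$ are generally not integers and would need rounding.

```python
from math import sqrt

def optimum_allocation(W, S, c, n):
    """n_i = n * (W_i S_i / sqrt(c_i)) / sum_j (W_j S_j / sqrt(c_j)), as in (2.4)."""
    score = [w * s / sqrt(ci) for w, s, ci in zip(W, S, c)]
    total = sum(score)
    return [n * x / total for x in score]

# Hypothetical inputs: the cheaper, less variable stratum receives the smaller share.
alloc = optimum_allocation(W=[0.4, 0.6], S=[10, 20], c=[4, 9], n=300)

# With equal unit costs, the allocation depends on W_i S_i only.
alloc_equal_cost = optimum_allocation(W=[0.5, 0.5], S=[1, 3], c=[1, 1], n=4)
```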

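Alternative method
To determine the $n_i$ such that $V(\bar{y}_{st})$ is minimum while the cost $C$ is fixed,
consider the function
\[
\phi = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i}
+ \lambda \left(c_0 + \sum_{i=1}^{k} c_i n_i - C\right),
\]
where $\lambda$ is some unknown constant.
Using the calculus method of Lagrange multipliers, we select the $n_i$ and the constant
$\lambda$ to minimize $\phi$. Differentiating $\phi$ with respect to $n_i$ and equating to
zero, we have
\[
\frac{\partial \phi}{\partial n_i} = 0 = -\frac{W_i^2 S_i^2}{n_i^2} + \lambda c_i
\quad\text{or}\quad n_i = \frac{1}{\sqrt{\lambda}}\, \frac{W_i S_i}{\sqrt{c_i}}, \tag{2.3a}
\]
so that $n_i \propto W_i S_i / \sqrt{c_i}$ or $n_i \propto N_i S_i / \sqrt{c_i}$.

Taking summation on both sides of equation (2.3a), we get
\[
n = \frac{1}{\sqrt{\lambda}} \sum_{i=1}^{k} \frac{W_i S_i}{\sqrt{c_i}} \quad\text{or}\quad
\frac{1}{\sqrt{\lambda}} = \frac{n}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}},
\]
\[
\Rightarrow\quad n_i = n\, \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
= n\, \frac{N_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i / \sqrt{c_i}}. \tag{2.4a}
\]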

Total sample size $n$ required under optimum allocation
The solution for the value of $n$ depends on whether the sample is chosen to meet a specified
total cost $C$ or to give a specified variance $V$ for $\bar{y}_{st}$.
i) If the cost is fixed, substitute the optimum values of $n_i$ in the cost function,
equation (2.1), and solve for $n$:
\[
C - c_0 = \sum_{i=1}^{k} c_i n_i = n \sum_{i=1}^{k} c_i\, \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
= n\, \frac{\sum_{i=1}^{k} W_i S_i \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
\]
\[
\Rightarrow\quad n = \frac{C - c_0}{\sum_{i=1}^{k} W_i S_i \sqrt{c_i}}\, \sum_{i=1}^{k} W_i S_i / \sqrt{c_i}.
\]
Hence,
\[
n_i = \frac{C - c_0}{\sum_{i=1}^{k} W_i S_i \sqrt{c_i}}\, \sum_{i=1}^{k} \frac{W_i S_i}{\sqrt{c_i}}
\cdot \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
= \frac{(C - c_0)\, W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i \sqrt{c_i}}.
\]

$V(\bar{y}_{st})$ under optimum allocation for fixed cost
\[
V(\bar{y}_{st})_{opt} = \sum_{i=1}^{k} \left[\frac{\sqrt{c_i}\, \sum_{j=1}^{k} W_j S_j \sqrt{c_j}}{(C - c_0)\, W_i S_i} - \frac{1}{N_i}\right] W_i^2 S_i^2
\]
\[
= \frac{1}{C - c_0} \sum_{i=1}^{k} W_i S_i \sqrt{c_i} \left(\sum_{j=1}^{k} W_j S_j \sqrt{c_j}\right) - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i}
\]
\[
= \frac{1}{C - c_0} \left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2.
\]

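ii) If $V$ is fixed, substitute the optimum $n_i$ in equation (2.2); we get
\[
V(\bar{y}_{st}) + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
= \sum_{i=1}^{k} \frac{W_i^2 S_i^2\, \sum_{j=1}^{k} W_j S_j / \sqrt{c_j}}{n\, W_i S_i / \sqrt{c_i}}
= \frac{1}{n} \left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right) \left(\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}\right).
\]
Thus,
\[
n = \frac{\left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right) \left(\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}\right)}{V + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2},
\]
and hence,
\[
n_i = \frac{(W_i S_i / \sqrt{c_i}) \sum_{j=1}^{k} W_j S_j \sqrt{c_j}}{V + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2}.
\]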

Optimum cost for fixed variance
\[
C - c_0 = \sum_{i=1}^{k} c_i\, \frac{(W_i S_i / \sqrt{c_i}) \sum_{j=1}^{k} W_j S_j \sqrt{c_j}}{V + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2}
= \frac{\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\, \sum_{j=1}^{k} W_j S_j \sqrt{c_j}}{V + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2}
= \frac{\left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2}{V + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2}.
\]
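The fixed-variance results can be sanity-checked by plugging the computed $n_i$ back into $V(\bar{y}_{st})$ and into the cost function. The following is a minimal Python sketch; the function name `allocation_for_variance` and all numerical inputs are hypothetical.

```python
from math import sqrt

def allocation_for_variance(W, S, c, V, N):
    """Optimum n_i and field cost C - c0 that attain a target variance V."""
    a = sum(w * s * sqrt(ci) for w, s, ci in zip(W, S, c))   # sum W_i S_i sqrt(c_i)
    denom = V + sum(w * s * s for w, s in zip(W, S)) / N     # V + (1/N) sum W_i S_i^2
    n_i = [(w * s / sqrt(ci)) * a / denom for w, s, ci in zip(W, S, c)]
    cost = a * a / denom
    return n_i, cost

# Hypothetical three-stratum population.
W, S, c, N = [0.5, 0.3, 0.2], [4.0, 6.0, 10.0], [1.0, 4.0, 9.0], 10_000
target_V = 0.05
n_i, cost = allocation_for_variance(W, S, c, target_V, N)

# Plug the n_i back into V(ybar_st) = sum (1/n_i - 1/N_i) W_i^2 S_i^2, with N_i = W_i N.
achieved_V = sum((1 / n - 1 / (w * N)) * (w * s) ** 2 for w, s, n in zip(W, S, n_i))
direct_cost = sum(ci * n for ci, n in zip(c, n_i))
```

The recomputed variance should match the target, and the directly summed cost should match the closed form, up to floating-point error.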

Remark
An important special case arises if $c_i = c$, that is, if the cost per unit is the same in
all strata. The cost becomes $C = c_0 + \sum_{i=1}^{k} c\, n_i = c_0 + cn$, and optimum
allocation for fixed cost reduces to optimum allocation for fixed sample size. The result in
this case is as follows:
In stratified random sampling $V(\bar{y}_{st})$ is minimized for a fixed total size of sample $n$ if
\[
n_i = n\, \frac{W_i S_i}{\sum_{i=1}^{k} W_i S_i} = n\, \frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}
\quad\Rightarrow\quad n_i \propto W_i S_i \text{ or } n_i \propto N_i S_i.
\]
This is called Neyman allocation. The variance $V(\bar{y}_{st})$ under optimum allocation for
fixed $n$ (Neyman allocation) is
\[
V(\bar{y}_{st})_{opt} = \sum_{i=1}^{k} \left(\frac{\sum_{j=1}^{k} W_j S_j}{n\, W_i S_i} - \frac{1}{N_i}\right) W_i^2 S_i^2
= \frac{1}{n}\sum_{i=1}^{k} W_i S_i \sum_{j=1}^{k} W_j S_j - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i}
\]
\[
= \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2.
\]
Note: If $N$ is large, $V(\bar{y}_{st})_{opt}$ reduces to
$V(\bar{y}_{st})_{opt} = \dfrac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2$.
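A minimal Python sketch of Neyman allocation follows. The function name and the two-stratum inputs are assumptions for illustration; since the formula generally yields non-integer values, a simple rounding step is shown as well.

```python
from math import sqrt

def neyman_allocation(N_i, S_i, n):
    """n_i = n * N_i S_i / sum_j N_j S_j (optimum allocation for fixed n)."""
    score = [Nh * s for Nh, s in zip(N_i, S_i)]
    total = sum(score)
    return [n * x / total for x in score]

# Hypothetical strata: equal sizes, S_1 = 1 and S_2 = sqrt(13).
alloc = neyman_allocation([3, 3], [1.0, sqrt(13)], 4)
rounded = [round(x) for x in alloc]
```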

Relative precision of stratified with simple random sampling
Here we compare the usual estimators under simple random sampling without stratification
and under stratified random sampling with proportional and optimum allocation. The
comparison shows how the gain due to stratification is achieved.
Consider the variances of these estimators of the population mean, which are as follows.
\[
V_{ran} = \frac{1 - f}{n}\, S^2,
\]
\[
V_{prop} = \frac{1 - f}{n}\sum_{i=1}^{k} W_i S_i^2 = \frac{1}{n}\sum_{i=1}^{k} W_i S_i^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2,
\]
\[
V_{opt} = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2.
\]

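Now
\[
(N - 1) S^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y})^2 = \sum_{i=1}^{k}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i + \bar{Y}_i - \bar{Y})^2
\]
\[
= \sum_{i=1}^{k}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2 + \sum_{i=1}^{k}\sum_{j=1}^{N_i} (\bar{Y}_i - \bar{Y})^2
+ 2\sum_{i=1}^{k}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)(\bar{Y}_i - \bar{Y})
\]
\[
= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2
+ 2\sum_{i=1}^{k} (\bar{Y}_i - \bar{Y}) \left[\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)\right]
\]
\[
= \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2,
\]
as the sum of the deviations from their mean is zero, or
\[
S^2 = \sum_{i=1}^{k} \left(\frac{N_i - 1}{N - 1}\right) S_i^2 + \sum_{i=1}^{k} \frac{N_i}{N - 1}\, (\bar{Y}_i - \bar{Y})^2.
\]
For large $N$, $\dfrac{1}{N} \to 0$, so that
\[
\frac{N_i - 1}{N - 1} = \frac{(N_i / N) - (1 / N)}{1 - (1 / N)} \cong W_i
\quad\text{and}\quad
\frac{N_i}{N - 1} = \frac{N_i / N}{1 - (1 / N)} \cong W_i,
\]
so that
\[
S^2 \cong \sum_{i=1}^{k} W_i S_i^2 + \sum_{i=1}^{k} W_i (\bar{Y}_i - \bar{Y})^2.
\]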

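Hence,
\[
V_{ran} = \frac{1 - f}{n}\, S^2 = \frac{1 - f}{n}\sum_{i=1}^{k} W_i S_i^2 + \frac{1 - f}{n}\sum_{i=1}^{k} W_i (\bar{Y}_i - \bar{Y})^2
\]
\[
= V_{prop} + \frac{1 - f}{n}\sum_{i=1}^{k} W_i (\bar{Y}_i - \bar{Y})^2 = V_{prop} + \text{a non-negative quantity}.
\]
Thus,
\[
V_{ran} \ge V_{prop}. \tag{2.5}
\]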

Further, consider
\[
V_{prop} - V_{opt} = \frac{1}{n}\sum_{i=1}^{k} W_i S_i^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
- \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
\]
\[
= \frac{1}{n}\left[\sum_{i=1}^{k} W_i S_i^2 - \left(\sum_{i=1}^{k} W_i S_i\right)^2\right]
= \frac{1}{n}\left[\sum_{i=1}^{k} W_i S_i^2 + \left(\sum_{i=1}^{k} W_i S_i\right)^2 - 2\left(\sum_{i=1}^{k} W_i S_i\right)^2\right]
\]
\[
= \frac{1}{n}\left[\sum_{i=1}^{k} W_i S_i^2 + \left(\sum_{j=1}^{k} W_j S_j\right)^2 \sum_{i=1}^{k} W_i
- 2\left(\sum_{j=1}^{k} W_j S_j\right) \left(\sum_{i=1}^{k} W_i S_i\right)\right],
\quad\text{as } \sum_{i=1}^{k} W_i = 1,
\]
\[
= \frac{1}{n}\sum_{i=1}^{k} W_i \left[S_i^2 + \left(\sum_{j=1}^{k} W_j S_j\right)^2 - 2 S_i \sum_{j=1}^{k} W_j S_j\right]
= \frac{1}{n}\sum_{i=1}^{k} W_i \left(S_i - \sum_{j=1}^{k} W_j S_j\right)^2,
\]
which is a non-negative quantity, so that
\[
V_{prop} = V_{opt} + \frac{1}{n}\sum_{i=1}^{k} W_i \left(S_i - \sum_{j=1}^{k} W_j S_j\right)^2.
\]
Thus,
\[
V_{prop} \ge V_{opt}. \tag{2.6}
\]
From equations (2.5) and (2.6), we get
\[
V_{ran} \ge V_{prop} \ge V_{opt}.
\]
Also,
\[
V_{ran} = V_{opt} + \frac{1}{n}\sum_{i=1}^{k} W_i \left(S_i - \sum_{j=1}^{k} W_j S_j\right)^2
+ \frac{1 - f}{n}\sum_{i=1}^{k} W_i (\bar{Y}_i - \bar{Y})^2.
\]

Remark
In comparing the precision of stratified with un-stratified random sampling, it was assumed
that the population values of stratum means and variances were known.
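The ordering of the three variances can be illustrated on a small population whose values are fully known. The following Python sketch uses a hypothetical two-stratum population and the formulas for $V_{ran}$, $V_{prop}$ and $V_{opt}$ given above; note that $V_{ran} \ge V_{prop}$ was derived with a large-$N$ approximation, so for tiny populations it is a check on this particular data set rather than a guarantee.

```python
from math import sqrt
from statistics import variance

# Hypothetical population, fully known, split into two strata.
strata = [[0, 1, 2], [4, 6, 11]]
n = 4

N = sum(len(s) for s in strata)
W = [len(s) / N for s in strata]
S2_i = [variance(s) for s in strata]            # stratum mean squares (divisor N_i - 1)
S2 = variance([y for s in strata for y in s])   # population mean square (divisor N - 1)
f = n / N

v_ran = (1 - f) / n * S2
v_prop = (1 - f) / n * sum(w * s2 for w, s2 in zip(W, S2_i))
v_opt = (sum(w * sqrt(s2) for w, s2 in zip(W, S2_i)) ** 2 / n
         - sum(w * s2 for w, s2 in zip(W, S2_i)) / N)
```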
Estimation of the gain in precision due to stratification
It is sometimes of interest to examine, from a survey, whether the mode of stratification has
been effective in estimating the population mean with increased precision relative to
simple random sampling without replacement. The data available from the sample are the
values $N_i$, $n_i$, $\bar{y}_i$, and $s_i^2$. An unbiased estimator of the variance of
$\bar{y}_{st}$ is given by
\[
\hat{V}(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 s_i^2 / n_i - \sum_{i=1}^{k} W_i s_i^2 / N.
\]
The problem is to compare this variance with an unbiased estimate of $V(\bar{y}_{sr})$ based
on the given stratified sample. For estimation of $V(\bar{y}_{sr})$, note that
\[
V(\bar{y}_{sr}) = \left(\frac{1}{n} - \frac{1}{N}\right) S^2 = \frac{N - n}{nN}\, S^2.
\]

We shall first estimate $S^2$, when $\bar{y}_i$ and $s_i^2$ are available for all the strata.
Consider the relation
\[
(N - 1) S^2 = \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2
= \sum_{i=1}^{k} (N_i - 1) S_i^2 + N \sum_{i=1}^{k} W_i (\bar{Y}_i - \bar{Y})^2
\]
\[
= \sum_{i=1}^{k} (N_i - 1) S_i^2 + N \left[\sum_{i=1}^{k} W_i \bar{Y}_i^2 - \bar{Y}^2\right].
\]
To get the estimate of $S^2$, we need the estimates of $S_i^2$, $\bar{Y}_i^2$ and $\bar{Y}^2$.
As sampling is simple random wor within each stratum, $s_i^2$ is an unbiased estimate of
$S_i^2$. Note that
\[
V(\bar{y}_i) = E(\bar{y}_i - \bar{Y}_i)^2 \;\Rightarrow\; \bar{Y}_i^2 = E(\bar{y}_i^2) - V(\bar{y}_i),
\quad\text{and}\quad \hat{\bar{Y}}_i^2 = \bar{y}_i^2 - \left(\frac{1}{n_i} - \frac{1}{N_i}\right) s_i^2.
\]
Similarly, after noting that $V(\bar{y}_{st}) = E(\bar{y}_{st} - \bar{Y})^2$,
\[
\hat{\bar{Y}}^2 = \bar{y}_{st}^2 - \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 s_i^2.
\]
Thus,
\[
(N - 1) \hat{S}^2 = \sum_{i=1}^{k} (N_i - 1) s_i^2
+ N \left[\sum_{i=1}^{k} W_i \left\{\bar{y}_i^2 - \left(\frac{1}{n_i} - \frac{1}{N_i}\right) s_i^2\right\}
- \left\{\bar{y}_{st}^2 - \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 s_i^2\right\}\right]
\]
\[
= \sum_{i=1}^{k} (N_i - 1) s_i^2
+ N \left[\sum_{i=1}^{k} W_i \bar{y}_i^2 - \bar{y}_{st}^2
- \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i s_i^2
+ \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 s_i^2\right]
\]
\[
= \sum_{i=1}^{k} (N_i - 1) s_i^2
+ N \left[\sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2
- \sum_{i=1}^{k} W_i (1 - W_i) \left(\frac{1}{n_i} - \frac{1}{N_i}\right) s_i^2\right]
\]
\[
= N \left[\frac{1}{N}\sum_{i=1}^{k} (N_i - 1) s_i^2 + \sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2
- \sum_{i=1}^{k} W_i (1 - W_i) \left(\frac{1}{n_i} - \frac{1}{N_i}\right) s_i^2\right].
\]
Therefore,
\[
\hat{V}(\bar{y}_{sr}) = \frac{N - n}{nN} \cdot \frac{N}{N - 1}
\left[\frac{1}{N}\sum_{i=1}^{k} N_i s_i^2 - \frac{1}{N}\sum_{i=1}^{k} s_i^2
+ \sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2
- \sum_{i=1}^{k} W_i (1 - W_i)\, \frac{s_i^2}{n_i}
+ \sum_{i=1}^{k} \frac{W_i s_i^2}{N_i} - \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{N_i}\right].
\]
Putting $N_i = N W_i$,
\[
\hat{V}(\bar{y}_{sr}) = \frac{N - n}{n(N - 1)}
\left[\sum_{i=1}^{k} W_i s_i^2 - \frac{1}{N}\sum_{i=1}^{k} s_i^2
+ \sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2
- \sum_{i=1}^{k} W_i (1 - W_i)\, \frac{s_i^2}{n_i}
+ \frac{1}{N}\sum_{i=1}^{k} s_i^2 - \frac{1}{N}\sum_{i=1}^{k} W_i s_i^2\right]
\]
\[
= \frac{N - n}{n(N - 1)}
\left[\sum_{i=1}^{k} W_i s_i^2 + \sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2
- \sum_{i=1}^{k} W_i (1 - W_i)\, \frac{s_i^2}{n_i} - \frac{1}{N}\sum_{i=1}^{k} W_i s_i^2\right]
\]
\[
= \frac{N - n}{n(N - 1)} \left(1 - \frac{1}{N}\right) \sum_{i=1}^{k} W_i s_i^2
+ \frac{N - n}{n(N - 1)} \left[\sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} W_i (1 - W_i)\, \frac{s_i^2}{n_i}\right]
\]
\[
= \frac{N - n}{nN}\sum_{i=1}^{k} W_i s_i^2
+ \frac{N - n}{n(N - 1)} \left[\sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2 - \sum_{i=1}^{k} W_i (1 - W_i)\, \frac{s_i^2}{n_i}\right].
\]
The estimate of the relative gain in precision due to stratification is thus obtained as
\[
\frac{\hat{V}(\bar{y}_{sr}) - \hat{V}(\bar{y}_{st})}{\hat{V}(\bar{y}_{st})}.
\]
Alternative result
Starting from the expression for $\hat{V}(\bar{y}_{sr})$ obtained after putting $N_i = N W_i$,
\[
\hat{V}(\bar{y}_{sr}) = \frac{N - n}{n(N - 1)}
\left[\sum_{i=1}^{k} W_i s_i^2 + \sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2
- \sum_{i=1}^{k} \frac{W_i s_i^2}{n_i} + \sum_{i=1}^{k} \frac{W_i^2 s_i^2}{n_i}
- \frac{1}{N}\sum_{i=1}^{k} W_i s_i^2\right]
\]
\[
= \frac{N - n}{n(N - 1)}
\left[\sum_{i=1}^{k} W_i (\bar{y}_i - \bar{y}_{st})^2
+ \sum_{i=1}^{k} W_i s_i^2 \left(1 - \frac{1}{n_i} + \frac{W_i}{n_i} - \frac{1}{N}\right)\right].
\]
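The estimated gain can be computed from stratum summaries alone. The following minimal Python sketch implements $\hat{V}(\bar{y}_{st})$ and both forms of $\hat{V}(\bar{y}_{sr})$; the function names and the summary figures are hypothetical, and agreement of the two forms serves as a check on the algebra.

```python
def v_st(N_i, n_i, s2_i):
    """Estimated variance of ybar_st: sum (1/n_i - 1/N_i) W_i^2 s_i^2."""
    N = sum(N_i)
    return sum((1 / n - 1 / Nh) * (Nh / N) ** 2 * s2
               for Nh, n, s2 in zip(N_i, n_i, s2_i))

def v_sr(N_i, n_i, ybar_i, s2_i, alternative=False):
    """Estimate of V(ybar_sr) from a stratified sample (two equivalent forms)."""
    N, n = sum(N_i), sum(n_i)
    W = [Nh / N for Nh in N_i]
    y_st = sum(w * y for w, y in zip(W, ybar_i))
    between = sum(w * (y - y_st) ** 2 for w, y in zip(W, ybar_i))
    if alternative:
        extra = sum(w * s2 * (1 - 1 / nh + w / nh - 1 / N)
                    for w, s2, nh in zip(W, s2_i, n_i))
        return (N - n) / (n * (N - 1)) * (between + extra)
    within = sum(w * s2 for w, s2 in zip(W, s2_i))
    corr = sum(w * (1 - w) * s2 / nh for w, s2, nh in zip(W, s2_i, n_i))
    return (N - n) / (n * N) * within + (N - n) / (n * (N - 1)) * (between - corr)

# Hypothetical stratum summaries.
N_i, n_i = [400, 600], [20, 30]
ybar_i, s2_i = [12.0, 20.0], [16.0, 25.0]

est_st = v_st(N_i, n_i, s2_i)
est_sr = v_sr(N_i, n_i, ybar_i, s2_i)
est_sr_alt = v_sr(N_i, n_i, ybar_i, s2_i, alternative=True)
relative_gain = (est_sr - est_st) / est_st
```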
Exercise: In a population with $N = 6$ and $k = 2$ the values of $y_{ij}$ are 0, 1, 2 in
stratum 1 and 4, 6, 11 in stratum 2. A sample with $n = 4$ is to be taken.
i) Show that the optimum $n_i$ under Neyman allocation are $n_1 = 1$ and $n_2 = 3$.
ii) Compute the estimate $\bar{y}_{st}$ for every possible sample under optimum allocation and
proportional allocation. Show that the estimates are unbiased. Hence find $V(\bar{y}_{st})$
directly under optimum and proportional allocation and verify that $V(\bar{y}_{st})$ under
optimum allocation agrees with the formula
\[
V(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
= \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2,
\]
and that $V(\bar{y}_{st})$ under proportional allocation agrees with the formula
\[
V(\bar{y}_{st}) = \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{k} W_i S_i^2.
\]

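Solution: Given $N = 6$, $n = 4$, $k = 2$, and $N_1 = N_2 = 3$; also $\bar{Y}_1 = 1$ and $\bar{Y}_2 = 7$.

i) Under Neyman allocation,
\[
n_i = n\, \frac{N_i S_i}{\sum_{i=1}^{k} N_i S_i}, \quad\text{where } S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2,
\]
so that
\[
S_1^2 = \frac{1}{N_1 - 1}\sum_{j=1}^{3} (y_{1j} - \bar{Y}_1)^2 = 1, \quad\text{and}\quad
S_2^2 = \frac{1}{N_2 - 1}\sum_{j=1}^{3} (y_{2j} - \bar{Y}_2)^2 = 13.
\]
Therefore,
\[
n_1 = n\, \frac{N_1 S_1}{\sum_i N_i S_i} = 0.87 \cong 1, \quad\text{and}\quad
n_2 = n\, \frac{N_2 S_2}{\sum_i N_i S_i} = 3.13 \cong 3.
\]
ii) The number of possible samples under optimum allocation is
$\binom{3}{1} \times \binom{3}{3} = 3$, since $n_1 = 1$, $n_2 = 3$ and $N_1 = 3$, $N_2 = 3$.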

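        Samples                    Means
I       II            $\bar{y}_1$   $\bar{y}_2$   $\bar{y}_{st}$
0       (4, 6, 11)    0             7             3.5
1       (4, 6, 11)    1             7             4.0
2       (4, 6, 11)    2             7             4.5

$E(\bar{y}_{st}) = (3.5 + 4.0 + 4.5)/3 = 4 = \bar{Y}$, thus $\bar{y}_{st}$ is an unbiased
estimate of $\bar{Y}$ under optimum allocation.
\[
V(\bar{y}_{st}) = [(3.5 - 4)^2 + (4.0 - 4)^2 + (4.5 - 4)^2] / 3 = 0.1667.
\]
By formula, with $n_1 = 1$ and $n_2 = 3$,
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2
= \left(\frac{1}{n_1} - \frac{1}{N_1}\right) W_1^2 S_1^2 + \left(\frac{1}{n_2} - \frac{1}{N_2}\right) W_2^2 S_2^2 = 0.1667.
\]
The Neyman expression gives
\[
V(\bar{y}_{st}) = \frac{1}{n}\left(\sum_{i=1}^{k} W_i S_i\right)^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
= \frac{(2.3028)^2}{4} - \frac{7}{6} \cong 0.159,
\]
which is slightly smaller than 0.1667 because it presumes the unrounded values
$n_1 = 0.87$ and $n_2 = 3.13$.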

The number of possible samples under proportional allocation is
$\binom{3}{2} \times \binom{3}{2} = 9$, since $n_i = nW_i$, so that $n_1 = 2$, $n_2 = 2$.

        Samples                Means
I         II         $\bar{y}_1$   $\bar{y}_2$   $\bar{y}_{st}$
(0, 1)    (4, 6)     0.5           5.0           2.75
(0, 1)    (4, 11)    0.5           7.5           4.00
(0, 1)    (6, 11)    0.5           8.5           4.50
(0, 2)    (4, 6)     1.0           5.0           3.00
(0, 2)    (4, 11)    1.0           7.5           4.25
(0, 2)    (6, 11)    1.0           8.5           4.75
(1, 2)    (4, 6)     1.5           5.0           3.25
(1, 2)    (4, 11)    1.5           7.5           4.50
(1, 2)    (6, 11)    1.5           8.5           5.00

\[
E(\bar{y}_{st}) = \frac{1}{9}\,(2.75 + 4.00 + 4.50 + 3.00 + 4.25 + 4.75 + 3.25 + 4.50 + 5.00) = 4 = \bar{Y}.
\]
Therefore, $\bar{y}_{st}$ is an unbiased estimate of $\bar{Y}$ under proportional allocation.
\[
V(\bar{y}_{st}) = \frac{1}{9}\,[(2.75 - 4)^2 + (4.00 - 4)^2 + \cdots + (5.00 - 4)^2] = 0.583.
\]
By formula
\[
V(\bar{y}_{st}) = \left(\frac{1}{n} - \frac{1}{N}\right) \sum_{i=1}^{k} W_i S_i^2 = 0.583.
\]
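The enumeration in this exercise can be reproduced mechanically. The following minimal Python sketch lists every possible stratified sample for a given allocation and computes the mean and variance of $\bar{y}_{st}$ over the design; the helper name `enumerate_yst` is an assumption for illustration.

```python
from itertools import combinations, product
from statistics import mean, pvariance

strata = [(0, 1, 2), (4, 6, 11)]

def enumerate_yst(strata, n_i):
    """Return ybar_st for every possible srswor sample with stratum sizes n_i."""
    N = sum(len(s) for s in strata)
    W = [len(s) / N for s in strata]
    # All possible sample means within each stratum.
    per_stratum = [[mean(c) for c in combinations(s, n)] for s, n in zip(strata, n_i)]
    # Samples are drawn independently, so combine the strata in every possible way.
    return [sum(w * m for w, m in zip(W, means)) for means in product(*per_stratum)]

est_opt = enumerate_yst(strata, [1, 3])    # rounded Neyman allocation
est_prop = enumerate_yst(strata, [2, 2])   # proportional allocation

# All samples are equally likely, so the design variance is the population
# variance of the listed estimates.
v_opt = pvariance(est_opt)
v_prop = pvariance(est_prop)
```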
Exercise: The households in a town are to be sampled in order to estimate the average amount
of assets per household. The households are stratified into a high-rent and a low-rent
stratum. A house in the high-rent stratum is thought to have about nine times as much assets
as one in the low-rent stratum, and $S_i$ is expected to be proportional to the square root of
the stratum mean. There are 4000 households in the high-rent stratum and 20,000 in the
low-rent stratum.
i) Distribute a sample of 1000 households between the two strata.
ii) If the object is to estimate the difference between assets per household in the two strata,
obtain the optimum sample sizes to be distributed in the two strata such that $n_1 + n_2 = 1000$.
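Solution:
Given $N_1 = 4000$, $N_2 = 20{,}000$, $W_1 = \dfrac{1}{6}$, and $W_2 = \dfrac{5}{6}$.
Also,
\[
\bar{Y}_1 = 9\bar{Y}_2, \quad S_1 \propto \sqrt{\bar{Y}_1} \;\Rightarrow\; S_1 = A\sqrt{\bar{Y}_1} = 3A\sqrt{\bar{Y}_2},
\quad\text{and}\quad S_2 \propto \sqrt{\bar{Y}_2} \;\Rightarrow\; S_2 = A\sqrt{\bar{Y}_2}.
\]
i) Since the total sample size is fixed, i.e. $n = 1000$, the optimum value (under Neyman
allocation) is $n_i = n\, \dfrac{W_i S_i}{\sum_{i=1}^{k} W_i S_i}$, so that
\[
n_1 = n\, \frac{W_1 S_1}{W_1 S_1 + W_2 S_2}
= 1000 \times \frac{\frac{1}{6}\,(3A\sqrt{\bar{Y}_2})}{\frac{1}{6}\,(3A\sqrt{\bar{Y}_2}) + \frac{5}{6}\,(A\sqrt{\bar{Y}_2})} = 375,
\quad\text{and}\quad n_2 = 625.
\]
ii) An unbiased estimate of $(\bar{Y}_1 - \bar{Y}_2)$ is $(\bar{y}_1 - \bar{y}_2)$, therefore,
\[
V(\bar{y}_1 - \bar{y}_2) = V(\bar{y}_1) + V(\bar{y}_2) - 0, \quad\text{as sampling from the strata is independent,}
\]
\[
= \left(\frac{1}{n_1} - \frac{1}{N_1}\right) S_1^2 + \left(\frac{1}{n_2} - \frac{1}{N_2}\right) S_2^2
= \left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right) + \text{terms independent of } n_1 \text{ and } n_2.
\]
Now our problem is to find $n_1$ and $n_2$ such that the variance of the estimate is minimum
subject to the condition $n_1 + n_2 = 1000$.
To determine the optimum value of $n_i$, consider the function
\[
\phi = \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \lambda\,(n_1 + n_2 - 1000), \tag{1}
\]
where $\lambda$ is some unknown constant. Using the calculus method of Lagrange multipliers,
we select $n_i$ and the constant $\lambda$ to minimize $\phi$.

Differentiating equation (1) with respect to $n_i$, we have
\[
\frac{\partial \phi}{\partial n_1} = 0 = -\frac{S_1^2}{n_1^2} + \lambda \;\Rightarrow\; \lambda = \frac{S_1^2}{n_1^2}, \tag{2}
\]
\[
\frac{\partial \phi}{\partial n_2} = 0 = -\frac{S_2^2}{n_2^2} + \lambda \;\Rightarrow\; \lambda = \frac{S_2^2}{n_2^2}. \tag{3}
\]
In view of equations (2) and (3), we get
\[
\frac{S_1^2}{n_1^2} = \frac{S_2^2}{n_2^2} \;\Rightarrow\; \frac{S_1^2}{S_2^2} = \frac{n_1^2}{n_2^2} \quad\text{and}\quad \frac{S_1}{S_2} = \frac{n_1}{n_2}.
\]
But from the given values, we have
\[
\frac{S_1}{S_2} = \frac{3A\sqrt{\bar{Y}_2}}{A\sqrt{\bar{Y}_2}} = 3 \;\Rightarrow\; S_1 = 3S_2, \quad\text{and hence,}\quad
\frac{n_1}{n_2} = \frac{3S_2}{S_2} = 3 \;\Rightarrow\; n_1 = 3n_2.
\]
Therefore,
\[
3n_2 + n_2 = 1000 \;\Rightarrow\; n_2 = 250 \text{ and } n_1 = 750.
\]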
Exercise: A sampler has two strata with relative sizes $W_1$, $W_2$. He believes that $S_1$,
$S_2$ can be taken as equal but thinks that $c_2$ may be between $2c_1$ and $4c_1$. He would
prefer to use proportional allocation but does not wish to incur a substantial increase in
variance compared with optimum allocation. For a given cost $C = c_1 n_1 + c_2 n_2$, ignoring
the fpc, show that
\[
\frac{V(\bar{y}_{st})_{prop}}{V(\bar{y}_{st})_{opt}} = \frac{W_1 c_1 + W_2 c_2}{(W_1\sqrt{c_1} + W_2\sqrt{c_2})^2}.
\]
If $W_1 = W_2$, compute the relative increases in variance from using proportional allocation
when $c_2 / c_1 = 2, 4$.

Solution: We know that $V(\bar{y}_{st})$ under proportional allocation, ignoring the fpc, is
\[
V(\bar{y}_{st})_{prop} = \frac{1}{n}\sum_{i=1}^{k} W_i S_i^2 = \frac{1}{n}\,(W_1 S_1^2 + W_2 S_2^2) = \frac{1}{n}\, S^2,
\]
as $S_1 = S_2 = S$ (say) and $W_1 + W_2 = 1$.
Under proportional allocation $n_1 = nW_1$ and $n_2 = nW_2$, so
$C = nW_1 c_1 + nW_2 c_2 = n\,(W_1 c_1 + W_2 c_2)$. Hence
\[
n = \frac{C}{W_1 c_1 + W_2 c_2}, \quad\text{and}\quad V(\bar{y}_{st})_{prop} = \frac{1}{C}\,(W_1 c_1 + W_2 c_2)\, S^2.
\]
Note that $V(\bar{y}_{st})$ under optimum allocation, ignoring the fpc, is
\[
V(\bar{y}_{st})_{opt} = \frac{1}{C}\left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2
= \frac{1}{C}\,(W_1 S_1 \sqrt{c_1} + W_2 S_2 \sqrt{c_2})^2
= \frac{1}{C}\,(W_1\sqrt{c_1} + W_2\sqrt{c_2})^2\, S^2.
\]
Therefore,
\[
\frac{V(\bar{y}_{st})_{prop}}{V(\bar{y}_{st})_{opt}}
= \frac{\frac{1}{C}\,(W_1 c_1 + W_2 c_2)\, S^2}{\frac{1}{C}\,(W_1\sqrt{c_1} + W_2\sqrt{c_2})^2\, S^2}
= \frac{W_1 c_1 + W_2 c_2}{(W_1\sqrt{c_1} + W_2\sqrt{c_2})^2}.
\]
The relative increase in variance from using proportional allocation is given by
\[
RI = \frac{V(\bar{y}_{st})_{prop} - V(\bar{y}_{st})_{opt}}{V(\bar{y}_{st})_{opt}}
= \frac{V(\bar{y}_{st})_{prop}}{V(\bar{y}_{st})_{opt}} - 1
= \frac{W_1 c_1 + W_2 c_2}{(W_1\sqrt{c_1} + W_2\sqrt{c_2})^2} - 1.
\]
If $W_1 = W_2$, we have $W_1 = W_2 = 0.5$, since $W_1 + W_2 = 1$. Thus
\[
RI = \frac{0.5\, c_1 + 0.5\, c_2}{(0.5\sqrt{c_1} + 0.5\sqrt{c_2})^2} - 1 = \frac{c_1 + c_2}{0.5\,(\sqrt{c_1} + \sqrt{c_2})^2} - 1.
\]
i) When $\dfrac{c_2}{c_1} = 2$ or $c_2 = 2c_1$, then
\[
RI = \frac{c_1 + 2c_1}{0.5\,(\sqrt{c_1} + \sqrt{2c_1})^2} - 1 = \frac{3c_1}{0.5\, c_1 (1 + \sqrt{2})^2} - 1 = 0.029437.
\]
ii) When $\dfrac{c_2}{c_1} = 4$ or $c_2 = 4c_1$, then
\[
RI = \frac{c_1 + 4c_1}{0.5\,(\sqrt{c_1} + \sqrt{4c_1})^2} - 1 = \frac{5c_1}{0.5\, c_1 (1 + 2)^2} - 1 = 0.11111.
\]
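The relative increase is easy to tabulate for other cost ratios. The following minimal Python sketch evaluates the $RI$ formula derived above; the function name `relative_increase` is an assumption, and $c_1$ is set to 1 without loss of generality because only the ratio $c_2 / c_1$ matters.

```python
from math import sqrt

def relative_increase(W, c):
    """RI = sum(W_i c_i) / (sum(W_i sqrt(c_i)))^2 - 1, for equal S_i and fpc ignored."""
    numerator = sum(w * ci for w, ci in zip(W, c))
    denominator = sum(w * sqrt(ci) for w, ci in zip(W, c)) ** 2
    return numerator / denominator - 1

ri_2 = relative_increase([0.5, 0.5], [1, 2])   # c2 / c1 = 2
ri_4 = relative_increase([0.5, 0.5], [1, 4])   # c2 / c1 = 4
```

So even when the unit cost in one stratum is four times that in the other, proportional allocation costs only about 11% in variance under these assumptions.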
Exercise: A sampler proposes to take a stratified random sample. He expects that his field
costs will be of the form $\sum c_i n_i$. His advance estimates of relevant quantities for two
strata are as follows:

Stratum    $W_i$    $S_i$    $c_i$
1          0.4      10       4
2          0.6      20       9

i) Find the values of $\dfrac{n_1}{n}$ and $\dfrac{n_2}{n}$ that minimize the total cost for a
given value of $V(\bar{y}_{st})$.
ii) Find the sample size required, under this optimum allocation, to make $V(\bar{y}_{st}) = 1$,
if the fpc is ignored.
iii) Obtain the total field cost.
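Solution:
i) The optimum values of $n_i$ for a given variance at minimum cost are given by
\[
n_i = n\, \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}}
\;\Rightarrow\; \frac{n_i}{n} = \frac{W_i S_i / \sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i / \sqrt{c_i}},
\]
then
\[
\frac{n_1}{n} = \frac{W_1 S_1 / \sqrt{c_1}}{W_1 S_1 / \sqrt{c_1} + W_2 S_2 / \sqrt{c_2}} = \frac{1}{3}
\quad\text{and}\quad
\frac{n_2}{n} = \frac{W_2 S_2 / \sqrt{c_2}}{W_1 S_1 / \sqrt{c_1} + W_2 S_2 / \sqrt{c_2}} = \frac{2}{3}.
\]
ii) We know that
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\frac{1}{n_i} - \frac{1}{N_i}\right) W_i^2 S_i^2;
\]
if the fpc is ignored, $V(\bar{y}_{st})$ reduces to
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} = \frac{W_1^2 S_1^2}{n_1} + \frac{W_2^2 S_2^2}{n_2}
= \frac{0.16 \times 100 \times 3}{n} + \frac{0.36 \times 400 \times 3}{2n} = \frac{264}{n}.
\]
It is given that $V(\bar{y}_{st}) = 1$, so that $n = 264$.
Therefore,
\[
n_1 = \frac{n}{3} = 88, \quad\text{and}\quad n_2 = 176.
\]
Or
We know that the optimum values of $n_i$ for a given variance are
\[
n_i = \frac{(W_i S_i / \sqrt{c_i}) \sum_i W_i S_i \sqrt{c_i}}{V + \frac{1}{N}\sum_i W_i S_i^2}.
\]
For large $N$ and $V(\bar{y}_{st}) = 1$, this reduces to
\[
n_i = (W_i S_i / \sqrt{c_i}) \sum_i W_i S_i \sqrt{c_i}.
\]
Therefore, after simplification, $n_1 = 88$ and $n_2 = 176$.

iii) The cost function is given as $\sum c_i n_i$, so $\sum c_i n_i = c_1 n_1 + c_2 n_2 = 1936$.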
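The three parts of this solution can be reproduced with a short computation. The following is a minimal Python sketch using the advance estimates from the table; the variable names are chosen for illustration and the fpc is ignored, as in the solution.

```python
from math import sqrt

W, S, c = [0.4, 0.6], [10, 20], [4, 9]
V_target = 1.0

# i) Optimum shares n_i / n are proportional to W_i S_i / sqrt(c_i).
score = [w * s / sqrt(ci) for w, s, ci in zip(W, S, c)]
share = [x / sum(score) for x in score]

# ii) Ignoring the fpc, V = sum W_i^2 S_i^2 / (n * share_i); solve for n.
n = sum((w * s) ** 2 / sh for w, s, sh in zip(W, S, share)) / V_target
n_i = [n * sh for sh in share]

# iii) Field cost sum c_i n_i.
cost = sum(ci * ni for ci, ni in zip(c, n_i))
```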
Exercise: After the sample in the previous exercise is taken, the sampler finds that his
field costs were actually \$2 per unit in stratum 1 and \$12 in stratum 2.
i) How much greater is the field cost than anticipated?
ii) If he had known the correct field costs in advance, could he have attained
$V(\bar{y}_{st}) = 1$ for the original estimated field cost in the previous exercise?
Solution:
i) The correct field cost is $c_1 n_1 + c_2 n_2 = 2 \times 88 + 12 \times 176 = 2288$, which
exceeds the anticipated cost of 1936 by 352.
ii) By the Cauchy-Schwarz inequality
\[
V' C' \ge \left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2, \quad\text{where } V' = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} \text{ and } C' = \sum_{i=1}^{k} n_i c_i.
\]
Thus, to get $V' = 1$, the minimum cost will be
\[
C' = (W_1 S_1 \sqrt{c_1} + W_2 S_2 \sqrt{c_2})^2 = (0.4 \times 10\sqrt{2} + 0.6 \times 20\sqrt{12})^2 \cong 2230.
\]
Or
Note that the optimum cost for fixed variance, ignoring the fpc, is
\[
C' = \frac{1}{V}\left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2 = \left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)^2 \cong 2230, \quad\text{as } V(\bar{y}_{st}) = 1.
\]
Since 2230 exceeds the original estimated field cost of 1936, he could not have attained
$V(\bar{y}_{st}) = 1$ for that cost.


Exercise: With two strata, a sampler would like to have $n_1 = n_2$ (equal allocation) for
administrative convenience, instead of using the values given by Neyman allocation. If
$V(\bar{y}_{st})$ and $V(\bar{y}_{st})_{opt}$ denote the variances under equal allocation and
Neyman allocation, respectively, show that the fractional increase in variance is
\[
\frac{V(\bar{y}_{st}) - V(\bar{y}_{st})_{opt}}{V(\bar{y}_{st})_{opt}} = \left(\frac{r - 1}{r + 1}\right)^2,
\]
where $r = \dfrac{n_1}{n_2}$ as given by Neyman allocation, i.e. $r = \left(\dfrac{n_1}{n_2}\right)_{opt}$.
Solution: We know that
$V(\bar{y}_{st}) = \sum_{i=1}^{k} \left(\dfrac{1}{n_i} - \dfrac{1}{N_i}\right) W_i^2 S_i^2$.
The variance under equal allocation $(n_1 = n_2 = n')$, if the fpc is ignored, reduces for two
strata to
\[
V(\bar{y}_{st}) = \frac{W_1^2 S_1^2}{n'} + \frac{W_2^2 S_2^2}{n'} = \frac{1}{n'}\,(W_1^2 S_1^2 + W_2^2 S_2^2),
\]
and the variance under Neyman allocation (for fixed total sample size $2n'$) is
\[
V(\bar{y}_{st})_{opt} = \frac{1}{2n'}\left(\sum_{i=1}^{k} W_i S_i\right)^2 = \frac{1}{2n'}\,(W_1 S_1 + W_2 S_2)^2.
\]
For fixed sample size, optimum allocation reduces to Neyman allocation, so that
\[
n_i = 2n'\, \frac{W_i S_i}{\sum_{i=1}^{k} W_i S_i}, \quad\text{so}\quad
n_1 = 2n'\, \frac{W_1 S_1}{\sum_{i=1}^{k} W_i S_i} \quad\text{and}\quad
n_2 = 2n'\, \frac{W_2 S_2}{\sum_{i=1}^{k} W_i S_i},
\]
and then
\[
\left(\frac{n_1}{n_2}\right)_{opt} = \frac{W_1 S_1}{W_2 S_2} = r \text{ (given)}.
\]
Therefore,
\[
\frac{V(\bar{y}_{st}) - V(\bar{y}_{st})_{opt}}{V(\bar{y}_{st})_{opt}}
= \frac{\frac{1}{n'}\,(W_1^2 S_1^2 + W_2^2 S_2^2) - \frac{1}{2n'}\,(W_1 S_1 + W_2 S_2)^2}{\frac{1}{2n'}\,(W_1 S_1 + W_2 S_2)^2}
= \frac{2\,(W_1^2 S_1^2 + W_2^2 S_2^2) - (W_1 S_1 + W_2 S_2)^2}{(W_1 S_1 + W_2 S_2)^2}
\]
\[
= \frac{2W_1^2 S_1^2 + 2W_2^2 S_2^2 - W_1^2 S_1^2 - W_2^2 S_2^2 - 2W_1 W_2 S_1 S_2}{(W_1 S_1 + W_2 S_2)^2}
= \frac{(W_1 S_1 - W_2 S_2)^2}{(W_1 S_1 + W_2 S_2)^2}
\]
\[
= \frac{\left(\dfrac{W_1 S_1}{W_2 S_2} - 1\right)^2}{\left(\dfrac{W_1 S_1}{W_2 S_2} + 1\right)^2}
= \frac{(r - 1)^2}{(r + 1)^2} = \left(\frac{r - 1}{r + 1}\right)^2.
\]
Exercise: If the cost function is of the form $C = c_0 + \sum_{i=1}^{k} c_i \sqrt{n_i}$, where
$c_0$ and $c_i$ are known numbers, then
i) Show that in order to minimize $V(\bar{y}_{st})$ for fixed total cost, $n_i$ must be
proportional to $\left(\dfrac{W_i^2 S_i^2}{c_i}\right)^{2/3}$.
ii) Find the $n_i$ for a sample of size 1000 under the following conditions:

Stratum    $W_i$    $S_i$    $c_i$
1          0.4      4        1
2          0.3      5        2
3          0.2      6        4

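Solution:
i) We have
\[
V(\bar{y}_{st}) = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i}.
\]
To determine the $n_i$ such that $V(\bar{y}_{st})$ is minimum while the cost
$C = c_0 + \sum_{i=1}^{k} c_i \sqrt{n_i}$ is fixed (given), we consider the function
\[
\phi = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{N_i}
+ \lambda \left(c_0 + \sum_{i=1}^{k} c_i \sqrt{n_i} - C\right), \tag{1}
\]
where $\lambda$ is some unknown constant. Using the calculus method of Lagrange multipliers,
we select $n_i$ and the constant $\lambda$ to minimize $\phi$.
Differentiating equation (1) with respect to $n_i$, we have
\[
\frac{\partial \phi}{\partial n_i} = 0 = -\frac{W_i^2 S_i^2}{n_i^2} + \frac{1}{2}\,\lambda\, c_i\, (n_i)^{-1/2}
\;\Rightarrow\; \frac{W_i^2 S_i^2}{n_i^2} = \frac{\lambda}{2}\, c_i\, (n_i)^{-1/2},
\]
or
\[
(n_i)^{3/2} = \frac{2}{\lambda}\, \frac{W_i^2 S_i^2}{c_i} \quad\text{or}\quad
n_i = \left(\frac{2}{\lambda}\right)^{2/3} \left(\frac{W_i^2 S_i^2}{c_i}\right)^{2/3},
\]
and hence,
\[
n_i \propto \left(\frac{W_i^2 S_i^2}{c_i}\right)^{2/3}, \quad\text{since } \left(\frac{2}{\lambda}\right)^{2/3} \text{ is constant}.
\]
ii) We have
\[
n_i = \left(\frac{2}{\lambda}\right)^{2/3} \left(\frac{W_i^2 S_i^2}{c_i}\right)^{2/3}. \tag{2}
\]
Taking summation over all strata, we get
\[
n = \sum_{i=1}^{k} n_i = \left(\frac{2}{\lambda}\right)^{2/3} \sum_{i=1}^{k} \left(\frac{W_i^2 S_i^2}{c_i}\right)^{2/3}
\;\Rightarrow\; \left(\frac{2}{\lambda}\right)^{2/3} = \frac{n}{\sum_{i=1}^{k} \left(\dfrac{W_i^2 S_i^2}{c_i}\right)^{2/3}}. \tag{3}
\]
Substituting equation (3) in equation (2), we get
\[
n_i = \frac{n}{\sum_{j=1}^{k} \left(\dfrac{W_j^2 S_j^2}{c_j}\right)^{2/3}} \left(\frac{W_i^2 S_i^2}{c_i}\right)^{2/3}.
\]
Therefore,
\[
n_1 = \frac{1000}{(2.56)^{2/3} + (1.125)^{2/3} + (0.36)^{2/3}}\, (2.56)^{2/3} = 541, \quad n_2 = 313, \quad\text{and}\quad n_3 = 146.
\]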
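The arithmetic of part ii) can be checked with a few lines of Python. The sketch below is illustrative only; the function name `two_thirds_allocation` is an assumption, and the inputs are the table values from the exercise.

```python
def two_thirds_allocation(W, S, c, n):
    """n_i proportional to (W_i^2 S_i^2 / c_i)^(2/3), for cost c0 + sum c_i sqrt(n_i)."""
    score = [((w * s) ** 2 / ci) ** (2 / 3) for w, s, ci in zip(W, S, c)]
    total = sum(score)
    return [n * x / total for x in score]

alloc = two_thirds_allocation(W=[0.4, 0.3, 0.2], S=[4, 5, 6], c=[1, 2, 4], n=1000)
rounded = [round(x) for x in alloc]
```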

Stratified random sampling for proportion

The theory for estimating the population mean $\bar{Y}$ or population total $Y$ on the basis of
stratified sampling with srswor and srswr in the strata can easily be applied to the
estimation of a population proportion, say $P$, by taking the population values $y_{ij}$ as 1
or 0 according as the unit belongs to the class of interest (possesses a particular
character $C$) or not. Then, with $Q_i = 1 - P_i$ and $q_i = 1 - p_i$,
$\bar{Y}_i = \dfrac{1}{N_i}\sum_{j=1}^{N_i} y_{ij} = P_i$, proportion based on $N_i$ units (stratum proportion).
$\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} y_{ij} = \dfrac{1}{N}\sum_{i=1}^{k} N_i P_i = \sum_{i=1}^{k} W_i P_i = P$, overall population proportion.
$\bar{y}_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} = p_i$, sample proportion based on $n_i$ units.
$\sigma_i^2 = \dfrac{1}{N_i}\sum_{j=1}^{N_i} (y_{ij} - P_i)^2 = P_i - P_i^2 = P_i Q_i$, stratum variance of proportion based on $N_i$ units.
$S_i^2 = \dfrac{1}{N_i - 1}\sum_{j=1}^{N_i} (y_{ij} - P_i)^2 = \dfrac{N_i}{N_i - 1}\, P_i Q_i$, stratum mean square of proportion based on $N_i$ units.
$s_i^2 = \dfrac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - p_i)^2 = \dfrac{n_i}{n_i - 1}\, p_i q_i$, sample mean square of proportion based on $n_i$ units.

Theorem: In stratified random sampling, wor, an unbiased estimate of the overall
population proportion is given by $p_{st} = \sum_{i=1}^{k} W_i p_i$ with its variance
$$V(p_{st}) = \sum_{i=1}^{k} W_i^2 \left(\frac{N_i - n_i}{N_i - 1}\right) \frac{P_i Q_i}{n_i},$$
where $p_i$ is the sample estimate of the proportion $P_i$ in the $i$-th stratum.
Proof: Since sampling within each stratum is simple random sampling, so that $E(p_i) = P_i$,
it follows that
$$E(p_{st}) = \sum_{i=1}^{k} W_i E(p_i) = \sum_{i=1}^{k} W_i P_i = P.$$
To obtain the variance, we have
$$V(p_{st}) = E[p_{st} - E(p_{st})]^2 = \sum_{i=1}^{k} W_i^2 V(p_i) = \sum_{i=1}^{k} W_i^2 \left(\frac{1}{n_i} - \frac{1}{N_i}\right) \frac{N_i}{N_i - 1} P_i Q_i,$$
as sampling is srswor within each stratum,
$$= \sum_{i=1}^{k} W_i^2 \left(\frac{N_i - n_i}{n_i N_i}\right) \frac{N_i}{N_i - 1} P_i Q_i = \sum_{i=1}^{k} W_i^2 \left(\frac{N_i - n_i}{N_i - 1}\right) \frac{P_i Q_i}{n_i}.$$

Corollary: If stratified random sampling is with replacement, then the variance is
$$V(p_{st}) = \sum_{i=1}^{k} W_i^2 \frac{P_i Q_i}{n_i}.$$
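Theorem: In stratified random sampling, wor, an unbiased estimate of
$$V(p_{st}) = \sum_{i=1}^{k} W_i^2 \left(\frac{N_i - n_i}{N_i - 1}\right) \frac{P_i Q_i}{n_i} \quad \text{is} \quad \hat{V}(p_{st}) = v(p_{st}) = \frac{1}{N} \sum_{i=1}^{k} (N_i - n_i)\, W_i\, \frac{p_i q_i}{n_i - 1}.$$

Proof:
$$E[\hat{V}(p_{st})] = E\left[\frac{1}{N} \sum_{i=1}^{k} (N_i - n_i)\, W_i\, \frac{p_i q_i}{n_i - 1}\right] = E\left[\frac{1}{N} \sum_{i=1}^{k} (N_i - n_i)\, \frac{W_i}{n_i}\, \frac{n_i p_i q_i}{n_i - 1}\right]$$
$$= \frac{1}{N} \sum_{i=1}^{k} (N_i - n_i)\, \frac{W_i}{n_i}\, E\left[\frac{n_i p_i q_i}{n_i - 1}\right]$$
$$= \frac{1}{N} \sum_{i=1}^{k} (N_i - n_i)\, \frac{W_i}{n_i}\, \frac{N_i P_i Q_i}{N_i - 1}, \text{ since } E(s_i^2) = S_i^2 \text{ with srswor,}$$
$$= \sum_{i=1}^{k} \left(\frac{N_i - n_i}{N_i - 1}\right) W_i^2\, \frac{P_i Q_i}{n_i}, \text{ since } W_i = N_i / N.$$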

Corollary: With stratified random sampling, wr, an unbiased estimate of
$$V(p_{st}) = \sum_{i=1}^{k} W_i^2 \frac{P_i Q_i}{n_i} \quad \text{is} \quad \hat{V}(p_{st}) = \sum_{i=1}^{k} W_i^2 \frac{p_i q_i}{n_i - 1}.$$
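As an illustration of the two theorems on stratified sampling for proportions, the following sketch computes $p_{st}$ and the unbiased variance estimate $v(p_{st})$ for sampling without replacement. The stratum sizes, sample sizes and counts are hypothetical numbers chosen only for the example, and the function name is an assumption of this sketch.

```python
def stratified_proportion(N_h, n_h, a_h):
    """Return (p_st, v(p_st)) for stratified srswor.

    N_h: stratum sizes, n_h: stratum sample sizes,
    a_h: number of sampled units in each stratum possessing the character.
    """
    N = sum(N_h)
    p_st = 0.0
    v_sum = 0.0
    for Ni, ni, ai in zip(N_h, n_h, a_h):
        Wi = Ni / N
        pi = ai / ni
        qi = 1.0 - pi
        p_st += Wi * pi
        # term of (1/N) * sum (N_i - n_i) W_i p_i q_i / (n_i - 1)
        v_sum += (Ni - ni) * Wi * pi * qi / (ni - 1)
    return p_st, v_sum / N


# Hypothetical data: three strata.
N_h = [500, 300, 200]
n_h = [50, 30, 20]
a_h = [10, 12, 5]

p_st, v_st = stratified_proportion(N_h, n_h, a_h)
print(p_st, v_st)
```

Because $W_i / N_i = 1/N$, the same estimate can be written as $\sum W_i^2 \frac{N_i - n_i}{N_i} \frac{p_i q_i}{n_i - 1}$, which makes the finite population correction in each stratum explicit.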

UNIT-III

CLUSTER SAMPLING

In random sampling, it is presumed that the population has been divided into a
finite number of distinct and identifiable units called the sampling units. The smallest units
into which the population can be divided are called the elements of the population, and a
group of such elements is known as a cluster. After dividing the population into specified
clusters (as a simple rule, the number of elements in a cluster should be small and the number
of clusters should be large), the required number of clusters is obtained either by the
method of equal or unequal probabilities of selection. Such a procedure, in which the sampling
unit is a cluster, is called cluster sampling. If the entire area containing the population
under study is subdivided into smaller area segments, and each element in the population is
associated with one and only one such area segment, the procedure is alternatively called
area sampling. There are two main reasons for using a cluster as the sampling unit.
i) Usually a complete list of the population units is not available, and therefore the use of
an individual unit as the sampling unit is not feasible.
ii) Even when a complete list of the population units is available, using a cluster as the
sampling unit can reduce the cost of sampling considerably.
For instance, in a population survey it may be cheaper to collect data from all persons in a
sample of households than from a sample of the same number of persons selected directly
from all the persons. Similarly, it would be operationally more convenient to survey all
households situated in a sample of areas such as villages than to survey a sample of the same
number of households selected at random from a list of all households. Another example of
the utility of cluster sampling is provided by crop surveys, where locating a randomly selected
farm or plot requires a considerable part of the total time taken for the survey, but once the plot
is located, the time taken for identifying and surveying a few neighbouring plots will
generally be only marginal.

Theory of equal clusters


Suppose the population consists of $N$ clusters, each of $M$ elements. A sample of $n$ clusters
is drawn by the method of simple random sampling and every unit in the selected clusters is
enumerated. Let us denote by

$y_{ij}$, value of the $j$-th element in the $i$-th cluster, $j = 1, \ldots, M$; $i = 1, \ldots, N$.

$\bar{y}_{i.} = \dfrac{1}{M} \sum_{j=1}^{M} y_{ij}$, mean per element of the $i$-th cluster.

$\bar{Y}_N = \dfrac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.}$, mean of cluster means in the population of $N$ clusters.

$\bar{Y} = \dfrac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij}$, mean per element in the population.

$\bar{y}_n = \dfrac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$, mean of cluster means in a sample of $n$ clusters.

$\bar{y} = \dfrac{1}{nM} \sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}$, mean per element in the sample.

Note: $\bar{Y}_N = \bar{Y}$ and $\bar{y}_n = \bar{y}$ if the clusters are all of the same size.

$S_i^2 = \dfrac{1}{M - 1} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2$, mean square between elements within the $i$-th cluster.

$S_w^2 = \dfrac{1}{N} \sum_{i=1}^{N} S_i^2$, mean square within clusters.

$S_b^2 = \dfrac{1}{N - 1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2$, mean square between cluster means in the population.

$S^2 = \dfrac{1}{NM - 1} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$, mean square between elements in the population.

$$\rho = \frac{E (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{E (y_{ij} - \bar{Y})^2} = \frac{\dfrac{1}{NM(M-1)} \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{\dfrac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2} = \frac{\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1) S^2},$$
intracluster correlation coefficient between elements within clusters.
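Theorem: If a simple random sample, wor, of $n$ clusters each having $M$ elements is drawn
from a population of $N$ clusters, the sample mean $\bar{y}_n$ is an unbiased estimator of the population
mean $\bar{Y}$ and its variance is
$$V(\bar{y}_n) = \left(\frac{1}{n} - \frac{1}{N}\right) S_b^2 = \frac{1-f}{n} S_b^2, \quad \text{where } f = n/N.$$
Proof: We have,
$$E(\bar{y}_n) = E\left[\frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}\right] = \frac{1}{n} \sum_{i=1}^{n} E(\bar{y}_{i.}) = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} = \bar{Y}_N = \bar{Y}.$$

To obtain the variance, we have, by definition
$$V(\bar{y}_n) = E(\bar{y}_n - \bar{Y}_N)^2 = E\left[\frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} - \bar{Y}_N\right]^2 = \frac{1}{n^2} E\left[\sum_{i=1}^{n} (\bar{y}_{i.} - \bar{Y}_N)\right]^2$$
$$= \frac{1}{n^2} \left[\sum_{i=1}^{n} E(\bar{y}_{i.} - \bar{Y}_N)^2 + \sum_{i \neq i'}^{n} E(\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N)\right].$$

Consider
$$E(\bar{y}_{i.} - \bar{Y}_N)^2 = \frac{1}{N} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \frac{N-1}{N} S_b^2, \qquad (3.1)$$
and
$$E(\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N) = \frac{1}{N(N-1)} \sum_{i \neq i'}^{N} (\bar{y}_{i.} - \bar{Y}_N)(\bar{y}_{i'.} - \bar{Y}_N)$$
$$= \frac{1}{N(N-1)} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N) \left[\sum_{i'=1}^{N} (\bar{y}_{i'.} - \bar{Y}_N) - (\bar{y}_{i.} - \bar{Y}_N)\right]$$
$$= \frac{1}{N(N-1)} \left[\sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N) \sum_{i'=1}^{N} (\bar{y}_{i'.} - \bar{Y}_N) - \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2\right]$$
$$= -\frac{1}{N(N-1)} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = -\frac{1}{N} S_b^2. \qquad (3.2)$$

In view of equations (3.1) and (3.2), $V(\bar{y}_n)$ reduces to
$$V(\bar{y}_n) = \frac{1}{n^2} \left[\sum_{i=1}^{n} \frac{N-1}{N} S_b^2 - \sum_{i \neq i'}^{n} \frac{1}{N} S_b^2\right] = \frac{1}{n^2} \left[\frac{n(N-1)}{N} S_b^2 - \frac{n(n-1)}{N} S_b^2\right] = \frac{N-n}{nN} S_b^2 = \frac{1-f}{n} S_b^2.$$
Note: For large $N$, $V(\bar{y}_n) \approx \dfrac{1}{n} S_b^2$.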
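The theorem can be checked numerically on a small population by enumerating every possible sample of $n$ clusters. The sketch below uses a made-up population of five clusters of three elements each; the data and the function name are illustrative assumptions.

```python
from itertools import combinations


def check_cluster_mean(clusters, n):
    """Enumerate all srswor samples of n clusters and compare the exact
    expectation and variance of the sample mean of cluster means with
    Y_N and (1 - f) / n * S_b^2."""
    N = len(clusters)
    means = [sum(c) / len(c) for c in clusters]
    Y_N = sum(means) / N
    S_b2 = sum((m - Y_N) ** 2 for m in means) / (N - 1)
    formula = (1 - n / N) / n * S_b2

    estimates = [sum(s) / n for s in combinations(means, n)]
    expectation = sum(estimates) / len(estimates)
    exact_var = sum((e - Y_N) ** 2 for e in estimates) / len(estimates)
    return Y_N, expectation, exact_var, formula


# Hypothetical population: N = 5 clusters, M = 3 elements each.
clusters = [[1, 3, 5], [2, 2, 8], [4, 6, 11], [7, 9, 2], [0, 5, 4]]
Y_N, expectation, exact_var, formula = check_cluster_mean(clusters, n=2)
print(Y_N, expectation)
print(exact_var, formula)
```

The two printed pairs should agree up to floating-point rounding, since the result of the theorem is exact for any finite population.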
Alternative expression of $V(\bar{y}_n)$ in terms of the correlation coefficient
Consider the intracluster correlation coefficient between elements within clusters, which is
defined as
$$\rho = \frac{E (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{E (y_{ij} - \bar{Y})^2} = \frac{\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1) S^2},$$
so that
$$\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = (M-1)(NM-1)\, \rho\, S^2.$$

By definition,
$$V(\bar{y}_n) = \frac{1-f}{n} S_b^2 = \frac{1-f}{n(N-1)} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2. \qquad (3.3)$$

Consider
$$\sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \sum_{i=1}^{N} \left[\frac{1}{M} \sum_{j=1}^{M} y_{ij} - \frac{1}{M} M \bar{Y}_N\right]^2 = \frac{1}{M^2} \sum_{i=1}^{N} \left[\sum_{j=1}^{M} (y_{ij} - \bar{Y}_N)\right]^2$$
$$= \frac{1}{M^2} \left[\sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})\right], \text{ as } \bar{Y}_N = \bar{Y} \qquad (3.4)$$
$$= \frac{1}{M^2} \left[(NM-1) S^2 + (M-1)(NM-1)\, \rho\, S^2\right]$$
$$= \frac{(NM-1) S^2}{M^2} [1 + (M-1) \rho]. \qquad (3.5)$$
Substituting equation (3.5) in equation (3.3), we get
$$V(\bar{y}_n) = \frac{1-f}{n} \left[\frac{(NM-1) S^2}{M^2 (N-1)}\right] [1 + (M-1) \rho].$$
Note: For large $N$, $\dfrac{1}{N} \to 0$, so that $(1-f) \approx 1$, and $\dfrac{NM-1}{M^2(N-1)} = \dfrac{N(M - 1/N)}{NM^2(1 - 1/N)} \approx \dfrac{1}{M}$.
Hence,
$$V(\bar{y}_n) \approx \frac{S^2}{nM} [1 + (M-1) \rho].$$
Corollary: $\hat{Y} = NM \bar{y}_n$ is an unbiased estimate of the population total $Y$, and its variance is
$$V(\hat{Y}) = N^2 M^2 \left(\frac{1-f}{n}\right) S_b^2 = N^2 \left(\frac{1-f}{n}\right) \frac{(NM-1) S^2}{N-1} [1 + (M-1) \rho]$$
$$\approx N^2 M \left(\frac{1-f}{n}\right) S^2 [1 + (M-1) \rho], \text{ for large } N.$$
Estimation of variance $V(\bar{y}_n)$
Define,
$$s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2 = \frac{1}{n-1} \left[\sum_{i=1}^{n} \bar{y}_{i.}^2 - n \bar{y}_n^2\right], \text{ then}$$
$$E(s_b^2) = \frac{1}{n-1} \left[\sum_{i=1}^{n} E(\bar{y}_{i.}^2) - n E(\bar{y}_n^2)\right].$$
Note that,
$$V(\bar{y}_{i.}) = E(\bar{y}_{i.}^2) - \bar{Y}_N^2, \text{ so that } E(\bar{y}_{i.}^2) = \left(\frac{N-1}{N}\right) S_b^2 + \bar{Y}_N^2, \qquad (3.6)$$
and
$$V(\bar{y}_n) = E(\bar{y}_n^2) - \bar{Y}_N^2, \text{ so that } E(\bar{y}_n^2) = \left(\frac{N-n}{nN}\right) S_b^2 + \bar{Y}_N^2. \qquad (3.7)$$

In view of equations (3.6) and (3.7), $E(s_b^2)$ reduces to
$$E(s_b^2) = \frac{1}{n-1} \left[n \left(\frac{N-1}{N}\right) S_b^2 - n \left(\frac{N-n}{nN}\right) S_b^2\right] = \frac{1}{n-1} \left(\frac{nN - n - N + n}{N}\right) S_b^2 = S_b^2.$$
This shows that $s_b^2$ is an unbiased estimate of $S_b^2$. Hence $v(\bar{y}_n) = \dfrac{1-f}{n} s_b^2$ is an unbiased
estimator of $V(\bar{y}_n) = \dfrac{1-f}{n} S_b^2$.
Relative efficiency ($RE$) of cluster sampling
In sampling of $nM$ elements from the population by simple random sampling, wor, the
variance of the sample mean $\bar{y}$ is given by
$$V(\bar{y}_{sr}) = \left(\frac{NM - nM}{NM}\right) \frac{S^2}{nM} = \frac{1-f}{nM} S^2, \quad \text{and} \quad V(\bar{y}_n) = \frac{1-f}{n} S_b^2.$$
Thus, the relative efficiency of cluster sampling compared with simple random sampling is
given by
$$RE = \frac{V(\bar{y}_{sr})}{V(\bar{y}_n)} = \frac{S^2}{M S_b^2}.$$
This shows that the efficiency of cluster sampling increases as the
mean square between cluster means $S_b^2$ decreases.

Note: For large $N$, the relative efficiency of cluster sampling in terms of the intracluster
correlation coefficient $\rho$ is given by
$$RE = \frac{V(\bar{y}_{sr})}{V(\bar{y}_n)} = \frac{1}{1 + (M-1) \rho}.$$
It can be seen that the relative efficiency depends on the value of $\rho$. If
i) $\rho = 0$, then $V(\bar{y}_{sr}) = V(\bar{y}_n)$, i.e. both methods are equally precise.
ii) $\rho > 0$, then $V(\bar{y}_{sr}) < V(\bar{y}_n)$, i.e. simple random sampling is more precise.
iii) $\rho < 0$, then $V(\bar{y}_{sr}) > V(\bar{y}_n)$, i.e. cluster sampling is more precise.
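The link between $S_b^2$, $S^2$ and $\rho$ in equation (3.5), and the two forms of the relative efficiency, can be illustrated on a small artificial population. The data below are hypothetical and the function name is an assumption of this sketch; for such a small $N$ the large-$N$ form of $RE$ is only a rough approximation.

```python
def intracluster_summary(clusters):
    """Return (N, M, S2, S_b2, rho) for a population of equal-sized clusters."""
    N = len(clusters)
    M = len(clusters[0])
    values = [y for c in clusters for y in c]
    Y = sum(values) / (N * M)
    S2 = sum((y - Y) ** 2 for y in values) / (N * M - 1)
    means = [sum(c) / M for c in clusters]
    S_b2 = sum((m - Y) ** 2 for m in means) / (N - 1)
    # Sum of cross products over all ordered pairs j != k within each cluster.
    cross = 0.0
    for c in clusters:
        for j in range(M):
            for k in range(M):
                if j != k:
                    cross += (c[j] - Y) * (c[k] - Y)
    rho = cross / ((M - 1) * (N * M - 1) * S2)
    return N, M, S2, S_b2, rho


# Hypothetical population: N = 5 clusters, M = 3 elements each.
clusters = [[1, 3, 5], [2, 2, 8], [4, 6, 11], [7, 9, 2], [0, 5, 4]]
N, M, S2, S_b2, rho = intracluster_summary(clusters)

re_exact = S2 / (M * S_b2)                 # RE = S^2 / (M S_b^2)
re_large_N = 1.0 / (1.0 + (M - 1) * rho)   # large-N form
print(rho, re_exact, re_large_N)
```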

Estimation of relative efficiency of cluster sampling


We have,
$$\text{Est.}(RE) = \frac{\text{Est. } S^2}{M\, \text{Est. } S_b^2}.$$
Here $s^2$ will not be an unbiased estimate of $S^2$, i.e. $E(s^2) \neq S^2$,
because a sample of $nM$ elements is not taken randomly from the population of $NM$
elements. To find an unbiased estimate of $S^2$, consider
$$(NM-1) S^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.} + \bar{y}_{i.} - \bar{Y})^2$$
$$= \sum_{i=1}^{N} \sum_{j=1}^{M} [(y_{ij} - \bar{y}_{i.})^2 + (\bar{y}_{i.} - \bar{Y})^2 + 2 (y_{ij} - \bar{y}_{i.})(\bar{y}_{i.} - \bar{Y})]$$
$$= \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2 + M \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y})^2 + 0 = (M-1) \sum_{i=1}^{N} S_i^2 + M(N-1) S_b^2$$
$$= N(M-1) S_w^2 + M(N-1) S_b^2. \qquad (3.8)$$

It can be seen that in a random sample of $n$ clusters, $s_b^2$ and $s_w^2$ will provide unbiased
estimates of $S_b^2$ and $S_w^2$, respectively.

Define,
$$s_w^2 = \frac{1}{n(M-1)} \sum_{i=1}^{n} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2, \quad \text{and} \quad s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2.$$

Consider
$$s_w^2 = \frac{1}{n(M-1)} \sum_{i=1}^{n} \sum_{j=1}^{M} (y_{ij} - \bar{y}_{i.})^2 = \frac{1}{n(M-1)} \left[\sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}^2 - M \sum_{i=1}^{n} \bar{y}_{i.}^2\right], \text{ so that}$$
$$E(s_w^2) = \frac{1}{n(M-1)} \left[\sum_{i=1}^{n} \sum_{j=1}^{M} E(y_{ij}^2) - M \sum_{i=1}^{n} E(\bar{y}_{i.}^2)\right].$$
Note that
$$V(y_{ij}) = E(y_{ij}^2) - \bar{Y}_N^2, \text{ then } E(y_{ij}^2) = \frac{NM-1}{NM} S^2 + \bar{Y}_N^2. \text{ Similarly, } E(\bar{y}_{i.}^2) = \frac{N-1}{N} S_b^2 + \bar{Y}_N^2.$$
Therefore,
$$E(s_w^2) = \frac{1}{n(M-1)} \left[\sum_{i=1}^{n} \sum_{j=1}^{M} \left(\frac{NM-1}{NM} S^2 + \bar{Y}_N^2\right) - M \sum_{i=1}^{n} \left(\frac{N-1}{N} S_b^2 + \bar{Y}_N^2\right)\right]$$
$$= \frac{1}{n(M-1)} \left[nM \frac{NM-1}{NM} S^2 + nM \bar{Y}_N^2 - nM \frac{N-1}{N} S_b^2 - nM \bar{Y}_N^2\right]$$
$$= \frac{1}{N(M-1)} [(NM-1) S^2 - M(N-1) S_b^2]$$
$$= \frac{1}{N(M-1)} [N(M-1) S_w^2] = S_w^2, \text{ by using the relation given in equation (3.8),}$$
and
$$E(s_b^2) = S_b^2, \text{ as } n \text{ clusters are drawn under srswor.}$$

Thus, an unbiased estimate of $S^2$ will be
$$\hat{S}^2 = \frac{1}{NM-1} [N(M-1) s_w^2 + M(N-1) s_b^2].$$
Therefore,
$$\text{Est.}(RE) = \frac{\dfrac{1}{NM-1} [N(M-1) s_w^2 + M(N-1) s_b^2]}{M s_b^2}.$$

Note: For large $N$,
$$\text{Est.}(RE) = \frac{\dfrac{1}{N(M - 1/N)} [N(M-1) s_w^2 + M(N-1) s_b^2]}{M s_b^2} \approx \frac{\dfrac{1}{NM} [N(M-1) s_w^2 + NM(1 - 1/N) s_b^2]}{M s_b^2} \approx \frac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2}.$$

Estimation of $\rho$
For large $N$, $RE = \dfrac{1}{1 + (M-1) \rho} = E$ (say), so that
$$\hat{E} + (M-1) \hat{E} \hat{\rho} = 1, \quad \text{where } \hat{E} = \frac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2},$$
or
$$\hat{\rho} = \frac{1 - \hat{E}}{(M-1) \hat{E}} = \frac{1 - \dfrac{1}{M^2 s_b^2} [(M-1) s_w^2 + M s_b^2]}{(M-1) \left[\dfrac{(M-1) s_w^2 + M s_b^2}{M^2 s_b^2}\right]} = \frac{M^2 s_b^2 - (M-1) s_w^2 - M s_b^2}{(M-1) [(M-1) s_w^2 + M s_b^2]}$$
$$= \frac{M(M-1) s_b^2 - (M-1) s_w^2}{(M-1) [(M-1) s_w^2 + M s_b^2]} = \frac{M s_b^2 - s_w^2}{(M-1) s_w^2 + M s_b^2}.$$

Alternative method
We have,
$$\rho = \frac{\dfrac{1}{M-1} \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(NM-1) S^2}, \quad \text{and} \quad (NM-1) S^2 = N(M-1) S_w^2 + M(N-1) S_b^2.$$
Note that, from equation (3.4),
$$M^2 \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2 + \sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}),$$
or
$$\sum_{i=1}^{N} \sum_{j \neq k}^{M} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y}) = M^2 \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y})^2 - \sum_{i=1}^{N} \sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$$
$$= M^2 (N-1) S_b^2 - (NM-1) S^2 = M^2 (N-1) S_b^2 - N(M-1) S_w^2 - M(N-1) S_b^2$$
$$= M(N-1)(M-1) S_b^2 - N(M-1) S_w^2.$$
Hence,
$$\rho = \frac{M(N-1) S_b^2 - N S_w^2}{M(N-1) S_b^2 + N(M-1) S_w^2}.$$

It can be seen that in a random sample of $n$ clusters, $s_b^2$ and $s_w^2$ will provide unbiased
estimates of $S_b^2$ and $S_w^2$, respectively. Therefore, an estimator of $\rho$ will be
$$\hat{\rho} = \frac{M(N-1) s_b^2 - N s_w^2}{M(N-1) s_b^2 + N(M-1) s_w^2}, \quad \text{and for large } N, \quad \hat{\rho} = \frac{M s_b^2 - s_w^2}{M s_b^2 + (M-1) s_w^2}.$$
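The large-$N$ estimators of the relative efficiency and of $\rho$ can be computed directly from a sample of clusters. The following sketch uses a hypothetical sample of four clusters of three elements each; the function and variable names are assumptions made for illustration.

```python
def estimate_rho_and_re(sample_clusters):
    """Return (s_b2, s_w2, re_hat, rho_hat) from a sample of equal-sized
    clusters, using the large-N forms of the estimators."""
    n = len(sample_clusters)
    M = len(sample_clusters[0])
    means = [sum(c) / M for c in sample_clusters]
    y_n = sum(means) / n
    # Mean square between cluster means in the sample.
    s_b2 = sum((m - y_n) ** 2 for m in means) / (n - 1)
    # Pooled mean square within clusters in the sample.
    s_w2 = sum((y - m) ** 2
               for c, m in zip(sample_clusters, means)
               for y in c) / (n * (M - 1))
    re_hat = ((M - 1) * s_w2 + M * s_b2) / (M ** 2 * s_b2)
    rho_hat = (M * s_b2 - s_w2) / (M * s_b2 + (M - 1) * s_w2)
    return s_b2, s_w2, re_hat, rho_hat


# Hypothetical sample: n = 4 clusters, M = 3 elements each.
sample = [[4, 6, 5], [10, 12, 11], [7, 7, 9], [1, 3, 2]]
M_size = len(sample[0])
s_b2, s_w2, re_hat, rho_hat = estimate_rho_and_re(sample)
print(s_b2, s_w2, re_hat, rho_hat)
```

In this made-up sample the clusters are internally homogeneous, so the estimate of $\rho$ is positive and the estimated relative efficiency is below 1, in line with case ii) of the comparison with simple random sampling.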

Determination of optimum cluster size


The best size of cluster to use depends on the cost of collecting information from clusters
and the resulting variance. Regarding the variance function, it is found that variability
between elements within clusters increases as the size of cluster increases (this means that
large clusters are found to be more heterogeneous than small clusters) and decreases with
increasing number of clusters. On the other hand, the cost decreases as the size of cluster
increases and increases as the number of clusters increases. Hence, it is necessary to
determine a balancing point by finding the optimum cluster size and the number of
clusters in the sample, which minimize the sampling variance for a given cost or,
alternatively, minimize the cost for a fixed variance.
The cost of a survey, apart from overhead cost, will be made up of two components:
i) Cost due to expenses in enumerating the elements in the sample and in travelling
within the cluster, which is proportional to the number of elements in the sample.
ii) Cost due to expenses on travelling between clusters, which is proportional to the
distance to be travelled between clusters. It has been shown empirically that the
expected value of the minimum distance between $n$ points located at random is proportional
to $\sqrt{n}$.
The cost of a survey can, therefore, be expressed as
$$C = c_1 n M + c_2 \sqrt{n},$$
where $c_1$ is the cost of collecting information from an element within the cluster and $c_2$ is
the cost per unit distance travelled between clusters. In various agricultural surveys it has
been observed that $S_w^2$ is related to $M$ by the relation $S_w^2 = a M^g$, $g > 0$, where $a$ and $g$
are positive constants. Then, from equation (3.8),
$$S_b^2 = \frac{(NM-1) S^2 - N(M-1)\, a M^g}{M(N-1)} \approx S^2 - (M-1)\, a M^{g-1}, \text{ for large } N.$$
Thus, the variance $V(\bar{y}_n)$, for large $N$, reduces to
$$V(\bar{y}_n) = \frac{1}{n} [S^2 - (M-1)\, a M^{g-1}].$$
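The problem is to determine $n$ and $M$ such that, for specified cost, the variance of $\bar{y}_n$ is a
minimum. Using calculus methods we form
$$\phi = V(\bar{y}_n) + \lambda (c_1 n M + c_2 \sqrt{n} - C),$$
where $\lambda$ is an unknown constant. Differentiating with respect to $n$ and $M$ respectively, and
equating the results to zero, we obtain
$$\frac{\partial \phi}{\partial n} = 0 = -\frac{1}{n^2} [S^2 - (M-1)\, a M^{g-1}] + \lambda \left(c_1 M + \frac{c_2}{2 \sqrt{n}}\right), \text{ so that}$$
$$\frac{1}{n} V(\bar{y}_n) = \lambda \left(c_1 M + \frac{c_2}{2 \sqrt{n}}\right) \qquad (3.9)$$
and
$$\frac{\partial \phi}{\partial M} = 0 = \frac{\partial}{\partial M} V(\bar{y}_n) + \lambda c_1 n, \text{ so that}$$
$$\frac{\partial}{\partial M} V(\bar{y}_n) = -\lambda c_1 n. \qquad (3.10)$$
On eliminating $\lambda$ from equations (3.9) and (3.10), we have
$$\frac{\dfrac{\partial}{\partial M} V(\bar{y}_n)}{\dfrac{1}{n} V(\bar{y}_n)} = \frac{-c_1 n}{c_1 M + \dfrac{c_2}{2 \sqrt{n}}}, \quad \text{or} \quad \frac{1}{V(\bar{y}_n)} \frac{\partial}{\partial M} V(\bar{y}_n) = \frac{-c_1}{c_1 M \left(1 + \dfrac{c_2}{2 c_1 M \sqrt{n}}\right)},$$
$$\text{or} \quad \frac{M}{V(\bar{y}_n)} \frac{\partial}{\partial M} V(\bar{y}_n) = \frac{-1}{1 + \dfrac{c_2}{2 c_1 M \sqrt{n}}}.$$

Now solving $c_1 n M + c_2 \sqrt{n} - C = 0$ as a quadratic in $\sqrt{n}$, we have
$$\sqrt{n} = \frac{-c_2 + \sqrt{c_2^2 + 4 c_1 M C}}{2 c_1 M}, \quad \text{or} \quad 2 c_1 M \sqrt{n} = -c_2 + c_2 \sqrt{1 + \frac{4 c_1 M C}{c_2^2}} = c_2 \left[\left(1 + \frac{4 c_1 M C}{c_2^2}\right)^{1/2} - 1\right].$$
Hence,
$$\frac{M}{V(\bar{y}_n)} \frac{\partial}{\partial M} V(\bar{y}_n) = \frac{-1}{1 + \dfrac{c_2}{c_2 \left[\left(1 + \dfrac{4 c_1 M C}{c_2^2}\right)^{1/2} - 1\right]}} = \left(1 + \frac{4 c_1 M C}{c_2^2}\right)^{-1/2} - 1. \qquad (3.11)$$
Now, evaluating the LHS of equation (3.11), we have
$$\frac{M}{V(\bar{y}_n)} \frac{\partial}{\partial M} V(\bar{y}_n) = \frac{M}{n V(\bar{y}_n)} \frac{\partial}{\partial M} [S^2 - (M-1)\, a M^{g-1}] = -\frac{1}{n V(\bar{y}_n)} [a g M^g - a (g-1) M^{g-1}].$$
Therefore,
$$\frac{a M^{g-1} [g M - (g-1)]}{n V(\bar{y}_n)} = 1 - \left(1 + \frac{4 c_1 M C}{c_2^2}\right)^{-1/2}. \qquad (3.12)$$

Since $n V(\bar{y}_n) = S^2 - (M-1)\, a M^{g-1}$ does not depend on $n$, equation (3.12) involves $M$ only.
It is difficult to get an explicit expression for $M$. However, $M$ can be obtained by the
iterative method (trial and error method). On substituting the value of $M$ thus obtained in
the solution of the cost equation for $\sqrt{n}$, we can obtain the optimum value of $n$.
It is evident from equation (3.12) that the optimum size of the unit becomes smaller when
i) $c_1$ increases, i.e. time of measurement increases.
ii) $c_2$ decreases, i.e. travel becomes cheaper.
iii) total cost of survey $C$ increases.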
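The iterative solution of equation (3.12) can be sketched with a simple bisection search, treating $M$ as a continuous variable. All numerical values below ($S^2$, $a$, $g$, $c_1$, $c_2$, $C$ and the search interval) are hypothetical, chosen only so that a root exists in the interval; the function names are likewise assumptions of this sketch.

```python
import math


def gap(M, S2, a, g, c1, c2, C):
    """Left-hand side minus right-hand side of equation (3.12)."""
    n_times_var = S2 - (M - 1) * a * M ** (g - 1)  # n * V(ybar_n), large N
    lhs = a * M ** (g - 1) * (g * M - (g - 1)) / n_times_var
    rhs = 1 - (1 + 4 * c1 * M * C / c2 ** 2) ** -0.5
    return lhs - rhs


def optimum_design(S2, a, g, c1, c2, C, lo=1.0, hi=20.0, tol=1e-10):
    """Bisection for M on [lo, hi], then n from the cost equation."""
    f_lo = gap(lo, S2, a, g, c1, c2, C)
    f_hi = gap(hi, S2, a, g, c1, c2, C)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = gap(mid, S2, a, g, c1, c2, C)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    M = (lo + hi) / 2
    # Positive root of c1*M*x^2 + c2*x - C = 0 with x = sqrt(n).
    sqrt_n = (-c2 + math.sqrt(c2 ** 2 + 4 * c1 * M * C)) / (2 * c1 * M)
    return M, sqrt_n ** 2


# Hypothetical inputs.
params = dict(S2=1.0, a=0.5, g=0.2, c1=1.0, c2=10.0, C=1000.0)
M_opt, n_opt = optimum_design(**params)
print(M_opt, n_opt)
```

In practice $M$ and $n$ must be integers, so the continuous solution would be rounded and the variance compared at neighbouring integer values.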
Cluster sampling for proportion
Suppose it is desired to estimate the proportion $P$ of elements belonging to a specified category $A$
when the population consists of $N$ clusters, each of size $M$, and a random sample, wor, of
$n$ clusters is selected. Defining $y_{ij}$ as 1 if the $j$-th element of the $i$-th cluster belongs to
the class $A$ and 0 otherwise, it is easy to note that $a_i = \sum_{j=1}^{M} y_{ij}$ gives the total number of
elements in the $i$-th cluster that belong to class $A$, and $p_i = \dfrac{a_i}{M}$ is the proportion in the
$i$-th cluster. Hence the proportion $P$ is
$$P = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} = \frac{1}{NM} \sum_{i=1}^{N} a_i = \frac{1}{N} \sum_{i=1}^{N} p_i.$$
An unbiased estimate of $P$ is
$$\hat{P} = \frac{1}{n} \sum_{i=1}^{n} p_i = \bar{p},$$
and
$$V(\bar{p}) = \left(\frac{1}{n} - \frac{1}{N}\right) \frac{1}{N-1} \sum_{i=1}^{N} (p_i - P)^2 \approx \frac{N-n}{N^2 n} \sum_{i=1}^{N} (p_i - P)^2, \text{ for large } N.$$
As an estimate of $V(\bar{p})$ we may use
$$\hat{V}(\bar{p}) = \left(\frac{1}{n} - \frac{1}{N}\right) \frac{1}{n-1} \sum_{i=1}^{n} (p_i - \bar{p})^2.$$
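The estimator of $P$ from a sample of equal-sized clusters and its estimated variance can be computed as in the sketch below. The counts, the cluster size and the number of clusters in the population are hypothetical values used only for illustration.

```python
def cluster_proportion(a, M, N):
    """Return (p_bar, v_hat) from a srswor sample of clusters.

    a: number of elements in class A in each sampled cluster,
    M: common cluster size, N: number of clusters in the population.
    """
    n = len(a)
    p = [ai / M for ai in a]
    p_bar = sum(p) / n
    v_hat = (1 / n - 1 / N) * sum((pi - p_bar) ** 2 for pi in p) / (n - 1)
    return p_bar, v_hat


# Hypothetical sample: n = 5 clusters of M = 10 elements out of N = 100.
p_bar, v_hat = cluster_proportion([2, 5, 3, 4, 1], M=10, N=100)
print(p_bar, v_hat)
```

For these numbers the cluster proportions are 0.2, 0.5, 0.3, 0.4 and 0.1, so the estimate is 0.3 with estimated variance 0.19 × 0.025 = 0.00475.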

Alternatively, if we take a simple random sample, wor, of $nM$ elements from the population
of size $NM$, the variance of $p$ is
$$V(p) = \left(\frac{NM - nM}{NM - 1}\right) \frac{PQ}{nM} \approx \left(1 - \frac{n}{N}\right) \frac{PQ}{nM}, \text{ for large } N.$$

Theory of unequal clusters


There are a number of situations where the cluster size varies from cluster to cluster. For
example, villages or urban blocks, which are groups of households, and households, which are
groups of persons, are usually considered as clusters for purposes of sampling because of
operational convenience.
Suppose the population consists of $N$ clusters of sizes $M_1, M_2, \ldots, M_N$ such that
$\sum_{i=1}^{N} M_i = M_0$. A sample of $n$ clusters is drawn by the method of simple random sampling,
wor, and all elements of the selected clusters are surveyed. Let us denote by

$y_{ij}$, value of the $j$-th element in the $i$-th cluster, $j = 1, 2, \ldots, M_i$; $i = 1, 2, \ldots, N$.

$\bar{y}_{i.} = \dfrac{1}{M_i} \sum_{j=1}^{M_i} y_{ij}$, mean per element of the $i$-th cluster.

$\bar{Y}_N = \dfrac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.}$, mean of the cluster means in the population of $N$ clusters.

$\bar{y}_n = \dfrac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$, mean of the cluster means in the sample of $n$ clusters.

$\bar{Y} = \dfrac{1}{\sum_{i=1}^{N} M_i} \sum_{i=1}^{N} \sum_{j=1}^{M_i} y_{ij} = \dfrac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.}$, mean per element in the population.

$\bar{M} = \dfrac{1}{N} \sum_{i=1}^{N} M_i = \dfrac{M_0}{N}$, mean of cluster size.
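Three estimators of the population mean $\bar{Y}$ that are in common use may be considered.
1st estimator: It is defined by the sample mean of cluster means as $\bar{y}_I = \dfrac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} = \bar{y}_n$.

By definition,
$$E(\bar{y}_I) = E\left[\frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}\right] = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} = \bar{Y}_N \neq \bar{Y}, \text{ as the sampling is sr.}$$
Thus, $\bar{y}_I$ is a biased estimator of the population mean $\bar{Y}$.
The bias of the estimator is given as
$$B = E(\bar{y}_I) - \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} - \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_{i.} - \frac{1}{N \bar{M}} \sum_{i=1}^{N} M_i \bar{y}_{i.}$$
$$= \frac{1}{N \bar{M}} \left[\bar{M} \sum_{i=1}^{N} \bar{y}_{i.} - \sum_{i=1}^{N} M_i \bar{y}_{i.}\right] = -\frac{1}{N \bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) \bar{y}_{i.}$$
$$= -\frac{1}{N \bar{M}} \left[\sum_{i=1}^{N} (M_i - \bar{M})(\bar{y}_{i.} - \bar{Y}_N + \bar{Y}_N)\right]$$
$$= -\frac{1}{N \bar{M}} \left[\sum_{i=1}^{N} (M_i - \bar{M})(\bar{y}_{i.} - \bar{Y}_N)\right] - \frac{1}{N \bar{M}} \sum_{i=1}^{N} (M_i - \bar{M}) \bar{Y}_N$$
$$= -\frac{1}{\bar{M}} \operatorname{Cov}(\bar{y}_{i.}, M_i),$$
since the second sum vanishes and $\operatorname{Cov}(\bar{y}_{i.}, M_i) = \dfrac{1}{N} \sum_{i=1}^{N} (M_i - \bar{M})(\bar{y}_{i.} - \bar{Y}_N)$.
This shows that the bias is expected to be small when $M_i$ and $\bar{y}_{i.}$ are not highly correlated. In
such a case, it is advisable to use this estimator.
Its variance is given by
$$V(\bar{y}_I) = E(\bar{y}_I - \bar{Y}_N)^2 = \frac{1-f}{n} S_b^2, \quad \text{where } S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{y}_{i.} - \bar{Y}_N)^2,$$
and an unbiased estimator of $V(\bar{y}_I)$ is
$$v(\bar{y}_I) = \frac{1-f}{n} s_b^2, \quad \text{where } s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_I)^2.$$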

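2nd estimator: It is defined as $\bar{y}_{II} = \dfrac{1}{n \bar{M}} \sum_{i=1}^{n} M_i \bar{y}_{i.}$.

By definition,
$$E(\bar{y}_{II}) = \frac{1}{n \bar{M}} \sum_{i=1}^{n} E(M_i \bar{y}_{i.}) = \frac{1}{\bar{M}} \left[\frac{1}{N} \sum_{i=1}^{N} M_i \bar{y}_{i.}\right] = \frac{1}{N \bar{M}} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y}, \text{ as the sampling is srswor.}$$

This shows that $\bar{y}_{II}$ is an unbiased estimate of $\bar{Y}$. Its variance is given by
$$V(\bar{y}_{II}) = V\left[\frac{1}{n \bar{M}} \sum_{i=1}^{n} M_i \bar{y}_{i.}\right] = V\left[\frac{1}{n} \sum_{i=1}^{n} \frac{M_i \bar{y}_{i.}}{\bar{M}}\right].$$
Define a variate
$$u_i = \frac{M_i \bar{y}_{i.}}{\bar{M}}, \quad i = 1, 2, \ldots, N.$$
Let $\bar{u}$ and $\bar{U}$ be the sample and population means of the variable $u$, respectively, where
$$\bar{u} = \frac{1}{n} \sum_{i=1}^{n} \frac{M_i \bar{y}_{i.}}{\bar{M}} = \bar{y}_{II}, \quad \text{and} \quad \bar{U} = \frac{1}{N} \sum_{i=1}^{N} \frac{M_i \bar{y}_{i.}}{\bar{M}} = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y}.$$

Therefore,
$$V(\bar{y}_{II}) = V(\bar{u}) = \frac{1-f}{n} S_b'^2, \text{ as clusters are randomly drawn wor,}$$
$$\text{where } S_b'^2 = \frac{1}{N-1} \sum_{i=1}^{N} (u_i - \bar{U})^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(\frac{M_i \bar{y}_{i.}}{\bar{M}} - \bar{Y}\right)^2,$$
and an unbiased estimator of $V(\bar{y}_{II})$ is
$$v(\bar{y}_{II}) = \frac{1-f}{n} s_u^2, \quad \text{where } s_u^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{M_i \bar{y}_{i.}}{\bar{M}} - \bar{y}_{II}\right)^2.$$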

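3rd estimator: It is defined as $\bar{y}_{III} = \dfrac{1}{\sum_{i=1}^{n} M_i} \sum_{i=1}^{n} M_i \bar{y}_{i.}$. This estimate is a ratio estimate of
the form $\hat{R} = \dfrac{\sum_i y_i}{\sum_i x_i}$, and its variance, to a first order of approximation, is given by replacing $x_i$ by $M_i$ and $y_i$ by $M_i \bar{y}_{i.}$
in the variance of the ratio estimator, where
$$V(\hat{R}) = \frac{1-f}{n (N-1) \bar{X}^2} \sum_{i=1}^{N} (y_i - R x_i)^2, \quad \text{and} \quad \bar{X}^2 = \left(\frac{1}{N} \sum_{i=1}^{N} M_i\right)^2 = \bar{M}^2.$$
Hence,
$$V(\bar{y}_{III}) = \frac{1-f}{n (N-1) \bar{M}^2} \sum_{i=1}^{N} \left[M_i \bar{y}_{i.} - \left(\frac{\sum_{i=1}^{N} M_i \bar{y}_{i.}}{\sum_{i=1}^{N} M_i}\right) M_i\right]^2 = \frac{1-f}{n (N-1) \bar{M}^2} \sum_{i=1}^{N} (M_i \bar{y}_{i.} - \bar{Y} M_i)^2$$
$$= \frac{1-f}{n (N-1)} \sum_{i=1}^{N} \left[\frac{M_i}{\bar{M}} (\bar{y}_{i.} - \bar{Y})\right]^2 = \frac{1-f}{n} S_b''^2,$$
$$\text{where } S_b''^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left[\frac{M_i}{\bar{M}} (\bar{y}_{i.} - \bar{Y})\right]^2.$$

An (approximately unbiased) estimate of $V(\bar{y}_{III})$ is given by
$$v(\bar{y}_{III}) = \frac{1-f}{n} s_b''^2, \quad \text{where } s_b''^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left[\frac{M_i}{\bar{M}} (\bar{y}_{i.} - \bar{y}_{III})\right]^2.$$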
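The behaviour of the three estimators can be compared by enumerating all simple random samples of clusters from a small population. The sketch below uses a hypothetical population given as (cluster size, cluster mean) pairs; it checks numerically that the second estimator is unbiased and that the bias of the first equals $-\operatorname{Cov}(\bar{y}_{i.}, M_i) / \bar{M}$. The names used are illustrative assumptions.

```python
from itertools import combinations


def three_estimators(sample, M_bar):
    """sample: list of (M_i, cluster mean) pairs; M_bar: population mean
    cluster size. Returns (y_I, y_II, y_III)."""
    n = len(sample)
    y_I = sum(m for _, m in sample) / n
    y_II = sum(Mi * m for Mi, m in sample) / (n * M_bar)
    y_III = sum(Mi * m for Mi, m in sample) / sum(Mi for Mi, _ in sample)
    return y_I, y_II, y_III


# Hypothetical population of N = 5 clusters: (size, mean per element).
population = [(2, 3.0), (4, 5.0), (6, 8.0), (3, 4.0), (5, 9.0)]
N = len(population)
M0 = sum(Mi for Mi, _ in population)
M_bar = M0 / N
Y_bar = sum(Mi * m for Mi, m in population) / M0
Y_N = sum(m for _, m in population) / N
cov = sum((Mi - M_bar) * (m - Y_N) for Mi, m in population) / N

# All srswor samples of n = 2 clusters are equally likely.
results = [three_estimators(list(s), M_bar) for s in combinations(population, 2)]
E_I = sum(r[0] for r in results) / len(results)
E_II = sum(r[1] for r in results) / len(results)
E_III = sum(r[2] for r in results) / len(results)
print(Y_bar, E_I, E_II, E_III)
```

The expectation of the third (ratio) estimator generally differs slightly from $\bar{Y}$ in small samples, which the printed values make visible.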

Cluster sampling with varying probabilities and with replacement
Theorem: If a sample of $n$ clusters is drawn with probabilities proportional to size, i.e.
$p_i \propto M_i$ or $p_i = \dfrac{M_i}{M_0}$, and with replacement, then an unbiased estimate of $\bar{Y}$ is given by
$$\bar{y}_n = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} \quad \text{with variance} \quad V(\bar{y}_n) = \frac{1}{n} \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2.$$

Proof: By definition,
$$E(\bar{y}_n) = E\left[\frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}\right] = \frac{1}{n} \sum_{i=1}^{n} E(\bar{y}_{i.}) = \frac{1}{n} \sum_{i=1}^{n} \left[\sum_{i=1}^{N} p_i \bar{y}_{i.}\right] = \frac{1}{M_0} \sum_{i=1}^{N} M_i \bar{y}_{i.} = \bar{Y}.$$

This shows that $\bar{y}_n$ is an unbiased estimator of $\bar{Y}$.

To obtain the variance of $\bar{y}_n$, we have
$$V(\bar{y}_n) = E[\bar{y}_n - E(\bar{y}_n)]^2 = E(\bar{y}_n^2) - \bar{Y}^2. \qquad (3.13)$$
Consider
$$E(\bar{y}_n^2) = E\left[\frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}\right]^2 = \frac{1}{n^2} \left[\sum_{i=1}^{n} E(\bar{y}_{i.}^2) + \sum_{i \neq i'}^{n} E(\bar{y}_{i.}) E(\bar{y}_{i'.})\right]$$
$$= \frac{1}{n^2} \left[n \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + n(n-1) \bar{Y}^2\right],$$
since the $i$-th cluster is drawn with probability $\dfrac{M_i}{M_0}$, and sampling of clusters is wr, i.e. $E(\bar{y}_{i.}) = \bar{Y} = E(\bar{y}_{i'.})$,
$$= \frac{1}{n} \left[\sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + (n-1) \bar{Y}^2\right]. \qquad (3.14)$$

In view of equations (3.13) and (3.14), we get
$$V(\bar{y}_n) = \frac{1}{n} \left[\sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 + (n-1) \bar{Y}^2\right] - \bar{Y}^2 = \frac{1}{n} \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2 = \frac{1}{n} \sigma_b^2, \text{ (say).}$$

Estimation of $V(\bar{y}_n)$
Define,
$$s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y}_n)^2, \text{ then}$$
$$E(s_b^2) = \frac{1}{n-1} \left[\sum_{i=1}^{n} E(\bar{y}_{i.}^2) - n E(\bar{y}_n^2)\right] = \frac{1}{n-1} \left[\sum_{i=1}^{n} \sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 - n V(\bar{y}_n) - n \bar{Y}^2\right]$$
$$= \frac{1}{n-1} \left[n \left(\sum_{i=1}^{N} \frac{M_i}{M_0} \bar{y}_{i.}^2 - \bar{Y}^2\right) - n V(\bar{y}_n)\right]$$
$$= \frac{1}{n-1} \left[n \sum_{i=1}^{N} \frac{M_i}{M_0} (\bar{y}_{i.} - \bar{Y})^2 - n \frac{\sigma_b^2}{n}\right] = \frac{1}{n-1} (n \sigma_b^2 - \sigma_b^2) = \sigma_b^2.$$
This shows that $s_b^2$ is an unbiased estimate of $\sigma_b^2$. Therefore, $\hat{V}(\bar{y}_n) = \dfrac{1}{n} s_b^2$ is an unbiased
estimate of $V(\bar{y}_n) = \sigma_b^2 / n$.
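The three results for sampling with probability proportional to size and with replacement (unbiasedness of $\bar{y}_n$, its variance $\sigma_b^2 / n$, and unbiasedness of $s_b^2$) can be verified exactly on a small example by enumerating every ordered sequence of draws. The cluster sizes and means below are hypothetical, and the function name is an assumption of this sketch.

```python
from itertools import product


def pps_wr_moments(sizes, means, n):
    """Exact moments of ybar_n and s_b^2 under PPS sampling with
    replacement, by enumerating all ordered samples of n draws."""
    M0 = sum(sizes)
    p = [Mi / M0 for Mi in sizes]
    Y_bar = sum(pi * m for pi, m in zip(p, means))
    sigma_b2 = sum(pi * (m - Y_bar) ** 2 for pi, m in zip(p, means))

    E_mean = E_sq = E_s2 = 0.0
    for draw in product(range(len(sizes)), repeat=n):
        prob = 1.0
        for i in draw:
            prob *= p[i]  # draws are independent under wr sampling
        ys = [means[i] for i in draw]
        y_n = sum(ys) / n
        s_b2 = sum((y - y_n) ** 2 for y in ys) / (n - 1)
        E_mean += prob * y_n
        E_sq += prob * y_n ** 2
        E_s2 += prob * s_b2
    return Y_bar, sigma_b2, E_mean, E_sq - E_mean ** 2, E_s2


# Hypothetical population of N = 4 clusters.
sizes = [2, 4, 6, 8]
means = [3.0, 5.0, 8.0, 4.0]
n = 3
Y_bar, sigma_b2, E_mean, var_mean, E_s2 = pps_wr_moments(sizes, means, n)
print(Y_bar, E_mean)
print(sigma_b2 / n, var_mean)
print(sigma_b2, E_s2)
```

Each printed pair should agree up to floating-point rounding.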
