STAT 366 - Sample Survey Theory and Methods II - Lecture 2

STAT 366

Sample Survey Theory and Methods II


Lecture 2: Stratified sampling
Dr. S.K. Appiah
Department of Mathematics
[email protected]/
[email protected]
Jan 2014 020 527 9926
3.0 STRATIFIED SAMPLING

• 3.1 Introduction

• 3.2 Properties of the mean and proportion estimators

• 3.3 Types of Allocations

• 3.4 Estimation of sample size

• 3.5 Examples

3.1 Introduction
• A heterogeneous population of size $N$ is divided into more
homogeneous sub-populations of sizes $N_1, N_2, \dots, N_k$.

• These sub-populations are called strata.

• Independent samples of sizes $n_1, n_2, \dots, n_k$ are drawn, one from
each stratum, to make the grand sample of size $n = n_1 + n_2 + \dots + n_k$.

Characteristics of Stratified Samples
• Population is divided into an exclusive and exhaustive
set of strata, using some external sources.
• Within each stratum a separate random sample is
selected.
• For each stratum, parameters/statistics are computed
and properly weighted to form an overall estimate for
the whole population.
• The statistic may be the mean or the variance.
Reasons for Stratification
• If data of known precision are wanted for certain
subdivisions, it is advisable to treat each subdivision as a
population in its own right.

• Administrative convenience may dictate the use of
stratification.

• Sampling problems may differ markedly in different
parts of the population.

• Stratification may produce more precise estimates of
population characteristics.

• A more representative sample is obtained, thus guarding
against obtaining an extreme sample.

Notations
Suppose a population of $N$ units is divided into $k$ strata.
For each stratum $h$ we define the following:
• $y_{hi}$ = value obtained for the $i$th unit in stratum $h$
• $\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} y_{hi}$ = true subpopulation mean for stratum $h$
• $\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$ = sample mean for stratum $h$
• $W_h = \frac{N_h}{N}$ = weight of stratum $h$
• $f_h = \frac{n_h}{N_h}$ = sampling fraction in stratum $h$
• $S_h^2 = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(y_{hi} - \bar{Y}_h\right)^2$ = true variance for stratum $h$
• $s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}\left(y_{hi} - \bar{y}_h\right)^2$ = sample variance for stratum $h$
• $\bar{Y} = \frac{1}{N}\sum_{h=1}^{k}\sum_{i=1}^{N_h} y_{hi} = \frac{1}{N}\sum_{h=1}^{k} N_h\bar{Y}_h = \sum_{h=1}^{k} W_h\bar{Y}_h$ = overall mean

3.3 Properties of Estimators
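• The estimator for the population mean in stratified
sampling is $\bar{y}_{st}$, which is defined as:

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h\bar{y}_h = \sum_{h=1}^{k} W_h\bar{y}_h$$

where $N = N_1 + N_2 + \dots + N_k$.

Note: $\bar{y}_{st} = \bar{y} = \frac{1}{n}\sum_{h=1}^{k} n_h\bar{y}_h$ if and only if $f_h = f$ in every stratum.
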
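The definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers are hypothetical and do not come from the lecture. It also shows the note in action: the ordinary sample mean agrees with $\bar{y}_{st}$ when every stratum has the same sampling fraction, and generally differs otherwise.

```python
def stratified_mean(N_h, ybar_h):
    """Return sum_h W_h * ybar_h, with W_h = N_h / N."""
    N = sum(N_h)
    return sum(Nh * yb for Nh, yb in zip(N_h, ybar_h)) / N


def pooled_sample_mean(n_h, ybar_h):
    """Return the ordinary sample mean (1/n) * sum_h n_h * ybar_h."""
    n = sum(n_h)
    return sum(nh * yb for nh, yb in zip(n_h, ybar_h)) / n


# Hypothetical illustration values (not from the lecture).
N_h = [50, 30, 20]
ybar_h = [10.0, 20.0, 40.0]

print(stratified_mean(N_h, ybar_h))            # weights W_h = 0.5, 0.3, 0.2
print(pooled_sample_mean([5, 3, 2], ybar_h))   # f_h = 0.1 in every stratum
print(pooled_sample_mean([4, 4, 2], ybar_h))   # unequal sampling fractions
```

With these made-up inputs the first two values coincide, while the third does not, because the sample no longer weights the strata in proportion to their sizes.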
• Theorem:

(a) If $\bar{y}_h$ is an unbiased estimator of $\bar{Y}_h$ in every stratum, then
the stratified sample mean $\bar{y}_{st}$ is an unbiased estimator
of the population mean $\bar{Y}$.

(b) If the samples are drawn independently in different strata,
then the variance of $\bar{y}_{st}$ is

$$\mathrm{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\,\mathrm{Var}(\bar{y}_h) = \sum_{h=1}^{k} W_h^2\,\mathrm{Var}(\bar{y}_h)$$


• Corollary:
If the sampling within each stratum is by simple random
sampling, then

$$\mathrm{Var}(\bar{y}_h) = \left(\frac{N_h - n_h}{N_h}\right)\frac{S_h^2}{n_h}$$

and therefore

$$\mathrm{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\left(\frac{N_h - n_h}{N_h}\right)\frac{S_h^2}{n_h}$$

$$= \sum_{h=1}^{k} W_h^2\left(\frac{N_h - n_h}{N_h}\right)\frac{S_h^2}{n_h}$$

$$= \sum_{h=1}^{k} W_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}$$

$$= \sum_{h=1}^{k}\frac{W_h^2 S_h^2}{n_h} - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$

If the sampling fraction $f_h = \frac{n_h}{N_h}$ is negligible in every stratum, the second term drops out and

$$\mathrm{Var}(\bar{y}_{st}) \approx \frac{1}{N^2}\sum_{h=1}^{k}\frac{N_h^2 S_h^2}{n_h} = \sum_{h=1}^{k}\frac{W_h^2 S_h^2}{n_h}$$

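As a quick numerical check of the corollary, the sketch below evaluates the variance in two of its equivalent forms: the one with the finite population correction inside the sum, and the split form with the $\frac{1}{N}\sum W_h S_h^2$ term subtracted. The function names and all input values are hypothetical, chosen only for illustration.

```python
def var_stratified_mean(N_h, n_h, S2_h):
    """Var(ybar_st) = sum_h W_h^2 * (1 - n_h/N_h) * S_h^2 / n_h."""
    N = sum(N_h)
    return sum(
        (Nh / N) ** 2 * (1 - nh / Nh) * S2 / nh
        for Nh, nh, S2 in zip(N_h, n_h, S2_h)
    )


def var_stratified_mean_split(N_h, n_h, S2_h):
    """Same variance written as sum W_h^2 S_h^2 / n_h - (1/N) sum W_h S_h^2."""
    N = sum(N_h)
    first = sum((Nh / N) ** 2 * S2 / nh for Nh, nh, S2 in zip(N_h, n_h, S2_h))
    second = sum((Nh / N) * S2 for Nh, S2 in zip(N_h, S2_h)) / N
    return first - second


# Hypothetical stratum sizes, sample sizes and variances (not from the lecture).
N_h = [50, 30, 20]
n_h = [5, 3, 2]
S2_h = [4.0, 9.0, 16.0]

v_full = var_stratified_mean(N_h, n_h, S2_h)
v_split = var_stratified_mean_split(N_h, n_h, S2_h)
print(v_full, v_split)  # the two forms agree up to floating-point rounding
```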
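• If $\hat{Y}_{st} = N\bar{y}_{st}$ is the estimate of the population total $Y$, then

$$\mathrm{Var}(\hat{Y}_{st}) = \mathrm{Var}(N\bar{y}_{st}) = N^2\,\mathrm{Var}(\bar{y}_{st})$$

• An unbiased estimator of $\mathrm{Var}(\bar{y}_{st})$ is

$$\widehat{\mathrm{Var}}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h} = \sum_{h=1}^{k} W_h^2\,(1 - f_h)\,\frac{s_h^2}{n_h}$$

where $s_h^2 = \frac{1}{n_h - 1}\sum_{i=1}^{n_h}\left(y_{hi} - \bar{y}_h\right)^2$ is an unbiased estimator of $S_h^2$.
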
3.4 Allocation of Sample Size to Strata
Factors considered in sample size allocation are:
• The total number of units in each stratum.
• The variability of observations within each stratum.
• The cost of obtaining an observation from each
stratum.
• Methods of Allocation:
• Equal Allocation
• Proportional Allocation
• Optimum Allocation

Equal Allocation
• This is the easiest but not the best way of sample
allocation.
• For $k$ strata, if we want a sample of size $n$, then we select
in each stratum a sample of size $n_h = n/k$, so that we have
equal allocation.
• This is usually done for administrative convenience and if
nothing is known about the stratum variances.

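• Under equal allocation ($n_h = n/k$) the variance of $\bar{y}_{st}$ becomes

$$\mathrm{Var}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}$$

$$= \frac{1}{N^2}\sum_{h=1}^{k} N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{k S_h^2}{n}$$

$$= \frac{k}{nN^2}\sum_{h=1}^{k} N_h^2 S_h^2 - \frac{1}{N^2}\sum_{h=1}^{k} N_h S_h^2$$

$$= \frac{k}{n}\sum_{h=1}^{k} W_h^2 S_h^2 - \frac{1}{N}\sum_{h=1}^{k} W_h S_h^2$$
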

Proportional Allocation
• For this allocation, the sample size from a stratum is
proportional to the population size of that stratum.

• There is a uniform sampling fraction in every stratum: $f_h = \frac{n_h}{N_h} = \frac{n}{N} = f$.

• Hence $n_h = \frac{n}{N}\,N_h = \frac{N_h}{N}\,n = W_h\,n$,

• and $\mathrm{Var}(\bar{y}_{st}) = \frac{(1 - f)}{nN}\sum_{h=1}^{k} N_h S_h^2 = \frac{(1 - f)}{n}\sum_{h=1}^{k} W_h S_h^2$.

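A minimal sketch of proportional allocation follows. The stratum sizes 1800, 1200 and 1000 with $n = 100$ are those of a later worked example in this lecture; the variances in the second call are hypothetical, and the function names are illustrative only.

```python
def proportional_allocation(N_h, n):
    """n_h = W_h * n with W_h = N_h / N (not rounded)."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]


def var_proportional(N_h, S2_h, n):
    """Var(ybar_st) = (1 - f)/n * sum_h W_h S_h^2, with f = n/N."""
    N = sum(N_h)
    f = n / N
    return (1 - f) / n * sum(Nh / N * S2 for Nh, S2 in zip(N_h, S2_h))


# Stratum sizes from a later worked example in these notes; n = 100.
print(proportional_allocation([1800, 1200, 1000], 100))

# Hypothetical stratum variances (not from the lecture) for a small population.
print(var_proportional([50, 30, 20], [4.0, 9.0, 16.0], 10))
```

In practice the $n_h$ have to be rounded to whole numbers, which may change the total by a unit or two; the sketch leaves them unrounded so that the formula is visible.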

Optimum Allocation
• In this allocation the sample sizes $n_h$ are selected to:

(i) minimize the variance $\mathrm{Var}(\bar{y}_{st})$ for a specified cost
of taking the sample, or

(ii) minimize the cost for a specified value of $\mathrm{Var}(\bar{y}_{st})$.

• If within any stratum the cost is proportional to the
size of sample but the cost of taking a measurement
on each unit varies from stratum to stratum, then
we take $n_h \propto N_h S_h/\sqrt{c_h}$, where $c_h$ is the cost per unit in stratum $h$.

• This means that in a given stratum, we take a
larger sample if:

• the stratum is larger.

• the stratum is more variable internally.

• sampling is cheaper in the stratum.

• Suppose we select the $n_h$ to minimize $\mathrm{Var}(\bar{y}_{st})$ for a specified cost, or to
minimize the total cost for a specified $\mathrm{Var}(\bar{y}_{st})$. The simplest cost
function is linear, of the form:

$$C = c_0 + \sum_{h=1}^{k} c_h n_h$$

• $C$ = total cost,
• $c_0$ = overhead cost,
• $c_h$ = cost per unit in stratum $h$.

• If travel costs between units are substantial,
empirical and mathematical studies suggest that
travel costs are better represented by $\sum_{h=1}^{k} t_h\sqrt{n_h}$,
where $t_h$ is the travel cost per unit.

Theorem:
• In stratified random sampling with a linear cost
function as given above, the variance of the estimated
mean $\bar{y}_{st}$ is minimum for a specified cost $C$, and the
cost is minimum for a specified variance $\mathrm{Var}(\bar{y}_{st})$,
when $n_h$ is proportional to $W_h S_h/\sqrt{c_h}$ (or $N_h S_h/\sqrt{c_h}$).
• The method of Lagrange multipliers is used to carry out the minimization.

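To make the theorem concrete, here is a minimal sketch of optimum allocation under a fixed budget. It assumes the linear cost function $C = c_0 + \sum c_h n_h$ and uses the proportionality $n_h \propto N_h S_h/\sqrt{c_h}$; the constant of proportionality follows from requiring that the budget is exactly spent. The function name, the budget and all stratum values are hypothetical.

```python
from math import sqrt


def optimum_allocation_fixed_cost(N_h, S_h, c_h, budget, c0=0.0):
    """Allocate n_h proportional to N_h*S_h/sqrt(c_h) so that c0 + sum c_h*n_h = budget."""
    shares = [Nh * Sh / sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]
    # Spending the whole budget gives the scale factor (budget - c0) / sum N_h S_h sqrt(c_h).
    denom = sum(Nh * Sh * sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h))
    return [(budget - c0) * share / denom for share in shares]


# Hypothetical strata: equal size and variability, but stratum 2 costs 4x as much per unit.
N_h = [1000, 1000]
S_h = [10.0, 10.0]
c_h = [1.0, 4.0]

n_h = optimum_allocation_fixed_cost(N_h, S_h, c_h, budget=300.0)
total_cost = sum(ch * nh for ch, nh in zip(c_h, n_h))
print(n_h, total_cost)
```

With these inputs the cheaper stratum receives twice the sample of the dearer one, which is the "sampling is cheaper in the stratum" rule stated earlier.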
• Neyman Allocation
• An important special case arises if the cost per unit is
the same in all strata ($c_h = c$). The total cost then
becomes $C = c_0 + cn$, and optimum allocation for fixed
cost reduces to optimum allocation for fixed
sample size. The result in this special case is as follows:

Theorem:

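• In stratified random sampling $\mathrm{Var}(\bar{y}_{st})$ is minimized for a
fixed total sample size $n$ if

$$n_h = n\,\frac{W_h S_h}{\sum_{h=1}^{k} W_h S_h} = n\,\frac{N_h S_h}{\sum_{h=1}^{k} N_h S_h}$$
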
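• This is called Neyman allocation. The minimum variance
with fixed $n$ is

$$V_{min}(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{k} W_h S_h\right)^2}{n} - \frac{\sum_{h=1}^{k} W_h S_h^2}{N}$$
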
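The Neyman rule and its minimum variance can be checked numerically. The sketch below, with hypothetical stratum values and illustrative function names, computes the allocation, evaluates the closed-form minimum variance, and confirms that plugging the allocation into the general variance formula $\sum W_h^2 (1 - n_h/N_h) S_h^2/n_h$ gives the same number.

```python
def neyman_allocation(N_h, S_h, n):
    """n_h = n * N_h S_h / sum_h N_h S_h (not rounded)."""
    total = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h))
    return [n * Nh * Sh / total for Nh, Sh in zip(N_h, S_h)]


def neyman_min_variance(N_h, S_h, n):
    """V_min = (sum W_h S_h)^2 / n - (sum W_h S_h^2) / N."""
    N = sum(N_h)
    sum_WS = sum(Nh / N * Sh for Nh, Sh in zip(N_h, S_h))
    sum_WS2 = sum(Nh / N * Sh ** 2 for Nh, Sh in zip(N_h, S_h))
    return sum_WS ** 2 / n - sum_WS2 / N


# Hypothetical strata (not from the lecture).
N_h = [50, 30, 20]
S_h = [2.0, 3.0, 4.0]
n = 10

alloc = neyman_allocation(N_h, S_h, n)
v_min = neyman_min_variance(N_h, S_h, n)

# General variance formula evaluated at the Neyman allocation.
N = sum(N_h)
v_direct = sum(
    (Nh / N) ** 2 * (1 - nh / Nh) * Sh ** 2 / nh
    for Nh, nh, Sh in zip(N_h, alloc, S_h)
)
print(alloc, v_min, v_direct)
```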
3.5 Relative Precision of Stratified Random and
Simple Random Sampling

• We compare the precision of simple random sampling
and stratified random sampling with proportional and
optimum allocation. This comparison shows how the
gain due to stratification is achieved. The variances of
the estimated mean are denoted by $V_{ran}$, $V_{prop}$ and $V_{opt}$,
respectively.

Theorem:

• If we assume that the $N_h$ are so large that terms in
$1/N_h$ can be ignored, then

$$V_{opt} \le V_{prop} \le V_{ran},$$

• where the optimum allocation is for fixed $n$.

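A short derivation sketch shows where each inequality comes from. It assumes the finite population corrections are ignored, in addition to the large-$N_h$ assumption of the theorem; $\bar{S} = \sum_h W_h S_h$ is introduced here only as shorthand.

```latex
% fpc ignored throughout; \bar{S} = \sum_h W_h S_h is shorthand introduced for this sketch
\begin{align*}
V_{ran} &= \frac{S^2}{n}, \qquad
V_{prop} = \frac{1}{n}\sum_{h=1}^{k} W_h S_h^2, \qquad
V_{opt} = \frac{1}{n}\Big(\sum_{h=1}^{k} W_h S_h\Big)^2 \\[4pt]
S^2 &\approx \sum_{h=1}^{k} W_h S_h^2 + \sum_{h=1}^{k} W_h(\bar{Y}_h - \bar{Y})^2
\;\Longrightarrow\;
V_{ran} = V_{prop} + \frac{1}{n}\sum_{h=1}^{k} W_h(\bar{Y}_h - \bar{Y})^2 \\[4pt]
V_{prop} - V_{opt} &= \frac{1}{n}\Big[\sum_{h=1}^{k} W_h S_h^2 - \bar{S}^2\Big]
= \frac{1}{n}\sum_{h=1}^{k} W_h (S_h - \bar{S})^2
\end{align*}
```

Both added terms are sums of squares, so neither can be negative: proportional stratification gains from differences between the stratum means, and optimum allocation gains further from differences between the stratum standard deviations.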
Example 2.3

The farms in a district are divided into three strata. The
annual yields (in bags) for the farms are recorded. The
population sizes and the random samples taken are
shown in the table below.

Stratum h   $N_h$   $n_h$   $y_{hi}$ (yield in bags)
1           26      3       92, 105, 82
2           35      4       38, 47, 52, 59
3           53      5       27, 20, 21, 22, 30

i. Estimate the total annual yield and its standard
error.

ii. Obtain a 95% confidence interval for the total
annual yield of the farms.

Solution:
Stratum h   $N_h$   $n_h$   $\bar{y}_h$   $s_h^2$   $N_h\bar{y}_h$
1           26      3       93            133.0     2418
2           35      4       49            78.0      1715
3           53      5       24            18.5      1272
Total       114     12      -             -         5405

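i. The estimator for the annual mean yield is

$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{3} N_h\bar{y}_h = \frac{5405}{114} \approx 47.41 \text{ bags.}$$

The total annual yield is given by

$$\hat{Y}_{st} = N\bar{y}_{st} = \sum_{h=1}^{3} N_h\bar{y}_h = 5405 \text{ bags.}$$

The estimate of the variance of $\bar{y}_{st}$ is

$$\widehat{\mathrm{Var}}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{h=1}^{3} N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h} = \frac{26511.33 + 21157.50 + 9412.80}{114^2} = \frac{57081.63}{12996} \approx 4.392$$

• Hence the variance of $\hat{Y}_{st}$ is

$$\widehat{\mathrm{Var}}(\hat{Y}_{st}) = N^2\,\widehat{\mathrm{Var}}(\bar{y}_{st}) \approx 57081.63$$

from which we obtain the standard errors for $\bar{y}_{st}$ and $\hat{Y}_{st}$:

• $SE(\bar{y}_{st}) = \sqrt{4.392} \approx 2.10$ and $SE(\hat{Y}_{st}) = \sqrt{57081.63} \approx 238.92$.

(ii) The 95% confidence interval for $Y$, using the normal approximation, is $\hat{Y}_{st} \pm 1.96\,SE(\hat{Y}_{st})$.

• Thus $5405 \pm 1.96(238.92) = 5405 \pm 468.28$, that is, approximately $(4936.7,\ 5873.3)$ bags.
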
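The farm example can be reproduced directly from the raw sample values in the data table. The sketch below is one way to do it; the function name is illustrative, and the multiplier 1.96 is an assumption (normal approximation), since with samples this small a $t$ multiplier could equally be argued for.

```python
from math import sqrt
from statistics import mean, variance

# Data from the farm example: stratum sizes and sampled yields (in bags).
N_h = [26, 35, 53]
samples = [[92, 105, 82], [38, 47, 52, 59], [27, 20, 21, 22, 30]]


def stratified_total(N_h, samples):
    """Return (estimated total, estimated variance of the total)."""
    total = sum(Nh * mean(y) for Nh, y in zip(N_h, samples))
    # statistics.variance uses the n - 1 divisor, i.e. it returns s_h^2.
    var_total = sum(
        Nh ** 2 * (1 - len(y) / Nh) * variance(y) / len(y)
        for Nh, y in zip(N_h, samples)
    )
    return total, var_total


total, var_total = stratified_total(N_h, samples)
se_total = sqrt(var_total)

z = 1.96  # assumed normal-approximation multiplier for a 95% interval
ci = (total - z * se_total, total + z * se_total)

N = sum(N_h)
print(total, total / N)                 # estimated total and mean
print(var_total, se_total, se_total / N)
print(ci)
```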
Example 2

Suppose a population is stratified into three strata
with pertinent information regarding size, variability,
mean and cost (in $100) of sampling per unit within
each stratum shown in the table below:

Stratum h   Size   Variability   Mean   Cost per unit (in $100)
1           1800   10            15     2.0
2           1200   24            29     1.0
3           1000   15            30     40

i. If a random sample of size 100 is selected, estimate the
population total, $Y$.

ii. Allocate the sample size of 100 to the three strata
proportionally and compute the estimated variance of
the population mean.

iii. Distribute the sample among the three strata in
accordance with the principle of optimum Neyman
allocation. Compute the estimated variance, $\mathrm{Var}(\bar{y}_{st})$.
iv. If it is desired to estimate the population mean
with variance not exceeding 1.5, determine
the minimum total sample size under optimum
Neyman allocation.

v. If the total cost function is of the form


. Find for a sample of size 100 and the
total field cost.
Solution
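(i) The estimate of $\bar{Y}$ is given by

$$\bar{y}_{st} = \sum_{h=1}^{3} W_h\bar{y}_h = 0.45(15) + 0.30(29) + 0.25(30) = 22.95$$

Hence the required population total is

$$\hat{Y}_{st} = N\bar{y}_{st} = 4000(22.95) = 91800$$
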
(ii) Allocating proportionally, $n_h = W_h n$, we have

• $n_1 = 0.45(100) = 45$
• $n_2 = 0.30(100) = 30$
• $n_3 = 0.25(100) = 25$

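The required variance is

$$\mathrm{Var}(\bar{y}_{st}) = \frac{(1 - f)}{n}\sum_{h=1}^{3} W_h S_h^2, \qquad f = \frac{n}{N} = \frac{100}{4000}$$

(iii) For the Neyman allocation:

$$n_h = n\,\frac{N_h S_h}{\sum_{h=1}^{3} N_h S_h}$$

The required variance is

$$\mathrm{Var}(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{3} W_h S_h\right)^2}{n} - \frac{\sum_{h=1}^{3} W_h S_h^2}{N}$$

iv. Given that the variance is not to exceed $V = 1.5$, the minimum sample size required under Neyman allocation is obtained from

$$n = \frac{\left(\sum_{h=1}^{3} W_h S_h\right)^2}{V + \frac{1}{N}\sum_{h=1}^{3} W_h S_h^2}$$

v. Given the total cost function, and for the
minimum we have

Minimizing G, we have

• Hence the required total cost;
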
3.6 Estimation of Population Proportion

• To estimate the proportion or percentage of units
in the population that fall into some defined class
C, we try to construct strata in such a way that the
proportion in class C varies as much as possible
from stratum to stratum.

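• Let $A_h$ denote the total number of units in C in stratum $h$.
Then the proportion of units in C in this stratum is $P_h = \frac{A_h}{N_h}$,

• which is estimated by $p_h = \frac{a_h}{n_h}$, where $a_h$ is the number of sampled units in stratum $h$ that fall in C.

• The estimate of the population proportion in C, $P$, is defined
by

$$p_{st} = \frac{1}{N}\sum_{h=1}^{k} N_h p_h = \sum_{h=1}^{k} W_h p_h$$
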
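• Theorem:
• The variance of $p_{st}$ for stratified random sampling is

$$\mathrm{Var}(p_{st}) = \frac{1}{N^2}\sum_{h=1}^{k}\frac{N_h^2 (N_h - n_h)}{N_h - 1}\,\frac{P_h Q_h}{n_h}, \qquad Q_h = 1 - P_h$$

• When the fpc is ignored:

$$\mathrm{Var}(p_{st}) = \sum_{h=1}^{k}\frac{W_h^2 P_h Q_h}{n_h}$$

• If proportional allocation is carried out we have, approximately,

$$\mathrm{Var}(p_{st}) = \frac{(1 - f)}{n}\sum_{h=1}^{k} W_h P_h Q_h$$

• The estimate of $\mathrm{Var}(p_{st})$ is obtained by
substituting $\frac{p_h q_h}{n_h - 1}$ for $\frac{P_h Q_h}{n_h}$.
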
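A minimal sketch of the proportion estimator and its estimated variance is given below. It applies the substitution of $p_h q_h/(n_h - 1)$ for $P_h Q_h/n_h$ and keeps the finite population correction in the form $(1 - n_h/N_h)$. The function name and the counts are hypothetical.

```python
def stratified_proportion(N_h, n_h, a_h):
    """Return (p_st, estimated Var(p_st)) from stratum sizes, sample sizes and counts in class C."""
    N = sum(N_h)
    p_st = 0.0
    var_hat = 0.0
    for Nh, nh, ah in zip(N_h, n_h, a_h):
        W = Nh / N
        p = ah / nh
        p_st += W * p
        # Within-stratum variance estimate: (1 - n_h/N_h) * p_h q_h / (n_h - 1).
        var_hat += W ** 2 * (1 - nh / Nh) * p * (1 - p) / (nh - 1)
    return p_st, var_hat


# Hypothetical data: two strata of 50 units, 5 sampled from each,
# with 1 and 4 sampled units respectively falling in class C.
p_st, var_hat = stratified_proportion([50, 50], [5, 5], [1, 4])
print(p_st, var_hat)
```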
3.7 Allocation of Sample for Estimating P
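The best choice of the stratum sample sizes $n_h$ in order to minimize
$\mathrm{Var}(p_{st})$ follows from the general theorem of optimum
allocation discussed earlier.

• Minimum variance for fixed total sample size:

$$n_h \approx n\,\frac{N_h\sqrt{P_h Q_h}}{\sum_{h=1}^{k} N_h\sqrt{P_h Q_h}}$$

• Minimum variance for fixed cost, where cost $C = c_0 + \sum_{h=1}^{k} c_h n_h$:

$$n_h \approx n\,\frac{N_h\sqrt{P_h Q_h / c_h}}{\sum_{h=1}^{k} N_h\sqrt{P_h Q_h / c_h}}$$
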
3.8 Estimation of Sample Size

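We shall consider two cases: estimation of $\bar{Y}$ (or $Y$), and estimation of $P$.

Estimating the Population Mean $\bar{Y}$:

Let $s_h$ be the estimate of $S_h$ and $n_h = w_h n$, which implies $w_h = n_h/n$.
Let $V$ be the desired variance of the estimate. Then

$$V = \frac{1}{n}\sum_{h=1}^{k}\frac{W_h^2 s_h^2}{w_h} - \frac{1}{N}\sum_{h=1}^{k} W_h s_h^2$$

is the desired precision, which gives $n = \dfrac{\sum_{h=1}^{k} W_h^2 s_h^2 / w_h}{V + \frac{1}{N}\sum_{h=1}^{k} W_h s_h^2}$.
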
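• If the fpc is ignored, we have, as a first approximation,

$$n_0 = \frac{1}{V}\sum_{h=1}^{k}\frac{W_h^2 s_h^2}{w_h}$$

• If $n_0/N$ is not negligible (fpc not negligible), we have

$$n = \frac{n_0}{1 + \frac{1}{NV}\sum_{h=1}^{k} W_h s_h^2}$$

which may take various forms, including:
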
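Optimum allocation for fixed $n$ ($w_h \propto W_h s_h$):

$$n = \frac{\left(\sum_{h=1}^{k} W_h s_h\right)^2}{V + \frac{1}{N}\sum_{h=1}^{k} W_h s_h^2}$$

• where $n_0 = \frac{1}{V}\left(\sum_{h=1}^{k} W_h s_h\right)^2$ for ignoring fpc.

Proportional allocation ($w_h = W_h$):

$$n = \frac{n_0}{1 + n_0/N}$$

• where $n_0 = \frac{1}{V}\sum_{h=1}^{k} W_h s_h^2$, for ignoring fpc.
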

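Estimation of Population Total, $Y$:

• for $V$ the desired variance of $\hat{Y}_{st}$:

o General:

$$n = \frac{\sum_{h=1}^{k} N_h^2 s_h^2 / w_h}{V + \sum_{h=1}^{k} N_h s_h^2}$$

o Proportional Allocation:

$$n = \frac{n_0}{1 + n_0/N}, \qquad n_0 = \frac{N}{V}\sum_{h=1}^{k} N_h s_h^2 \text{ for ignoring fpc}$$

o Optimum Allocation:

$$n = \frac{\left(\sum_{h=1}^{k} N_h s_h\right)^2}{V + \sum_{h=1}^{k} N_h s_h^2}$$
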
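• Estimation of Population Proportion $P$

o For proportional allocation:

$$n_0 = \frac{1}{V}\sum_{h=1}^{k} W_h p_h q_h, \qquad n = \frac{n_0}{1 + n_0/N}$$

o For optimum allocation:

$$n_0 = \frac{1}{V}\left(\sum_{h=1}^{k} W_h\sqrt{p_h q_h}\right)^2, \qquad n = \frac{n_0}{1 + \frac{1}{NV}\sum_{h=1}^{k} W_h p_h q_h}$$

where $n_0$ is the first approximation, which ignores the fpc.
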
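The sample-size rules for the mean can be wrapped in one small function. This is a sketch under stated assumptions: the function name, its `allocation` argument and the input values are all hypothetical, and only the proportional and Neyman cases are covered. Passing `N=None` ignores the fpc and returns the first approximation $n_0$.

```python
def sample_size_mean(W_h, s_h, V, N=None, allocation="proportional"):
    """Sample size n needed so that Var(ybar_st) equals V under the chosen allocation."""
    sum_Ws2 = sum(W * s ** 2 for W, s in zip(W_h, s_h))
    if allocation == "proportional":
        n0 = sum_Ws2 / V
    elif allocation == "neyman":
        n0 = sum(W * s for W, s in zip(W_h, s_h)) ** 2 / V
    else:
        raise ValueError("allocation must be 'proportional' or 'neyman'")
    if N is None:  # fpc ignored: first approximation only
        return n0
    # fpc correction; for proportional allocation this equals n0 / (1 + n0/N).
    return n0 / (1 + sum_Ws2 / (N * V))


# Hypothetical weights, standard deviations, target variance and population size.
W_h = [0.5, 0.3, 0.2]
s_h = [2.0, 3.0, 4.0]

n_prop = sample_size_mean(W_h, s_h, V=0.1, N=100, allocation="proportional")
n_ney = sample_size_mean(W_h, s_h, V=0.1, N=100, allocation="neyman")
n0_prop = sample_size_mean(W_h, s_h, V=0.1)
print(n0_prop, n_prop, n_ney)
```

The returned values are not rounded; in practice they would be rounded up to the next whole number so that the target variance is not exceeded.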
Example 2.3

• It is desired to estimate the total enrolment of
schools in a district with a coefficient of variation of
4.5%. The population of 125 schools is
arranged in six strata. Estimates of $S_h$ were
computed from the previous year's data, when the total enrolment was 52,852.

Strata   1     2     3     4     5     6
$N_h$    20    21    25    18    32    9
$S_h$    190   200   120   300   110   300

i. Compute the variance of the mean enrolment
under the optimum Neyman allocation. Assume
a sample size of 100.

ii. Find the required sample size for estimating the
total enrolment under the optimum Neyman
allocation. Find $n_h$ for each $h$.

Stratum   $N_h$   $S_h$   $N_h S_h$   $N_h S_h^2$   $W_h S_h$   $W_h S_h^2$   $n_h$
1         20      190     3800        722000        30.4        5776          8
2         21      200     4200        840000        33.6        6720          9
3         25      120     3000        360000        24.0        2880          6
4         18      300     5400        1620000       43.2        12960         12
5         32      110     3520        387200        28.16       3097.6        8
6         9       300     2700        810000        21.6        6480          6
Total     125     -       22620       4739200       180.96      37913.6       49
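i. The variance of $\bar{y}_{st}$ under Neyman optimum
allocation with $n = 100$ is

$$V_{min}(\bar{y}_{st}) = \frac{\left(\sum W_h S_h\right)^2}{n} - \frac{\sum W_h S_h^2}{N} = \frac{(180.96)^2}{100} - \frac{37913.6}{125} = 327.47 - 303.31 \approx 24.16$$

• (ii) Given the coefficient of variation, $CV = 0.045$,
and the total enrolment, $Y = 52852$, the desired variance of the estimated total is

$$V = (CV \times Y)^2 = (0.045 \times 52852)^2 = (2378.34)^2 \approx 5656501$$

• For the required sample size $n$,

$$n = \frac{\left(\sum N_h S_h\right)^2}{V + \sum N_h S_h^2} = \frac{(22620)^2}{5656501 + 4739200} \approx 49.2,$$

taken here as $n = 49$.

The allocation of sample sizes, $n_h = n\,\frac{N_h S_h}{\sum N_h S_h}$, gives
$n_1 = 8$, $n_2 = 9$, $n_3 = 6$, $n_4 = 12$, $n_5 = 8$, $n_6 = 6$.
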
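The arithmetic of this example can be re-run in a few lines. The sketch below uses the stratum sizes and standard deviations of the solution table (with $S_4 = 300$, the value the solution columns are computed from) and the formulas for Neyman allocation given earlier; the variable names are illustrative.

```python
# Stratum sizes and standard deviations for the school enrolment example.
N_h = [20, 21, 25, 18, 32, 9]
S_h = [190, 200, 120, 300, 110, 300]  # stratum 4 taken as 300, as in the solution table

N = sum(N_h)
sum_NS = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h))
sum_NS2 = sum(Nh * Sh ** 2 for Nh, Sh in zip(N_h, S_h))

# (i) Minimum variance of the mean under Neyman allocation with n = 100.
n = 100
v_min = (sum_NS / N) ** 2 / n - sum_NS2 / N ** 2

# (ii) Sample size for the total, with CV = 4.5% of the previous year's enrolment.
V = (0.045 * 52852) ** 2
n_required = sum_NS ** 2 / (V + sum_NS2)

# Neyman allocation of n = 49 (the value used in the solution table), rounded.
allocation = [round(49 * Nh * Sh / sum_NS) for Nh, Sh in zip(N_h, S_h)]

print(N, sum_NS, sum_NS2)
print(round(v_min, 3), round(n_required, 2), allocation)
```

Note that the computed sample size is slightly above 49, so rounding up to 50 would be the cautious choice if the 4.5% target must not be exceeded.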
Example 2

A survey with three strata is planned to estimate the
percentage of families who have accounts in
savings banks and the average amount invested
per family. Advance estimates of the percentages $P_h$
and the within-stratum standard deviations $S_h$ for the amount
invested are as follows.

Stratum   $W_h$   $P_h$   $S_h$
1         0.6     0.2     90
2         0.3     0.4     180
3         0.1     0.7     520

(a) Compute the smallest sample sizes under proportional allocation if

(i) the percentage of families is to be estimated with a
standard error of 2.0%;

(ii) the average amount invested is to be estimated with a
standard error of $5.0.

(b) Estimate the variance of the percentage of families
using a total sample of 500 under proportional allocation.
Stratum   $W_h$   $P_h$   $S_h$   $W_h P_h Q_h$   $W_h S_h^2$   $n_h$
1         0.6     0.2     90      0.096           4860          284
2         0.3     0.4     180     0.072           9720          142
3         0.1     0.7     520     0.021           27040         47
Total     1.0     -       -       0.189           41620         473

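(a)(i) Find $n$ under proportional allocation, with $V = (0.02)^2 = 0.0004$.

• Ignoring the fpc since $N$ is unknown,

$$n_0 = \frac{\sum W_h P_h Q_h}{V} = \frac{0.189}{0.0004} = 472.5 \approx 473$$

Hence from $n_h = W_h n$;

• which gives the following allocations: $n_1 = 284$, $n_2 = 142$, $n_3 = 47$.

(a)(ii) For the average amount invested, $V = (5.0)^2 = 25$, and

$$n_0 = \frac{\sum W_h S_h^2}{V} = \frac{41620}{25} = 1664.8 \approx 1665$$

(b) With $n = 500$ under proportional allocation,

$$\mathrm{Var}(p_{st}) = \frac{\sum W_h P_h Q_h}{n} = \frac{0.189}{500} = 0.000378$$

if fpc is ignored.
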
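The figures in this example follow from the proportional-allocation formulas with the fpc ignored, and can be checked with the short sketch below. The variable names are illustrative; rounding up with `math.ceil` is an assumption about how "smallest sample size" is to be read.

```python
from math import ceil

# Advance estimates from the savings-bank example.
W_h = [0.6, 0.3, 0.1]
P_h = [0.2, 0.4, 0.7]
S_h = [90, 180, 520]

sum_WPQ = sum(W * P * (1 - P) for W, P in zip(W_h, P_h))
sum_WS2 = sum(W * S ** 2 for W, S in zip(W_h, S_h))

# (a)(i) Standard error of 2.0% for the proportion, so V = 0.02^2.
n_prop = ceil(sum_WPQ / 0.02 ** 2)

# (a)(ii) Standard error of $5.0 for the mean amount, so V = 5^2.
n_mean = ceil(sum_WS2 / 5.0 ** 2)

# (b) Variance of p_st with n = 500 under proportional allocation, fpc ignored.
var_p_500 = sum_WPQ / 500

# Proportional allocation of the sample in (a)(i).
alloc = [round(W * n_prop) for W in W_h]

print(n_prop, n_mean, var_p_500, alloc)
```

The much larger sample needed for the mean amount reflects the very high variability in the third stratum; a survey serving both aims would have to use the larger of the two sizes.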
Thank You

For any concerns, please contact


[email protected]
[email protected]
0322 191132
Nov 2016
