slidesc53_2

Stratified sampling involves dividing a population into non-overlapping subpopulations (strata) and selecting a simple random sample without replacement from each stratum. This method enhances precision, reduces variance in estimates, and allows for separate parameter estimation for each stratum. The document also includes examples and R code for implementing stratified sampling and discusses sample size allocation methods.

Uploaded by

Ki Yan Shih

STAC53

Stratified Sampling
References: Sampling Design and Analysis, S.L. Lohr (Chap 3)

1
Stratified sampling
• In stratified sampling we divide the population into distinct,
non-overlapping subpopulations called strata. We then select a
simple random sample without replacement (SRSWOR) from each
stratum. The samples in different strata are selected independently.

2
Stratified sampling : Introduction
• We use stratified sampling for one or more of the following reasons:
• To guard against underrepresenting some parts of the population.
• To obtain higher precision (smaller variance) for estimates of population means and totals.
• Administrative convenience.
• With stratified random sampling, in addition to estimating the population
parameters, we can also estimate the parameters for each stratum.

3
Example
• A Statistics class had two lecture sections (LEC01 and LEC02).
The data are given in the file students.csv. Use the R sampling
package to select a stratified sample of 15 students, using the two
lecture sections as strata, selecting an SRSWOR of ten
students from LEC01 and five students from LEC02.

4
students <- read.csv("students.csv", header=TRUE)
head(students)
  GIVEN_NAME    LEC
1        Yan LEC_01
2    Prateek LEC_01
3      Adeel LEC_01
4     Jingya LEC_02
5   Anuradha LEC_02
6      Bryan LEC_01

5
students <- read.csv("students.csv", header=TRUE)
library(sampling)
# size lists the stratum sample sizes in the order the strata appear in the data
s <- strata(students, stratanames=c("LEC"), size=c(10,5),
            method="srswor", description=TRUE)
Stratum 1
Population total and number of selected units: 186 10
Stratum 2
Population total and number of selected units: 208 5
Number of strata 2
Total number of selected units 15

6
• s

7
• getdata(students,s)
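
The same design can also be sketched in base R, which makes the "independent SRSWOR within each stratum" step explicit. This is only an illustration under stated assumptions: `students_demo` is a synthetic stand-in for students.csv (its stratum sizes are copied from the output above), and `draw.stratified` is a hypothetical helper, not a function from the slides or from the sampling package.

```r
# Synthetic stand-in for students.csv (assumption: only the LEC column matters here)
students_demo <- data.frame(
  ID  = 1:394,
  LEC = rep(c("LEC_01", "LEC_02"), times = c(186, 208))
)

# Draw an SRSWOR of size n_h[[h]] independently within each stratum h
draw.stratified <- function(data, stratum, n_h) {
  rows <- unlist(lapply(names(n_h), function(h) {
    idx <- which(data[[stratum]] == h)        # row indices of stratum h
    idx[sample.int(length(idx), n_h[[h]])]    # sample without replacement
  }))
  data[rows, , drop = FALSE]
}

set.seed(1)
samp <- draw.stratified(students_demo, "LEC", c(LEC_01 = 10, LEC_02 = 5))
table(samp$LEC)   # number of sampled students per lecture section
```

Indexing with `sample.int` rather than calling `sample(idx, ...)` avoids the well-known surprise when a stratum contains a single row.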

8
Notations and some basic results

9
Notations and some basic results
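• $y_{hj}$ = measurement on the $j$th unit in stratum $h$
• Population mean in stratum $h$: $\bar{y}_{hU} = \frac{1}{N_h}\sum_{j=1}^{N_h} y_{hj}$
• Population mean: $\bar{y}_U = \frac{1}{N}\sum_{h=1}^{H}\sum_{j=1}^{N_h} y_{hj} = \frac{t}{N}$
• where $t = \sum_{h=1}^{H}\sum_{j=1}^{N_h} y_{hj}$ is the population total.
• Note that $\bar{y}_U = \sum_{h=1}^{H} W_h \bar{y}_{hU}$
• where $W_h = \frac{N_h}{N}$ are the stratum weights.
• Population variance in stratum $h$: $S_h^2 = \frac{1}{N_h-1}\sum_{j=1}^{N_h}\left(y_{hj}-\bar{y}_{hU}\right)^2$
• Population variance: $S^2 = \frac{1}{N-1}\sum_{h=1}^{H}\sum_{j=1}^{N_h}\left(y_{hj}-\bar{y}_U\right)^2$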
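
A small numerical check can make the notation concrete. The population below is made up purely for illustration (two strata with three and two units), and the object names are assumptions chosen to mirror the symbols on this slide.

```r
# Hypothetical population with H = 2 strata (values chosen only for illustration)
pop <- list(h1 = c(2, 4, 6), h2 = c(10, 20))

N_h     <- sapply(pop, length)   # stratum sizes N_h
N       <- sum(N_h)              # population size N
W_h     <- N_h / N               # stratum weights W_h
ybar_hU <- sapply(pop, mean)     # stratum means
S2_h    <- sapply(pop, var)      # stratum variances (divisor N_h - 1)

t_total <- sum(unlist(pop))      # population total t
ybar_U  <- t_total / N           # population mean

# The weighted stratum means reproduce the population mean
all.equal(ybar_U, sum(W_h * ybar_hU))
```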

10
Sum of Squares Decomposition
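
The body of this slide did not survive extraction. As a reconstruction (not the slide's exact wording), the standard decomposition splits the total sum of squares into a within-stratum part and a between-stratum part, using the notation defined on the previous slide:

```latex
\[
(N-1)\,S^2
  = \sum_{h=1}^{H}\sum_{j=1}^{N_h}\left(y_{hj}-\bar{y}_U\right)^2
  = \underbrace{\sum_{h=1}^{H}(N_h-1)\,S_h^2}_{\text{within strata}}
  + \underbrace{\sum_{h=1}^{H}N_h\left(\bar{y}_{hU}-\bar{y}_U\right)^2}_{\text{between strata}}
\]
```

Stratification pays off most when the between-stratum term is large relative to the within-stratum term.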

11
Sample statistics using SRS estimators within each stratum
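
The formulas for this slide were lost in extraction. Assuming the usual SRS estimators, which match the `mean` and `var` calls in the `str.mu.est` function later in these slides, and writing $\mathcal{S}_h$ for the sample drawn from stratum $h$ and $n_h$ for its size (symbols introduced here for illustration), they would read:

```latex
\[
\bar{y}_h = \frac{1}{n_h}\sum_{j\in\mathcal{S}_h} y_{hj},
\qquad
s_h^2 = \frac{1}{n_h-1}\sum_{j\in\mathcal{S}_h}\left(y_{hj}-\bar{y}_h\right)^2,
\qquad
\hat{t}_h = N_h\,\bar{y}_h .
\]
```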

12
Result
• Under stratified sampling (with SRSWOR within each stratum)
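
The statement of the result is missing from the extracted slide. A reconstruction consistent with the `str.mu.est` code later in these slides (where `fpc` is $1 - n_h/N_h$) is sketched below; treat it as an assumption about what the slide showed.

```latex
\[
\bar{y}_{str} = \sum_{h=1}^{H} W_h\,\bar{y}_h,
\qquad
E\left(\bar{y}_{str}\right) = \bar{y}_U ,
\]
\[
V\left(\bar{y}_{str}\right)
  = \sum_{h=1}^{H} W_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h},
\qquad
\hat{V}\left(\bar{y}_{str}\right)
  = \sum_{h=1}^{H} W_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}.
\]
```

The variance has no cross terms because the samples in different strata are selected independently.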

13
Result (Confidence intervals)
• Under stratified sampling (with SRSWOR within each stratum)
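
The interval itself is missing from the extracted slide. The `str.mu.est` function later in these slides uses a normal quantile, so a consistent reconstruction (an assumption, not the slide's exact text) of an approximate $100(1-\alpha)\%$ confidence interval for $\bar{y}_U$ is:

```latex
\[
\bar{y}_{str} \;\pm\; z_{\alpha/2}\,\sqrt{\hat{V}\left(\bar{y}_{str}\right)},
\qquad
\hat{V}\left(\bar{y}_{str}\right)
  = \sum_{h=1}^{H} W_h^2\left(1-\frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}.
\]
```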

14
Example

15
Example

16
R code for estimating the population mean and the CI based on a stratified sample
str.mu.est <- function(N_h, y, details="no", conf.level) {
  # N_h is a vector of the stratum sizes
  # y is a list object with each component being a stratum sample
  N <- sum(N_h)
  n_h <- unlist(lapply(y, length))
  f_h <- n_h/N_h
  fpc <- 1 - f_h   # Finite population correction
  ybar_h <- unlist(lapply(y, mean))
  yvar_h <- unlist(lapply(y, var))
  W_h <- N_h/N
  ybar_str <- sum(W_h*ybar_h)
  vybar_str <- sum(W_h^2*fpc*yvar_h/n_h)
  ME_CI <- qnorm((1+conf.level)/2)*sqrt(vybar_str)
  LCL <- ybar_str - ME_CI
  UCL <- ybar_str + ME_CI
  if (details == "no") {
    cbind(ybar_str, vybar_str, LCL, UCL)
  } else {
    cbind(ybar_str, vybar_str, LCL, UCL, n_h, ybar_h, yvar_h)
  }
}

19
Some useful notes
• NOTES:
• unlist: given a list structure x, it produces a vector which contains all
the individual components of x.
• lapply(X,FUN) returns a list of the same length as X, each element of
which is the result of applying a function to the corresponding
element of X

20
To use the function, enter the data as follows:
N_h <- c(155,62,93)
str1 <- c(35, 28, 26, 41, 43, 29, 32, 37, 36, 25, 29,
          31, 39, 38, 40, 45, 28, 27, 35, 34)
str2 <- c(27, 4, 49, 10, 18, 41, 25, 30)
str3 <- c(8, 15, 21, 7, 14, 30, 20, 11, 12, 32, 34, 24)
data <- list(townA=str1, townB=str2, rural=str3)
str.mu.est(N_h=N_h, y=data, details="yes", conf.level=0.95)

21
str.mu.est(N_h=N_h, y=data, details="no", conf.level=0.95)

• Ex: Do these calculations by hand.

22
Stratified sampling for proportions
• Proportions are in fact means of Bernoulli (i.e. 0’s and 1’s) random
variables and so we can use the same formulas with
• $\bar{y}_h = \hat{p}_h$
• $s_h^2 = \frac{n_h}{n_h-1}\,\hat{p}_h(1-\hat{p}_h)$
• $p = \sum_{h=1}^{H} W_h p_h$, where $W_h = N_h/N$
• $\hat{p}_{str} = \sum_{h=1}^{H} W_h \hat{p}_h$
• $V(\hat{p}_{str}) = \sum_{h=1}^{H} W_h^2 (1-f_h)\,\frac{p_h(1-p_h)}{n_h}$
• where $p_h$ is the population proportion in the $h$th stratum.
• $\hat{V}(\hat{p}_{str}) = \sum_{h=1}^{H} W_h^2 (1-f_h)\,\frac{\hat{p}_h(1-\hat{p}_h)}{n_h-1}$
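
These formulas translate directly into a few lines of R. The sketch below is a hypothetical helper named `str.p.est` (it is not part of the slides), it assumes a normal-approximation interval like the one in `str.mu.est`, and its inputs are made-up numbers chosen only to exercise the code.

```r
# Hypothetical helper (not from the slides): stratified estimate of a proportion
str.p.est <- function(N_h, n_h, p_hat_h, conf.level = 0.95) {
  W_h   <- N_h / sum(N_h)        # stratum weights W_h = N_h / N
  f_h   <- n_h / N_h             # sampling fractions f_h
  p_str <- sum(W_h * p_hat_h)    # stratified estimate of p
  # Estimated variance, with divisor n_h - 1 as in the last formula above
  v_str <- sum(W_h^2 * (1 - f_h) * p_hat_h * (1 - p_hat_h) / (n_h - 1))
  me    <- qnorm((1 + conf.level) / 2) * sqrt(v_str)
  c(p_str = p_str, v_str = v_str, LCL = p_str - me, UCL = p_str + me)
}

# Made-up inputs, for illustration only
res <- str.p.est(N_h = c(100, 300), n_h = c(10, 30), p_hat_h = c(0.5, 0.2))
res
```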
23
Exercise
• A researcher is studying the effectiveness of a new educational program across three
different schools, which represent distinct strata based on their socioeconomic status:
low, medium, and high. The population sizes (𝑁ℎ ) for each school are as follows:
• Low SES: 200 students
• Medium SES: 300 students
• High SES: 500 students
• The researcher decides to sample 20 students from each school (𝑛ℎ = 20 for all strata).
After conducting the survey, the observed proportions of students favoring the program
are:
• Low SES: 0.60
• Medium SES: 0.70
• High SES: 0.80
1. Calculate the overall estimated proportion of students favoring the program across all
schools.
2. What is the standard error (SE) of the estimated proportion?

26
Sample size allocation
• We consider two commonly used methods for allocating a given total
sample size n among the strata: proportional allocation and optimal
allocation.
• Proportional Allocation

27
Sample size allocation
• Substituting $n_h = n\,\frac{N_h}{N}$ in the formula for $V(\bar{y}_{str})$, we get the
following result:

• Result: Under stratified sampling with proportional allocation,
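
The displayed result is missing from the extracted slide. It can be reconstructed by substituting $n_h = nN_h/N = nW_h$ into the stratified variance formula that `str.mu.est` implements; the derivation below is that reconstruction, not a copy of the slide.

```latex
\[
V_{prop}\left(\bar{y}_{str}\right)
  = \sum_{h=1}^{H} W_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}
  = \sum_{h=1}^{H} W_h^2\left(1-\frac{n}{N}\right)\frac{S_h^2}{n\,W_h}
  = \left(1-\frac{n}{N}\right)\frac{1}{n}\sum_{h=1}^{H} W_h S_h^2 .
\]
```

Under proportional allocation every stratum has the same sampling fraction $n/N$, which is why the finite population correction factors out of the sum.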

28
Sample size allocation

29
Example
• A Statistics class had two lecture sections (LEC01 and LEC02).
The data are given in the file students.csv. Use the R sampling
package to select a stratified sample of 20 students using
proportional allocation.

30
# R code for selecting a stratified sample with proportional
# allocation using the sampling package
n <- 20 # The sample size
students <- read.csv("students.csv", header=TRUE)
head(students)
  GIVEN_NAME    LEC
1        Yan LEC_01
2    Prateek LEC_01
3      Adeel LEC_01
4     Jingya LEC_02
5   Anuradha LEC_02
6      Bryan LEC_01
Nh <- table(students$LEC)
Nh
LEC_01 LEC_02
   186    208
N <- nrow(students)
N
[1] 394
31
library(sampling)
set.seed(123)
s <- strata(students, stratanames=c("LEC"), size=round((Nh/N)*n),
            method="srswor", description=TRUE)
Stratum 1
Population total and number of selected units: 186 9
Stratum 2
Population total and number of selected units: 208 11
Number of strata 2
Total number of selected units 20

32
s

33
getdata(students,s)
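
In this example `round((Nh/N)*n)` happens to sum to 20, but rounding each stratum separately does not guarantee that in general. One possible workaround, sketched here with a hypothetical helper `prop.alloc` (not from the slides or the sampling package), is to round down and then hand the leftover units to the strata with the largest fractional parts.

```r
# Hypothetical helper: proportional allocation whose sizes always sum to n
prop.alloc <- function(N_h, n) {
  exact <- n * N_h / sum(N_h)    # ideal (fractional) allocation n * N_h / N
  n_h   <- floor(exact)
  short <- n - sum(n_h)          # units still to hand out after rounding down
  if (short > 0) {
    # give one extra unit to the strata with the largest fractional parts
    extra <- order(exact - n_h, decreasing = TRUE)[seq_len(short)]
    n_h[extra] <- n_h[extra] + 1
  }
  n_h
}

prop.alloc(c(186, 208), 20)        # stratum sizes from the example above
prop.alloc(c(100, 100, 100), 10)   # round() alone would allocate only 9 units here
```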

34
Optimal Allocation
• The objective in optimal allocation is to minimize the variance $V(\bar{y}_{str})$
for a fixed cost
• Let $C$ represent total cost, $c_0$ represent overhead costs, and $c_h$
represent the cost of taking an observation in stratum $h$
• Then

35
Result (Optimal Allocation)
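
The displayed formulas are missing from the extracted slides. Assuming the usual linear cost function, the standard optimal-allocation result (offered as a reconstruction, and consistent with the note on the next slide) is:

```latex
\[
C = c_0 + \sum_{h=1}^{H} c_h\,n_h ,
\qquad
n_h = n\,\frac{N_h S_h/\sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l/\sqrt{c_l}} .
\]
```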

36
Note
• This result shows that in a given stratum, the sample size is
larger when
• The stratum size is larger
• Within stratum variability is higher
• The cost is lower in the stratum
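
These three effects can be seen numerically. The sketch below assumes the standard optimal-allocation rule, in which $n_h$ is proportional to $N_h S_h/\sqrt{c_h}$. The helper name `opt.alloc` and all input values are made up for illustration, and setting every cost equal covers the equal-cost special case.

```r
# Hypothetical helper: optimal allocation of a total sample size n,
# assuming n_h is proportional to N_h * S_h / sqrt(c_h)
opt.alloc <- function(n, N_h, S_h, c_h = rep(1, length(N_h))) {
  share <- N_h * S_h / sqrt(c_h)
  n * share / sum(share)         # fractional sizes; round before sampling
}

# Made-up inputs, for illustration only
N_h <- c(400, 600)
S_h <- c(3, 1)
opt.alloc(100, N_h, S_h)                   # equal costs in both strata
opt.alloc(100, N_h, S_h, c_h = c(4, 1))    # stratum 1 is four times as costly
```

With equal costs the more variable stratum 1 receives about two thirds of the sample even though it is the smaller stratum; quadrupling its unit cost pulls the split back to 50 and 50.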

37
Special case: $c_h = c$

38
Sample size for proportions
