
STAT 354 Study Guide

Nils DM
April 14, 2021

1 Introduction
1. Sampling: The selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population.

2. Statistic: A sample statistic is any quantity computed from values in a sample that is used for a statistical purpose. It is a function of the observations in the sample, and its value varies from sample to sample.

3. Parameter: A population parameter is a certain numerical characteristic of a population. A parameter is a constant, not a random variable.

4. Inference: A conclusion reached on the basis of evidence and reasoning.

5. Statistical Inference: The process of drawing conclusions about a population on the basis of information contained in a sample.

6. Sample Collection Methods: There are three ways in which samples
are obtained:

(a) Controlled Experiments: the type of experiments common in science, where experimenters control which subjects get which treatments at what levels.
(b) Sample Surveys: Investigations where a researcher draws a sample from a defined population, observes various characteristics of the units drawn, and on the basis of these observations infers things about the population.
(c) Happenstance Data: Data drawn not from a prescribed sampling scheme, but rather used because it is the only data available.

2 Elements of the Sampling Problem
1. Element: An object on which a measurement is taken.
Ex. A voter being polled

2. Population: A collection of elements about which we wish to make an inference. The population should be carefully and completely defined before collecting a sample.
Ex. All registered voters in an election

3. Sampling Units: Nonoverlapping collections of elements from the population that cover the entire population.
Ex. individual voters or households in an election

4. Frame: A list of sampling units.
Ex. A list of all registered voters or a list of all households

5. Sample: A collection of sampling units drawn from a frame or frames.
Ex. A sample of voters from the population of all voters

6. The Objective of Sample Surveys: To estimate an unknown population parameter θ.
Ex. a mean, total, proportion, etc.

7. Sampling Symbols/Terminology: some common symbols are:

(a) θ̂: an estimated value of θ based on the sample
(b) B: a desired error bound, |θ̂ − θ| < B
(c) 1 − α: a desired probability that the bound is met:

P(|θ̂ − θ| < B) = 1 − α

where B = 2σ_θ̂.
The objective is to have 1 − α be as close to 1 as possible. Usually we set 1 − α = 0.95.

8. The goal of Sampling Strategies is meeting the requirement P(|θ̂ − θ| < B) = 1 − α at the minimum cost.

9. P(|θ̂ − θ| < B) = 1 − α is a probabilistic assessment of error.

10. Probability Sampling Design: A sampling design based on some planned randomness. The main probability sampling designs are listed below (a minimal code sketch of all four follows this item):

(a) Simple Random Sampling: Select n sampling units from a total of N so that every unit has the same chance 1/N of selection on any draw.
(b) Stratified Random Sampling: Divide the population into two or more groups (strata), such as male/female, and select a simple random sample from each group.
This is used when the variation within groups is small (with respect to the question of interest) while the variation between groups is large.
(c) Cluster Sampling: Divide the population into many clusters, usually based on some notion of distance such as county, city or block. Take a simple random sample of clusters first, then sample items within the selected clusters. A cluster may be viewed as a stratum as in (b), but cluster sampling is different since not all clusters are sampled.
(d) Systematic Sampling: Sample according to some pattern, such as taking the 5th, 10th, 15th, etc. person on a list of voters. This is convenient but may not be accurate.
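The following minimal Python sketch, using only the standard library, illustrates how each of the four designs selects units; the population, the 60/40 strata split, the cluster sizes, and the sample sizes are all hypothetical choices.

```python
# A minimal sketch of the four probability sampling designs on a toy
# population of size N = 100 (all parameters below are hypothetical).
import random

random.seed(1)
population = list(range(100))          # N = 100 hypothetical units
n = 10                                 # desired sample size

# (a) Simple random sampling: every unit has the same chance of selection.
srs = random.sample(population, n)

# (b) Stratified random sampling: an SRS from each stratum (here, two
# illustrative strata of sizes 60 and 40, sampled proportionally).
strata = [population[:60], population[60:]]
stratified = [u for s in strata for u in random.sample(s, n * len(s) // 100)]

# (c) Cluster sampling: partition into 10 clusters of 10, take an SRS of
# 2 whole clusters, and keep every element of each selected cluster.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [u for c in random.sample(clusters, 2) for u in c]

# (d) Systematic (1-in-K) sampling: K = N/n, random start in the first K.
K = len(population) // n
start = random.randrange(K)
systematic = population[start::K]
```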

11. Quota Sampling: A method for selecting survey participants that is a non-probabilistic version of stratified sampling, such as selecting a group of 200 males and 300 females from a population.

12. Sources of Error in Surveys: There are two types of survey errors:

(a) Errors of Non-observation:
i. Sampling Error: Errors due to imperfect representation of the population by a sample. This is inevitable unless sample = population.
ii. Errors of Coverage: The sampling frame does not match the target population, such as using only people who have phone numbers listed in the phone book.
iii. Nonresponse: Inability to contact, inability to answer, or refusal to answer (the most serious).
(b) Errors of Observation: Errors incurred while taking measurements. The sources of such errors are:
i. The Interviewer: The way questions are asked.
ii. The Respondent: Motivation to answer correctly/inability to answer correctly.
iii. The Measurement Instruments: Issues or limitations such as inaccurate or mis-calibrated instruments.

13. Methods of Data Collection:

(a) Personal interviews
(b) Telephone interviews
(c) Self-administered questionnaires
(d) Direct observation

14. Reducing Errors in Surveys:

(a) Callbacks
(b) Rewards and incentives
(c) Trained interviewers/quality instruments
(d) Data checks
(e) Questionnaire construction

15. Designing a Questionnaire:

(a) Question ordering (non-personal questions first)
(b) Open versus closed questions
(c) Response options
(d) Wording options

16. Planning a Survey Checklist:

(a) Statement of objectives (think them through)
(b) Define the target population
(c) Identify the frame so that the sampling population is close to the
target population
(d) Sampling design
(e) Method of measurement
(f) Measurement instruments
(g) Selection and training of field-workers.
(h) Pretest with a small sample before the large scale implementation

(i) Organization of fieldwork
(j) Organization of data management (at each stage)
(k) Data analysis, such as
i. boxplots
ii. histograms
iii. QQ plots

3 Basic Concepts of Statistics
1. Histograms: Visualize the shape of the data

2. Boxplots: A box shape along with lines (whiskers) on either side, where:

(a) Q1: the first quartile; 25% of the data lie below this point (the bottom edge of the box).
(b) Q2: the median; 50% of the data lie below this point (the middle line inside the box).
(c) Q3: the third quartile; 75% of the data lie below this point (the top edge of the box).
(d) IQR = Q3 − Q1: the lines coming out of the box each extend up to 1.5 · IQR.

3. Info on Populations and Samples (Infinite Case): We can summarize P∞ if and only if we know the population composition through a frequency table or density function.
Notation:

(a) P∞: an infinite population
(b) PN: a finite population with size N
(c) N: the (possibly countably infinite) population size
(d) n: sample size
(e) Mean/Expectation:

µ = E(Y) = Σ_y y p(y) = Σ_{j=1}^∞ y_j p(y_j)

(f) Variance:

σ² = V(Y) = E(Y − µ)² = Σ_y (y − µ)² p(y) = Σ_{j=1}^∞ (y_j − µ)² p(y_j)

(g) Standard Deviation: σ = √σ²

(h) Estimating parameters through random sampling: When p(y_j) is not available, the exact values of the parameters are not available but can be estimated from a random sample drawn from P∞:

µ̂ = ȳ = (1/n) Σ_{i=1}^n y_i

σ̂² = s² = (1/(n − 1)) Σ_{j=1}^n (y_j − ȳ)²

s = √s²

4. Info on Populations and Samples (Finite Case): When N < +∞, the notation for PN: u_1, u_2, . . . , u_N (this is not a sample) is:

(a) Total:

τ = u_1 + u_2 + . . . + u_N = Σ_{i=1}^N u_i

(b) Mean:

µ = (u_1 + u_2 + . . . + u_N)/N = τ/N

(c) Variance:

σ² = (1/N) Σ_{j=1}^N (u_j − µ)²

(d) Standard Deviation: σ = √σ²

5. Selecting from PN with Replacement:
Let:
PN: u_1, u_2, . . . , u_N
with selection probability for each u_i:
δ_1, δ_2, . . . , δ_N
where:

(a) δ_j ≥ 0
(b) Σ δ_j = 1

Then for N = 4 and n = 2, there are

(4 choose 2) + (4 choose 1) = 10

distinct samples, where (4 choose 2) counts the samples with u_i ≠ u_j. We obtain the following probabilities and means for the possible samples:

Sample | Prob | Mean
u_i, u_i | δ_i² | u_i
u_i, u_j | 2δ_iδ_j | (u_i + u_j)/2

where the 2 is the result of the two possible orderings of the same sample when i ≠ j.
6. Estimating τ:
Let:
PN: u_1, u_2, . . . , u_N
with selection probability for each u_i:
δ_1, δ_2, . . . , δ_N
And let a sample from PN be denoted as:
{y_1, y_2, . . . , y_n}
with corresponding selection probabilities δ_1, δ_2, . . . , δ_n.
We can estimate τ as:

τ̂ = (1/n) Σ_{i=1}^n y_i/δ_i

In the special case δ_i = 1/N, this reduces to:

τ̂ = (1/n) Σ_{i=1}^n y_i/(1/N) = N · (1/n) Σ_{i=1}^n y_i = N · ȳ

so that

E(τ̂) = N · E(ȳ) = N · µ = N · (τ/N) = τ, ∴ τ̂ is unbiased.

Ex.
If we have P_3 = {1, 2, 3}
with τ = 1 + 2 + 3 = 6
and δ_1 = 0.2, δ_2 = 0.3, δ_3 = 0.5,
for n = 2 we can estimate τ with:

Sample | Prob | τ̂
u_i, u_i | δ_i² | (1/n)(y_i/δ_i + y_i/δ_i)
u_i, u_j | 2δ_iδ_j | (1/n)(y_i/δ_i + y_j/δ_j)

E(τ̂) = Σ_{i=1}^r p_i τ̂_i

V(τ̂) = Σ_{i=1}^r p_i τ̂_i² − E(τ̂)²

where r denotes the number of distinct samples of size n. A code check of this example follows.
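A small Python check of this example (all numbers come from the text above): enumerating every ordered with-replacement sample of size n = 2 confirms E(τ̂) = 6, i.e. that τ̂ is unbiased, and also yields V(τ̂).

```python
# Verify unbiasedness of tau-hat on P3 = {1, 2, 3} with unequal deltas:
# tau-hat = (1/n) * sum(y_i / delta_i) over ordered with-replacement draws.
from itertools import product

u = [1, 2, 3]                  # population values, tau = 6
delta = [0.2, 0.3, 0.5]        # selection probabilities
n = 2

E_tau, E_tau2 = 0.0, 0.0
for i, j in product(range(3), repeat=2):      # all ordered samples
    prob = delta[i] * delta[j]
    tau_hat = (u[i] / delta[i] + u[j] / delta[j]) / n
    E_tau += prob * tau_hat
    E_tau2 += prob * tau_hat ** 2

print(E_tau)                   # 6.0: tau-hat is unbiased
print(E_tau2 - E_tau ** 2)     # V(tau-hat)
```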


7. Selecting from PN without replacement: While nice in theory, sampling with replacement may be inefficient and unrealistic.
Terms:

(a) π_i: The probability that the ith element of PN, u_i, is selected in the sample.
(b) We can estimate τ as:

τ̂ = Σ_{i=1}^n y_i/π_i = Σ_{i=1}^n w_i y_i

where w_i = 1/π_i.

Ex.
If we have P_3 = {1, 2, 3}
with τ = 1 + 2 + 3 = 6
and δ_1 = δ_2 = δ_3 = 1/3,
for n = 2:
π_i = 2/3 (the proportion of the n = 2 samples each u_i is in),
and the probability of each distinct sample (order does not matter) is p = 1/3.

We then have the following table:

Sample | Prob | τ̂
(1, 2) | 1/3 | u_1/π_1 + u_2/π_2 = 1/(2/3) + 2/(2/3) = 4.5
(1, 3) | 1/3 | 1/(2/3) + 3/(2/3) = 6
(2, 3) | 1/3 | 2/(2/3) + 3/(2/3) = 7.5

E(τ̂) = Σ_i τ̂_i p_i

V(τ̂) = Σ_i τ̂_i² p_i − E(τ̂)²

where the sums run over the distinct samples.

8. In the above example the individual elements all had δ_i = 1/3. But we can sometimes reduce the variance by having unequal δ_i's.
Ex.
Let δ_1 = 0.2, δ_2 = 0.3, δ_3 = 0.5, and n = 2.
We can calculate the sample probabilities as:

P(u_i, u_j) = δ_i · δ_j/(1 − δ_i) + δ_j · δ_i/(1 − δ_j)

For the above δ_i values:

P(1, 3) = 0.2 · (0.5/0.8) + 0.5 · (0.2/0.5) = 0.325

We can then calculate the π_i's by summing all P(u_i, u_j)'s that contain the i of interest:

π_3 = P(1, 3) + P(2, 3) = 0.325 + 0.5143 = 0.8393

τ̂, E(τ̂), V(τ̂) are all calculated the same way as above. The variance in this instance is lower than when δ_i = 1/3 for all i.
The variance of the estimator is lowered when δ_j is proportional to the size of y_j (δ_j increases as the value y_j of each element in the sample increases).

Sampling Distributions
Let θ̂ denote the estimator of a population parameter.
1. Properties of θ̂:

(a) E(θ̂) = θ (this means θ̂ is unbiased)
(b) V(θ̂) is small

More generally we define:

(a) bias(θ̂) = E(θ̂) − θ
(b) MSE(θ̂) = E(θ̂ − θ)² = bias²(θ̂) + V(θ̂)

We want both the bias and MSE to be small, and also for:

θ̂ ∼ N(θ, V̂(θ̂)) (approximately)

This allows us to find a confidence interval for θ as:

(θ̂ − z_{α/2} s.e.(θ̂), θ̂ + z_{α/2} s.e.(θ̂))

where:

s.e.(θ̂) = √V̂(θ̂)

A 95% CI ⇔ (1 − α) = 0.95 ⇔ α = 0.05 ⇔ z_{α/2} = z_{0.025} = 1.96.
Remember:

|θ̂ − θ| ≤ B = 2 s.e.(θ̂)

where B is the bound on the error.
2. All three desirable properties of θ̂ depend on its distribution. Since θ̂ is a function of the sample, we call its distribution the sampling distribution of θ̂; it varies with the sample size n.
3. For a random sample with equal probability selection and with replacement, an estimator for µ is:

ȳ = (1/n) Σ_{i=1}^n y_i

where:

(a) E(ȳ) = µ
(b) V(ȳ) = σ²/n

4. For a random sample with equal probability selection and without replacement, an estimator for µ is:

ȳ = (1/n) Σ_{i=1}^n y_i

where:

(a) E(ȳ) = µ
(b) V(ȳ) = (σ²/n) · ((N − n)/(N − 1)), with σ² unknown

The factor (N − n)/(N − 1) modifying the variance of ȳ is a finite population correction factor.

5. Covariance: A measure of the joint variability of two random variables:

cov(y_1, y_2) = E((y_1 − µ_1)(y_2 − µ_2)) = E(y_1 y_2) − µ_1 µ_2

where:

(a) cov(y_1, y_2) = 0 ⇔ y_1 and y_2 are uncorrelated.
(b) if y_1 ⊥ y_2 (independent) ⇒ cov(y_1, y_2) = 0, since then E(y_1 y_2) = µ_1 µ_2.

Important: cov(y_1, y_2) = 0 does not imply independence.

6. Correlation Coefficient: A scaled version of covariance with |ρ| ≤ 1, defined as:

ρ = corr(y_1, y_2) = cov(y_1, y_2)/(σ_1 · σ_2)

where:

(a) ρ = 0 ⇔ no correlation
(b) |ρ| = 1 ⇔ a perfect linear relationship between y_1 and y_2:

y_2 = a + b y_1

4 Chapter 4: Simple Random Sampling
1. Simple Random Sampling (SRS): For a finite population:

PN = u_1, u_2, . . . , u_N

let (N choose n) denote the total number of distinct samples of size n.
A sampling procedure which draws a sample of size n from PN in such a way that all (N choose n) samples of size n have an equal chance of being drawn is called simple random sampling.
Simple random sampling removes unconscious investigator bias.

2. Simple Random Sample: A sample obtained through SRS.

3. Population Estimators using an SRS: Suppose we have a simple random sample:

SRS = y_1, y_2, . . . , y_n

from the finite population:

PN = u_1, u_2, . . . , u_N

The population mean is estimated by:

µ̂ = ȳ = (1/n) Σ_{i=1}^n y_i

The population total is estimated by:

τ̂ = N µ̂ = N ȳ

The (biased) estimator of the population variance is:

s² = (1/(n − 1)) Σ_{i=1}^n (y_i − ȳ)²

Since E(s²) = (N/(N − 1)) σ², an unbiased estimator of the population variance is:

σ̂² = ((N − 1)/N) s²

The variance of the estimated population mean is estimated by:

V̂(ȳ) = (s²/n) ((N − n)/N)

The variance of the estimated population total is estimated by:

V̂(τ̂) = N² V̂(ȳ) = N² (s²/n) ((N − n)/N)

The bound on the error of estimation is:

B = 2 √V̂(ȳ) = 2 √((s²/n)((N − n)/N))

A code sketch of these estimators follows.
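A minimal sketch of these SRS formulas, assuming a hypothetical sample of n = 8 measurements from a population of size N = 1000:

```python
# SRS point estimates, variance estimates, and error bound on made-up data.
import statistics

y = [12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 10.9, 11.7]   # hypothetical SRS
n, N = len(y), 1000

ybar = statistics.mean(y)                 # mu-hat
s2 = statistics.variance(y)               # s^2 (divisor n - 1)
fpc = (N - n) / N                         # finite population correction
v_ybar = (s2 / n) * fpc                   # V-hat(ybar)
tau_hat = N * ybar                        # tau-hat
v_tau = N ** 2 * v_ybar                   # V-hat(tau-hat)
B = 2 * v_ybar ** 0.5                     # bound on the error of estimation
```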
4. Finite Population Correction Factor: If N ≫ n, then:

(N − n)/N ≈ 1

The finite population correction factor is used when you sample without replacement from more than 5% of a finite population. It is needed because, under those circumstances, the standard error of the estimate would otherwise be too big.
5. Sample Size Determination: Suppose we want to estimate µ within a desired error bound B, where:

B = 2 √V̂(ȳ) = 2 √((s²/n)((N − n)/N))

How do we achieve this bound? Solving for n, we obtain:

n = N σ² / ((N − 1) B²/4 + σ²)

But since σ² is unknown, in practice we use:

n = N s² / ((N − 1) B²/4 + s²)

A small code sketch of this formula follows.
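A small sketch of this sample-size formula; the inputs (N, a pilot-sample s², and the bound B) are hypothetical:

```python
# Sample size needed to estimate mu within bound B, using s^2 for sigma^2.
import math

def sample_size_mean(N, s2, B):
    """n = N*s^2 / ((N-1)*B^2/4 + s^2), rounded up."""
    return math.ceil(N * s2 / ((N - 1) * B ** 2 / 4 + s2))

print(sample_size_mean(N=1000, s2=25.0, B=1.0))   # 91 for these inputs
```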

6. When the population range is available:

σ̂ ≈ Range/4

and therefore:

σ̂² ≈ (Range/4)²
7. Sample Size required to estimate τ: To estimate τ within a desired bound on error B:

n = N σ² / ((N − 1) D + σ²)

where:

D = B²/(4N²)
8. Estimation of a Population Proportion: Consider a finite population consisting of 0's and 1's:

PN: u_1, u_2, . . . , u_N

where u_j = 1 if the jth member has a specific trait, and 0 otherwise.
Such a population PN is fully characterized by a single parameter, the proportion of 1's:

p = Σ_{j=1}^N u_j / N

and we write:

q = 1 − p (the proportion of 0's)

9. The mean of PN is:

E(uj ) = p, 0 ≤ p ≤ 1

10. The variance of PN is:

V (uj ) = p(1 − p)

11. Sample Proportion Estimation from an SRS: Take an SRS from such a 0/1 population:

y_1, y_2, . . . , y_n

where y_j = 1 if the jth sampled member has the specific trait, and 0 otherwise.
The sample proportion is calculated by:

p̂ = (1/n) Σ_{i=1}^n y_i = ȳ

where 0 ≤ p̂ ≤ 1.

12. Estimating unknown p with p̂:

E(p̂) = E(ȳ) = E(y_i) = p (unbiased)

V(p̂) = V(ȳ) = (σ²/n) · ((N − n)/(N − 1)), with σ² = pq unknown

13. Unbiased Estimation of V(p̂):

V̂(p̂) = (p̂ q̂/(n − 1)) ((N − n)/N)

where q̂ = 1 − p̂.

14. The sample size required to estimate p with bound on error B is calculated by:

n = N σ² / ((N − 1) B²/4 + σ²)

In general σ² = pq is not known, so we use:

n = N pq / ((N − 1) B²/4 + pq)

When p is not known, set p = q = 0.5 to calculate n, as in the sketch below.
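A sketch of the proportion version, defaulting to the conservative p = q = 0.5; the inputs are hypothetical:

```python
# Sample size needed to estimate a proportion p within bound B.
import math

def sample_size_prop(N, B, p=0.5):
    q = 1 - p                    # p = q = 0.5 is the conservative default
    return math.ceil(N * p * q / ((N - 1) * B ** 2 / 4 + p * q))

print(sample_size_prop(N=2000, B=0.05))   # 334 for these inputs
```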

15. Two Independent Samples: Consider:

PN_x: µ_x, σ_x²; SRS: x_1, x_2, . . . , x_m → x̄, s_x²
PN_y: µ_y, σ_y²; SRS: y_1, y_2, . . . , y_n → ȳ, s_y²

two independent samples with the following properties:

E(ȳ − x̄) = µ_y − µ_x (unbiased)
V(ȳ − x̄) = V(ȳ) + V(x̄) (independence)

V̂(x̄) = (s_x²/m) ((N_x − m)/N_x)
V̂(ȳ) = (s_y²/n) ((N_y − n)/N_y)

B = 2 √(V̂(ȳ) + V̂(x̄))

16. Population Proportions for Multinomial Sampling: Consider:

PN: p_1, p_2, . . . , p_k (population consisting of k groups)

SRS: x_1, x_2, . . . , x_n ⇒ p̂_1, p̂_2, . . . , p̂_k, with Σ_{j=1}^k p̂_j = 1

with the following properties:

E(p̂_1 − p̂_2) = p_1 − p_2

V(p̂_1 − p̂_2) = V(p̂_1) + V(p̂_2) − 2 cov(p̂_1, p̂_2)
= p_1(1 − p_1)/n + p_2(1 − p_2)/n + 2 p_1 p_2/n

where:

cov(p̂_i, p̂_j) = −p_i p_j/n

5 Chapter 5: Stratified Random Sampling
1. Stratified Random Sampling: A type of sampling where we:

(a) Divide the population into L non-overlapping groups or strata, with

Σ_{j=1}^L N_j = N

(b) Select an SRS from each stratum.

The resulting combined sample is called a stratified random sample.

2. Advantages of stratified random sampling over SRS:

(a) When the values of the variable of interest are more homogeneous within strata than across the population, better precision of estimation can be gained.
(b) When there is an administrative and/or cost advantage.
(c) Stratified sampling allows us to estimate parameters for individual strata; therefore, more information may be gained.

3. Examples of strata:

(a) Using provinces as strata when estimating voter proportions in federal elections.
(b) A chain of department stores using individual stores as strata.
(c) Dividing university students by year.
(d) Subdividing farmers by size (below 50 acres, 50-100 acres, etc.) when estimating parameters about beef cattle production in Alberta.

Good specification of strata is the key to realizing the advantages of stratified random sampling.

4. Estimation of population mean and total: Recall that for a finite population PN:

Strata sizes: N_1, N_2, . . . , N_L, with Σ_{j=1}^L N_j = N

SRS sizes: n_1, n_2, . . . , n_L, with Σ_{j=1}^L n_j = n

We assume N, N_1, . . . , N_L are known.
We define the strata weights as:

w_j = N_j/N

and we observe that:

Σ_{j=1}^L w_j = Σ_{j=1}^L N_j/N = (1/N) Σ_{j=1}^L N_j = N/N = 1

The strata totals are denoted τ_1, τ_2, . . . , τ_L, and the population total is:

τ = Σ_{j=1}^L τ_j

The strata means are denoted µ_1, µ_2, . . . , µ_L, where:

µ_j = τ_j/N_j

The population mean is then:

µ = τ/N = Σ_{j=1}^L τ_j/N = Σ_{j=1}^L (N_j/N)(τ_j/N_j) = Σ_{j=1}^L w_j µ_j

∴ the population mean µ is the weighted sum of the strata means.

5. Estimating µ and τ: With stratified random sampling having strata sample sizes n_1, . . . , n_L, let ȳ_1, ȳ_2, . . . , ȳ_L be the corresponding strata sample means. Then for the ith stratum:

E(ȳ_i) = µ_i

V(ȳ_i) = (σ_i²/n_i) ((N_i − n_i)/(N_i − 1))

V̂(ȳ_i) = (s_i²/n_i) ((N_i − n_i)/N_i)

where σ_i² is the ith stratum variance and s_i² is the ith stratum's sample variance.
Strata samples are independent ⇒ ȳ_i ⊥ ȳ_j ⇒ cov(ȳ_i, ȳ_j) = 0.

6. Estimators for µ and τ:

τ̂_st = Σ_{j=1}^L N_j ȳ_j

V̂(τ̂_st) = Σ_{j=1}^L N_j² V̂(ȳ_j) = Σ_{j=1}^L N_j² ((N_j − n_j)/N_j)(s_j²/n_j)

µ̂_st = ȳ_st = τ̂_st/N = (1/N)[N_1 ȳ_1 + . . . + N_L ȳ_L]

V̂(µ̂_st) = (1/N²) V̂(τ̂_st) = (1/N²) Σ_{i=1}^L N_i² ((N_i − n_i)/N_i)(s_i²/n_i)

7. Approximate 95% Confidence Intervals:

µ: ȳ_st ± 2 √V̂(µ̂_st)

τ: τ̂_st ± 2 √V̂(τ̂_st)

The above holds for any choice of n_1, n_2, . . . , n_L. A code sketch of these stratified estimators follows.
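A minimal sketch of the stratified estimators and confidence interval above, for L = 3 hypothetical strata (all Nj, nj, ȳj and s_j² values are made up):

```python
# Stratified point estimates, variance estimates, and an approximate 95% CI.
Nj = [500, 300, 200]          # stratum sizes (known)
nj = [25, 15, 10]             # SRS sizes within strata
ybar = [10.0, 14.0, 9.0]      # stratum sample means
s2 = [4.0, 9.0, 2.5]          # stratum sample variances

N = sum(Nj)
mu_st = sum(Nj[i] * ybar[i] for i in range(3)) / N          # ybar_st
v_mu = sum(Nj[i] ** 2 * ((Nj[i] - nj[i]) / Nj[i]) * s2[i] / nj[i]
           for i in range(3)) / N ** 2                      # V-hat(mu_st)
tau_st = N * mu_st                                          # tau-hat_st
v_tau = N ** 2 * v_mu
ci_mu = (mu_st - 2 * v_mu ** 0.5, mu_st + 2 * v_mu ** 0.5)  # approx 95% CI
```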

8. Sample Size Determination: We will use the following notation:

(a) n_1, n_2, . . . , n_L: sample sizes for the individual strata
(b) n = Σ_{j=1}^L n_j: the total sample size
(c) a_j = n_j/n: the allocation fraction for the jth stratum

We consider two sample size determination problems:

(a) For a given bound on error B and allocation fractions a_1, a_2, . . . , a_L, find the total sample size n that achieves the specified B.
(b) For a given total sample size n, determine the "optimal" allocation fractions a_1, a_2, . . . , a_L.

Since

B = 2 √V̂(ȳ_st) ⇒ B²/4 = V̂(ȳ_st) = (1/N²) Σ_{j=1}^L N_j² s_j²/(n a_j) − (1/N²) Σ_{j=1}^L N_j s_j²

this implies:

n = [Σ_{j=1}^L N_j² s_j²/a_j] / [N² D + Σ_{j=1}^L N_j s_j²]

where D = B²/4.
Since we normally don't know the s_j² ahead of time, we can estimate them in the following ways:

(a) From a pilot sample (costly and not always practical)
(b) From the range of the jth stratum
(c) By analysing existing data/previous studies
q
9. The n needed to achieve B = 2√V̂(τ̂_st) can be derived similarly: to achieve a desired bound on error B under pre-determined allocation fractions a_1, a_2, . . . , a_L, the sample size n needed is given by the formula for n above, where:

D = B²/4, for estimating µ

D = B²/(4N²), for estimating τ

10. Determining the Optimal Allocation Fractions a_1, . . . , a_L:
Suppose n is given and (N_j, σ_j²) are known for j = 1, . . . , L. The total cost of sampling n units is:

c = Σ_{j=1}^L c_j n_j

where c_j is the cost per unit sampled from the jth stratum.
Our goal is to minimize V(µ̂_st) subject to the cost function c. By using Lagrange multipliers, we find that the optimal n_j is:

n_j = n [N_j σ_j √(N_j/(N_j − 1)) / √c_j] / [Σ_{i=1}^L N_i σ_i √(N_i/(N_i − 1)) / √c_i]

If we assume that all strata sizes N_j are large, and hence:

N_i/(N_i − 1) ≈ 1

then we have the optimal allocation fraction:

a_j = n_j/n = (N_j σ_j/√c_j) / (N_1 σ_1/√c_1 + N_2 σ_2/√c_2 + . . . + N_L σ_L/√c_L)

This optimal allocation gives greater weight to a stratum if:

(a) The stratum is large (large N_j)
(b) The stratum is more variable internally (large σ_j)
(c) It's cheaper to sample from the stratum (smaller c_j)

Special Cases: Optimal allocation coincides with two well-known special allocations (a code sketch comparing them follows):

(a) Neyman Allocation:
Equal cost: c_1 = c_2 = . . . = c_L

a_j = n_j/n = N_j σ_j / (N_1 σ_1 + . . . + N_L σ_L)

From the above formula we can derive the equation for calculating n_j. Under Neyman allocation, the equation for the total sample size n becomes:

n = [Σ_{k=1}^L N_k σ_k]² / (N² D + Σ_{i=1}^L N_i σ_i²)

(b) Proportional Allocation:
Equal cost: c_1 = c_2 = . . . = c_L
Equal variance: σ_1² = σ_2² = . . . = σ_L²

a_j = n_j/n = N_j / (N_1 + . . . + N_L) = w_j

Under proportional allocation, the equation for the total sample size n becomes:

n = Σ_{i=1}^L N_i σ_i² / (N D + (1/N) Σ_{i=1}^L N_i σ_i²)
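A small sketch comparing the three allocation rules on hypothetical strata; `allocate` simply normalizes the stratum weights named in the formulas above:

```python
# Optimal (cost-aware), Neyman (equal cost), and proportional allocation.
Nj = [500, 300, 200]      # stratum sizes (hypothetical)
sigma = [2.0, 3.0, 5.0]   # stratum standard deviations (hypothetical)
cost = [1.0, 1.0, 4.0]    # per-unit sampling costs (hypothetical)
n = 60                    # total sample size

def allocate(weights, n):
    total = sum(weights)
    return [n * w / total for w in weights]

optimal = allocate([Nj[i] * sigma[i] / cost[i] ** 0.5 for i in range(3)], n)
neyman = allocate([Nj[i] * sigma[i] for i in range(3)], n)
proportional = allocate(Nj, n)
# Rounding the nj to integers (and checking nj <= Nj) is left to the user.
```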

11. Neyman Allocation vs. Proportional Allocation vs. SRS:
Note that optimal allocation is not included here since it is clearly the best by definition.
If N_j ≫ n_j for j = 1, 2, . . . , L, so that the F.P.C. (finite population correction factor) can be ignored, then for a fixed n we have:

V(Neyman) ≤ V(prop) ≤ V(SRS)

The three are usually quite close; however, we see a big difference when σ_1², σ_2², . . . , σ_L² differ greatly.
V(Neyman) ≤ V(prop) is always true.
With effectively defined strata, stratified sampling provides more accurate estimates than SRS.
When is this untrue?
If n_j is a considerable fraction of N_j, and hence the FPC matters, then the above statement may be false.
12. Setting up Strata: The efficiency of stratified estimators depends on the choice of strata. How do we divide a PN into strata effective for stratified sampling?

(a) Case 1: Strata present themselves.
When estimating average household income for Victoria, the obvious strata would be neighbourhoods.
(b) Case 2: An auxiliary (numerical) variable is available.
We can use the cumulative square root method for finding up to about 6 strata.
The idea (sketched in code below):
i. Split the data into a number of starting strata.
ii. List the frequency, the square root of the frequency, and the cumulative square root of frequency for each starting stratum.
iii. If we want L strata, divide the total cumulative square root frequency by L, and place a stratum boundary near t · (total/L) for t = 1, . . . , L − 1.
iv. The partitions calculated above are the new strata.
Strata defined by this method are approximately optimal for each fixed L.
For an example see p. 12 of Week 5-6.
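A minimal sketch of the cumulative square root of frequency rule, assuming a hypothetical frequency table over ordered classes of the auxiliary variable:

```python
# Cumulative sqrt(f) stratum boundaries for a hypothetical frequency table.
freq = [20, 35, 50, 30, 15, 10, 5]          # starting strata frequencies

sqrt_f = [f ** 0.5 for f in freq]
cum = []
total = 0.0
for s in sqrt_f:                            # running cumulative sqrt(f)
    total += s
    cum.append(total)

L = 3                                       # desired number of strata
step = cum[-1] / L                          # boundaries near t*step, t=1..L-1
boundaries = [min(range(len(cum)), key=lambda i: abs(cum[i] - t * step))
              for t in range(1, L)]
# 'boundaries' holds the indices of the classes where one stratum ends.
```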
13. Estimation of Population Proportion: Recall that the population proportion p is just the mean of:

PN: u_1, u_2, . . . , u_N

where u_j = 1 if the jth individual has the trait, and 0 otherwise.
So we can use the methods developed for estimating a mean µ to estimate p. Here:

(a) µ = p: overall population mean/proportion
(b) µ_j = p_j: jth stratum mean/proportion
(c) σ_j² = p_j q_j: jth stratum variance
(d) for population size N and jth stratum size N_j: w_j = N_j/N

The estimates based on an SRS from each stratum:

(a) µ̂_j = ȳ_j = p̂_j: point estimate of µ_j = p_j
(b) σ̂_j² = p̂_j q̂_j: point estimate of σ_j²
(c) p̂_st = µ̂ = ȳ_st = Σ_{j=1}^L w_j p̂_j, with E(p̂_st) = p
(d) V(p̂_st) = Σ_{i=1}^L w_i² (p_i q_i/n_i) ((N_i − n_i)/(N_i − 1))
(e) V̂(p̂_st) = Σ_{i=1}^L w_i² (s_i²/n_i) ((N_i − n_i)/N_i) = Σ_{i=1}^L w_i² (p̂_i q̂_i/(n_i − 1)) ((N_i − n_i)/N_i),

since s_i² = n_i p̂_i q̂_i/(n_i − 1).

14. Sample Size Determination: Suppose allocation fractions a_1, . . . , a_L are given. The sample size n needed to achieve a bound on error B is:

n = [Σ_{j=1}^L N_j² p̂_j q̂_j/a_j] / [N² D + Σ_{j=1}^L N_j p̂_j q̂_j]

15. Optimal Allocation Fraction: For fixed cost, the optimal allocation is:

n_j = n [N_j √(p̂_j q̂_j/c_j)] / [N_1 √(p̂_1 q̂_1/c_1) + . . . + N_L √(p̂_L q̂_L/c_L)]

16. Post-Stratification: We may stratify an SRS from PN by dividing the SRS into samples from the different strata. This is called stratification after selection of the sample, or post-stratification.
We may do this when:

(a) We are unable to collect stratified samples directly.
Ex. We want a sample of 50% male voters and 50% female voters:
i. Take an SRS telephone survey of randomly chosen households (not stratified samples).
ii. After the SRS is taken, divide it into male/female voter samples and view them as SRSs from the two strata respectively.
(b) We need to correct an unbalanced SRS.
To do this, we calculate a weighted average such as:

ȳ_st = (N_1/N) ȳ_1 + (N_2/N) ȳ_2

17. Double Sampling for Stratification (Two-Phase Sampling):
So far we have assumed the strata sizes N_1, . . . , N_L are known. When they are not, follow the steps below:

(a) Phase 1: Take a large SRS of size n′ from PN to estimate the weights w_i = N_i/N. Say n′ = n′_1 + . . . + n′_L falls into the strata; then:

ŵ_j = n′_j/n′

(b) Phase 2: Take a stratified sample of size n = n_1 + . . . + n_L, where n ≪ n′. Then we estimate µ (or τ) by:

ȳ′_st = Σ_{j=1}^L ŵ_j ȳ_j

V̂(ȳ′_st) = Σ_{j=1}^L [ŵ_j² s_j²/n_j + ŵ_j (ȳ_j − ȳ′_st)²/n′]

6 Chapter 6: Ratio, Regression and Difference Estimation
The last two chapters were concerned with the design of sample surveys (SRS vs. stratified), where we exclusively used sample means to estimate µ. We now consider methods of estimation that make use of auxiliary/subsidiary variables. These can be combined with both SRS and stratified sampling.
1. The need for an auxiliary variable, and hence a ratio estimator, arises when N is unavailable or too time-consuming to determine.
Consider oranges with sugar contents y_1, y_2, . . . , y_n. The average sugar content is estimated by:

ȳ = (1/n) Σ_{i=1}^n y_i = µ̂_y

and therefore:

τ̂_y = N µ̂_y = N ȳ

is the estimated total sugar content. But N, the number of oranges, may not be known, whereas the weights x_1, . . . , x_n of the sampled oranges are much easier to measure than the number of oranges, and the total weight τ_x of all oranges may be known.
We then observe that:

µ_y/µ_x = (N µ_y)/(N µ_x) = τ_y/τ_x

where τ_x is the total weight of all oranges. Therefore:

τ_y = (µ_y/µ_x) · τ_x, estimated by τ̂_y = (µ̂_y/µ̂_x) · τ_x = (ȳ/x̄) · τ_x
2. Calculating τ̂_y:

(a) Take an SRS and record both weight and sugar content.
(b) Use the SRS to calculate µ̂_y = ȳ and µ̂_x = x̄.
(c) Assuming τ_x is known, calculate τ̂_y as:

(ȳ/x̄) · τ_x

This is an example of a ratio estimator based on the auxiliary variable weight.

3. Ratio Estimator Notation:
We define:

R = µ_y/µ_x = τ_y/τ_x

and denote:

r = ȳ/x̄

as the estimator of R. Then we define:

(a) τ̂_y = r τ_x to be the ratio estimator of τ_y
(b) µ̂_y = r µ_x to be the ratio estimator of µ_y

where either τ_x or µ_x is known (sometimes just one of them), and:

(a) V(τ̂_r) = V(r) τ_x²
(b) V(µ̂_r) = V(r) µ_x²

4. Ratio Estimation Using SRS:
Suppose we have:

PN: u_1, u_2, . . . , u_N

where u_j = (x_j, y_j) (paired data), and where:

(a) y: variable of interest
(b) x: auxiliary variable (µ_x or τ_x is available)

Then:

µ_y = Σ_{j=1}^N y_j/N, µ_x = Σ_{j=1}^N x_j/N

τ_y = Σ_{j=1}^N y_j = N µ_y, τ_x = Σ_{j=1}^N x_j = N µ_x

and we define:

R = µ_y/µ_x = τ_y/τ_x

as the population ratio.
Start by taking an SRS {u_1, u_2, . . . , u_n} from PN,

or:

y_1, y_2, . . . , y_n ⇒ ȳ
x_1, x_2, . . . , x_n ⇒ x̄

Then we define a point estimator of R as:

r = ȳ/x̄ = Σ_i y_i / Σ_i x_i

where:

V(r) = V(ȳ/x̄) = ((N − n)/(nN)) (1/µ_x²) S_r²

with:

S_r² = Σ_{j=1}^n (y_j − r x_j)²/(n − 1)
5. Ratio Estimator of τ_y:

τ̂_y = r τ_x = (ȳ/x̄) τ_x

V̂(τ̂_y) = N² ((N − n)/(nN)) S_r²

6. Ratio Estimator of µ_y:

µ̂_y = r µ_x = (ȳ/x̄) µ_x

V̂(µ̂_y) = ((N − n)/(nN)) S_r²

The ratio estimator is better than the SRS estimator when it has the smaller variance: SRS is better when x and y are nearly uncorrelated, while the ratio estimator is better when there is a strong positive correlation between x and y. A code sketch follows.
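A minimal sketch of ratio estimation of τ_y on paired data (x = weight, y = sugar content, echoing the orange example); τ_x is assumed known and all numbers are hypothetical:

```python
# Ratio estimation of a population total from paired (x, y) sample data.
x = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]     # auxiliary variable (weights)
y = [0.21, 0.26, 0.19, 0.25, 0.28, 0.22]  # variable of interest (sugar)
n, N, tau_x = len(x), 400, 1800.0      # hypothetical N and known tau_x

r = sum(y) / sum(x)                                   # r = ybar / xbar
Sr2 = sum((y[i] - r * x[i]) ** 2 for i in range(n)) / (n - 1)
tau_y_hat = r * tau_x                                 # ratio estimator
v_tau = N ** 2 * ((N - n) / (n * N)) * Sr2            # V-hat(tau-hat_y)
```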

7. The Ratio Estimator is biased, since:

E(µ̂_y) = µ_x E(r), and E(r) ≠ µ_y/µ_x

8. Sample Size Determination: We can estimate the following parameters with bound on error B as follows:

(a) Estimating R:

B = 2 √V(r) = 2 √(((N − n)/(nN)) (1/µ_x²) σ²)

where:

σ² = (1/N) Σ_{j=1}^N (y_j − R x_j)²

We then calculate n as:

n = N σ² / (N D + σ²), where D = B² µ_x²/4

Since σ² is unknown, we estimate it by taking a preliminary sample (x_1, y_1), . . . , (x_{n′}, y_{n′}):

σ̂² = Σ_{j=1}^{n′} (y_j − r x_j)²/(n′ − 1)

(b) Estimating µ_y:

B = 2 µ_x √V(r), giving n = N σ²/(N B²/4 + σ²)

(c) Estimating τ_y:

B = 2 τ_x √V(r), giving n = N σ²/(N B²/(4N²) + σ²)

9. Simulation Study of Ratio Estimators: We can simulate ratio estimators as follows (a code sketch follows):

(a) Generate a PN: u_1, . . . , u_N with u_j = (x_j, y_j)
(b) Compute R
(c) Generate m SRSs from PN, giving r_1, r_2, . . . , r_m
(d) Compute r̄ = Σ_{j=1}^m r_j/m and bias = r̄ − R
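A sketch of that simulation recipe under an assumed linear population model (the model, sizes and seed are all arbitrary choices):

```python
# Simulate the bias of r = ybar/xbar over m repeated SRSs.
import random

random.seed(2)
N, n, m = 500, 30, 2000

def make_unit():
    x = random.uniform(1, 10)
    return (x, 2.0 * x + random.gauss(0, 1))   # assumed y ~ 2x + noise

pop = [make_unit() for _ in range(N)]
R = sum(y for _, y in pop) / sum(x for x, _ in pop)   # population ratio

rs = []
for _ in range(m):
    srs = random.sample(pop, n)
    rs.append(sum(y for _, y in srs) / sum(x for x, _ in srs))

rbar = sum(rs) / m
print(rbar - R)        # estimated bias, which is O(1/n)
```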

10. Ratio Estimation in Stratified Random Sampling: There are two types of ratio estimation for stratified random sampling:

(a) Separate Ratio Estimator:
For strata 1, . . . , L we have, for stratum i:

µ̂_yi = r_i µ_xi = (ȳ_i/x̄_i) µ_xi

Let µ̂_yrs be the weighted sum of the µ̂_yi, where "rs" stands for "ratio separate". Then:

µ̂_yrs = Σ_{j=1}^L w_j µ̂_yj = Σ_{j=1}^L w_j (ȳ_j/x̄_j) µ_xj

V̂(µ̂_yrs) = Σ_{j=1}^L w_j² ((N_j − n_j)/(N_j n_j)) S_rj²

where:

S_rj² = Σ_{k=1}^{n_j} (y_jk − r_j x_jk)²/(n_j − 1)

(b) Combined Ratio Estimator:
Let:

ȳ_st = Σ_{j=1}^L w_j ȳ_j

x̄_st = Σ_{j=1}^L w_j x̄_j

µ̂_yrc = (ȳ_st/x̄_st) µ_x

where "rc" stands for "ratio combined". Then:

V̂(µ̂_yrc) = Σ_{j=1}^L w_j² ((N_j − n_j)/(N_j n_j)) S_j²

where:

S_j² = Σ_{k=1}^{n_j} (y_jk − r_c x_jk)²/(n_j − 1)

with r_c = ȳ_st/x̄_st. We compute the weighted estimate r_c first, then apply ratio estimation with the estimated r_c.
Usually:

V(µ̂_yrs) < V(µ̂_yrc)

but:

|Bias(µ̂_yrs)| > |Bias(µ̂_yrc)|

since each µ̂_yj in µ̂_yrs is biased, and the bias accumulates when computing µ̂_yrs. If min{n_1, n_2, . . . , n_L} ≥ 20, then µ̂_yrs is preferred to µ̂_yrc.

11. Regression Estimation:
Suppose we have:

(a) PN: u_1, u_2, . . . , u_N
(b) u_j = (x_j, y_j)
(c) with means:

µ_y = Σ_j y_j/N, µ_x = Σ_j x_j/N

Suppose y ≈ a + bx; then µ_y = a + b µ_x, and we fit y = a + bx to the sample (x_1, y_1), . . . , (x_n, y_n). Then we may calculate:

µ̂_yL = ȳ + b̂ (µ_x − x̄)

where L denotes linear regression, and where the least squares estimate of the slope is:

b̂ = Σ_{j=1}^n (x_j − x̄)(y_j − ȳ) / Σ_{j=1}^n (x_j − x̄)²

The variance of µ̂_yL may be estimated as:

V̂(µ̂_yL) = (1/n) ((N − n)/N) MSE

For fixed b, E(µ̂_yL) = E(ȳ) + b E(µ_x − x̄) = µ_y (unbiased).
But for b̂ = S_xy/S_x², E(µ̂_yL) = µ_y + O(1/n).

12. Difference Estimation:
Difference estimation is equivalent to regression estimation with b = 1.
We let:

µ̂_yD = ȳ + (µ_x − x̄) = µ_x + (ȳ − x̄) = µ_x + d̄

where, with d_j = y_j − x_j:

E(µ̂_yD) = µ_y (always unbiased)

V̂(µ̂_yD) = ((N − n)/N) (S_d²/n)

with:

S_d² = Σ_{j=1}^n (d_j − d̄)²/(n − 1)
Note the following properties:

y = a + bx → regression estimation
a = 0 ⇒ y = bx → ratio estimation
b = 1 ⇒ y = a + x → difference estimation

13. Relative Efficiency:
Let E_1, E_2 be two different estimators of a common parameter such as µ. Suppose they are both unbiased, or nearly unbiased. Then the one with the smaller variance is the better estimator.
We define the relative efficiency of E_1 to E_2 as:

RE(E_1/E_2) = V(E_2)/V(E_1)

RE(E_1/E_2) > 1 ⇔ V(E_2) > V(E_1) ⇔ E_1 is more efficient.

When the exact V(E_1), V(E_2) are not available we can estimate:

RÊ(E_1/E_2) = V̂(E_2)/V̂(E_1)

For estimating µ_y of PN we now have:

(a) The sample mean: ȳ (unbiased)
(b) Ratio estimate: µ̂_y = r µ_x = (ȳ/x̄) µ_x, bias = O(1/n)
(c) Regression estimate: µ̂_yL = ȳ + b̂(µ_x − x̄), bias = O(1/n)
(d) Difference estimate: µ̂_yD = ȳ + (µ_x − x̄), unbiased

Suppose n is large and all four estimators are unbiased or nearly so. Which one is the most efficient?
14. The Rules of Efficiency:
Some things to keep in mind:

(a) The ratio estimator will be more efficient than ȳ when:
i. the variation among the x_j's is small,
ii. the variation among the y_j's is large, and
iii. the correlation between x and y is high.
(b) µ̂_yL is always at least as efficient as ȳ (but it is biased).
(c) The regression estimator is strictly more efficient (RE > 1) unless b̂ = r, in which case RE = 1.

7 Chapter 7: Systematic Sampling
1. Systematic Sampling: An alternative to SRS and stratified sam-
pling, systematic sampling selects ”equally spaced” elements from a
finite population PN with a natural order of elements.
Examples:

(a) Beginning with the second student, interview every 10th student entering a building:

2nd, 12th, 22nd, 32nd, . . .

(b) Sampling every 30th item from an assembly line for quality control.

2. Advantages vs SRS:

(a) Easier to perform in the field
(b) Less likely to be subject to selection errors by fieldworkers
(c) Does not require a good sampling frame
(d) Can provide more information than SRS when the population has a certain pattern

3. How to Draw a Systematic Sample: First we need to know:

(a) N, the population size (may be estimated)
(b) n, the sample size
(c) a natural ordering of the elements

Then (a code sketch follows):

(a) Step 1: Divide PN into n groups of K consecutive elements each, so that nK = N.
(b) Step 2: Randomly select an element from the first group; say the jth element of the first group is selected.
(c) Step 3: Select the jth element of each of the remaining groups 2, 3, . . . , n, giving the sample

u_j, u_{j+K}, u_{j+2K}, . . . , u_{j+(n−1)K}

4. 1-in-K Sampling: A sample obtained by selecting one element from the first K and every Kth element thereafter.
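A minimal sketch of drawing a 1-in-K systematic sample per Steps 1-3, assuming the frame is an ordered list with N = nK exactly:

```python
# Draw a 1-in-K systematic sample from an ordered frame.
import random

def systematic_sample(frame, n):
    K = len(frame) // n              # group size, assuming N = n*K
    j = random.randrange(K)          # random start within the first K
    return [frame[j + i * K] for i in range(n)]

voters = list(range(1, 1001))        # hypothetical ordered frame, N = 1000
sample = systematic_sample(voters, n=50)   # every 20th from a random start
```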

5. There are Three Kinds of Populations:

(a) Random Population: Population elements are in random order. For the elements PN: u_1, u_2, . . . , u_N, the labels 1, 2, . . . , N are assigned in a random manner; the ordering carries no meaning.
(b) Ordered Population:

PN: u_1 ≥ u_2 ≥ . . . ≥ u_N or u_1 ≤ u_2 ≤ . . . ≤ u_N

(c) Periodic Population: Populations where u_j is strongly correlated with u_{j+k} for some fixed k.
Ex. Days of the week for department store sales: k = 7

6. Estimating µ, τ and p:
Assume PN: u_1, u_2, . . . , u_N is randomly ordered, and let y_1, y_2, . . . , y_n be a 1-in-K systematic sample from PN. Then we may view this sample as an SRS from PN. Letting the subscript sy denote the systematic sample, we have:

µ:

µ̂ = ȳ_sy = (1/n) Σ_i y_i

V̂(ȳ_sy) = (s²/n) ((N − n)/N)

τ:

τ̂ = N ȳ_sy

V̂(τ̂) = V̂(N ȳ_sy) = N² (s²/n) ((N − n)/N)

p:

p̂_sy = ȳ_sy = (1/n) Σ_i y_i

V̂(p̂_sy) = (p̂_sy q̂_sy/(n − 1)) ((N − n)/N)

7. Sample Size Determination:
To estimate µ with bound on error B, we set:

B = 2 √V(ȳ_sy)

n = N σ² / ((N − 1) D + σ²), with D = B²/4

Similarly, for estimating τ with a given bound B:

n = N σ² / ((N − 1) D + σ²), with D = B²/(4N²)

For estimating p with a given B:

n = N pq / ((N − 1) D + pq)

8. Estimating for a General Population (not necessarily random):
When we say general population, it can be random, ordered or periodic.
Reorganize the finite population PN: u_1, u_2, . . . , u_N (N = nK) into an n × K array whose jth column is the jth possible 1-in-K sample:

u_1, u_2, u_3, . . . , u_K
u_{K+1}, u_{K+2}, u_{K+3}, . . . , u_{2K}
u_{2K+1}, u_{2K+2}, u_{2K+3}, . . . , u_{3K}
. . .
u_{(n−1)K+1}, u_{(n−1)K+2}, u_{(n−1)K+3}, . . . , u_{nK}

Write y_{ji} for the ith element of column j, for columns j = 1, 2, 3, . . . , K. Notice that the sample (column) index comes first, opposite to the normal row/column notation. The sample column means ȳ_1, ȳ_2, . . . , ȳ_K are:

ȳ_j = (1/n) Σ_{i=1}^n y_{ji}
Since the starting element is randomly chosen from row 1, samples 1, 2, . . . , K are equally likely to be chosen. Hence:

E(ȳ_sy) = (1/K) ȳ_1 + (1/K) ȳ_2 + . . . + (1/K) ȳ_K = (1/N) Σ_j u_j = µ

Therefore, the systematic sample mean ȳ_sy is unbiased.

V(ȳ_sy) = (1/K) Σ_{j=1}^K (ȳ_j − µ)² = (σ²/n) [1 + (n − 1)ρ]

where ρ denotes the correlation coefficient between observations within the same systematic sample.
In general, for large N:

1. Random Population: ρ ≈ 0 =⇒ V(ȳ_sy) ≈ V(ȳ)

2. Ordered Population: ρ < 0 =⇒ V(ȳ_sy) < V(ȳ)

3. Periodic Population: ρ > 0 =⇒ V(ȳ_sy) > V(ȳ)

4. Additional Terminology:
Note that the grand mean of the array equals µ:

Σ_{j=1}^K ȳ_j / K = µ

Using ANOVA notation, we have:

(a) MSB:

(n/(K − 1)) Σ_{j=1}^K (ȳ_j − µ)²

(b) MSW:

(1/(K(n − 1))) Σ_{j=1}^K Σ_{i=1}^n (y_{ji} − ȳ_j)²

(c) SST:

Σ_{j=1}^K Σ_{i=1}^n (y_{ji} − µ)²

Then we let:

ρ̂ = ((K − 1) n · MSB − SST) / ((n − 1) SST)

However, ρ cannot be estimated from one systematic sample. Therefore, we use:

V̂(ȳ_sy) = (s²/n) ((N − n)/N)

which will not underestimate V(ȳ_sy) unless PN is periodic.

5. Repeated Systematic Sampling:
V(ȳ_sy) cannot be estimated with a single systematic sample unless it is also an SRS. We can use repeated systematic sampling to obtain an estimate of V(ȳ_sy).
Instead of one 1-in-K sample of size n, we take n_s systematic samples, each of size n′ = n/n_s, where:

(a) Each of the n_s samples is a 1-in-K′ systematic sample with K′ = K · n_s (K′ > K, meaning more columns/starting points).
(b) Each 1-in-K′ sample has its own random starting point.

For the n_s systematic sample means ȳ_1, ȳ_2, . . . , ȳ_{n_s}, we estimate the following parameters:

µ̂ = (1/n_s) Σ_{j=1}^{n_s} ȳ_j

V̂(µ̂) = ((N − n)/N) (s_ȳ²/n_s)

s_ȳ² = (1/(n_s − 1)) Σ_{j=1}^{n_s} (ȳ_j − µ̂)²

τ̂ = N µ̂

V̂(τ̂) = N² V̂(µ̂)

8 Chapter 8: Cluster Sampling
1. Notation: For cluster sampling, we let PM denote the population of the M individual elements and PN the population of the N clusters.

2. Cluster Sampling: A cluster sample from PM is an SRS from PN whose elements are whole clusters of PM.
Cluster sampling is similar to stratified sampling except that:

(a) When sampling clusters, if a cluster is selected, we take the entire cluster.
(b) Not all clusters are sampled.

We can take a cluster sample by dividing PM into N clusters, resulting in a PN of clusters c_1, c_2, . . . , c_N of sizes m_1, m_2, . . . , m_N respectively.
Note the following properties:

(a) c_i ∩ c_j = ∅
(b) ∪_j c_j = PM
(c) M = Σ_{j=1}^N m_j

Systematic sampling is a special case of cluster sampling in which PM is divided into K clusters and only one cluster is chosen.

3. Main advantages:
The main advantages of cluster sampling are:

(a) It costs less than SRS
(b) It doesn't require a detailed sampling frame for PM

4. Estimating µ, τ of PM:
Let y_j be the total of all elements of PM in the jth cluster c_j.
Then PN = {u_1, u_2, . . . , u_N}, where u_j = (m_j, y_j) = (size, total), and:

τ = Σ_{j=1}^N y_j

M = Σ_{j=1}^N m_j

µ = τ/M = Σ_{j=1}^N y_j / Σ_{j=1}^N m_j = (Total of PM)/(Total of the cluster sizes) (a population ratio)

M̄ = M/N (average cluster size)

If we take an SRS from PN: (m_1, y_1), (m_2, y_2), . . . , (m_n, y_n), we can estimate µ with ȳ_c:

ȳ_c = Σ_{j=1}^n y_j / Σ_{j=1}^n m_j

V̂(ȳ_c) = (1/n) ((N − n)/N) (1/M̄²) S_c²

S_c² = (1/(n − 1)) Σ_{j=1}^n (y_j − ȳ_c m_j)²

If we view m_j as an auxiliary variable, then ȳ_c is exactly a ratio estimator of the population ratio R = µ.
When M̄ is unknown, it is estimated with m̄ = Σ_{j=1}^n m_j / n.

5. Ratio Estimation in Cluster Sampling: The correspondence is:

x_j ⇐⇒ m_j
R ⇐⇒ µ
µ_x ⇐⇒ M̄
r ⇐⇒ ȳ_c

Note: E(ȳ_c) ≠ µ (biased), except when m_1 = m_2 = . . . = m_N.

6. Cluster Sampling vs SRS:
Assume m_1 = m_2 = . . . = m_N = m̄ (common cluster size).
Then:

M = Σ_{j=1}^N m_j = N m̄ (population size)

µ = Σ_{j=1}^N y_j / (N m̄) (population mean)

Sampling n clusters gives a total sample size of n · m̄. We estimate µ and τ of PM with the n clusters:

µ̂ = ȳ_c = Σ_{j=1}^n y_j / Σ_{j=1}^n m_j = (1/(n m̄)) Σ_{j=1}^n y_j

E(µ̂) = µ (unbiased)

τ̂ = M ȳ_c = N ȳ, where ȳ = (1/n) Σ_{j=1}^n y_j is the mean of the sampled cluster totals

E(τ̂) = τ (unbiased)

V̂(ȳ_c) = ((N − n)/(N n m̄²)) S_c²

S_c² = Σ_{j=1}^n (y_j − ȳ_c m̄)²/(n − 1)

V̂(τ̂) = N² V̂(ȳ) = N² ((N − n)/(nN)) S_y²

S_y² = Σ_{i=1}^n (y_i − ȳ)²/(n − 1)

With an SRS of size n · m̄, on the other hand, the estimator ȳ of µ is unbiased and:

V(ȳ) = ((N − n)/N) · S²/(n m̄)

Note that for equal cluster sizes:

ȳ_c = (1/(n m̄)) Σ_{i=1}^n Σ_{j=1}^{m̄} y_ij

7. ANOVA Notation:

SST = SSW + SSB
Total SS = (Within-Cluster SS) + (Between-Cluster SS)

SST = Σ_{i=1}^n Σ_{j=1}^{m̄} (y_ij − ȳ_c)²

SSW = Σ_{i=1}^n Σ_{j=1}^{m̄} (y_ij − ȳ_i)²

SSB = Σ_{i=1}^n Σ_{j=1}^{m̄} (ȳ_i − ȳ_c)² = m̄ Σ_{i=1}^n (ȳ_i − ȳ_c)²

MSB = SSB/(n − 1) = (m̄/(n − 1)) Σ_{i=1}^n (ȳ_i − ȳ_c)² = (1/m̄) S_c²

V̂(ȳ_c) = ((N − n)/N) (1/(n m̄)) MSB

V̂(ȳ) = ((N − n)/N) (1/(n m̄)) S²

∴ if MSB < S², then V̂(ȳ_c) < V̂(ȳ) ⇒ cluster sampling is more efficient than SRS.
This occurs when MSB < MSW (when the variance between clusters is less than the variance within clusters).
This is the opposite of stratified random sampling, which is more efficient than SRS when MSW is small and MSB is large.

8. Sample Size Determination for Estimating µ and τ:
Say we have a finite population PM with clusters c_1, c_2, . . . , c_N with respective sizes m_1, m_2, . . . , m_N.
If N is known but the m_j's are not, then M̄ = (1/N) Σ_{j=1}^N m_j is unknown as well.
We approximate:

V(ȳ_c) ≈ (1/n) ((N − n)/N) (σ_c²/M̄²)

However, since σ_c², the between-cluster variance, is unknown, we have to estimate it (along with M̄) from a prior survey, pilot study, etc.
Then:

2 √V(ȳ_c) = B ⇒ n = N σ_c² / (N D + σ_c²)

where:

D = B² M̄²/4, when estimating µ

D = B²/(4N²), when estimating τ

9. Estimating a Population Proportion:
Say we have cluster c_i = {a_i1, a_i2, . . . , a_im_i}, where a_ij = 0 or 1 for j = 1, 2, . . . , m_i. That is, for each cluster c_i, the elements of that cluster are either 0 or 1.
Then the cluster total is the total number of 1's:

a_i = Σ_{j=1}^{m_i} a_ij

Then:

p̂ = Σ_{i=1}^n a_i / Σ_{i=1}^n m_i

which has almost the same form as the estimator ȳ_c, and:

V̂(p̂) = ((N − n)/(N n m̄²)) S_p²

S_p² = Σ_{i=1}^n (a_i − p̂ m_i)²/(n − 1)

The sample size determination for a given B is similar to the µ case.
10. How to draw a sample of clusters (equal probability sampling):
We have two options:

(a) An SRS from the population of clusters PN = {u_1, u_2, . . . , u_N}, where u_j = (y_j, m_j): draw n clusters without replacement, so that

p(u_1, u_2, . . . , u_n) = 1/(N choose n)

(b) For large N we may draw with replacement, with

p(u_j) = 1/N
11. Unequal Probability Sampling:
Say we have PN: u_1, u_2, . . . , u_N with sampling probabilities p_1, p_2, . . . , p_N, where u_j = (m_j, y_j) and M = Σ_{j=1}^N m_j.

Sampling with Probability Proportional to Size (PPS sampling):
Draw a sample from PN with replacement, where the sampling probability is:

p_j = m_j/M

If m_j and/or M are not available but some auxiliary variable x_j ∝ m_j is, we may use:

p_j = x_j / Σ_{j=1}^N x_j

Under PPS, writing ȳ_j = y_j/m_j for the mean of the jth sampled cluster, we have:

τ̂_pps = (1/n) Σ_{j=1}^n y_j/(m_j/M) = (M/n) Σ_{j=1}^n ȳ_j

µ̂_pps = (1/n) Σ_{j=1}^n ȳ_j

(both τ̂_pps and µ̂_pps are unbiased)

V̂(µ̂_pps) = (1/(n(n − 1))) Σ_{j=1}^n (ȳ_j − µ̂_pps)²

V̂(τ̂_pps) = (M²/(n(n − 1))) Σ_{j=1}^n (ȳ_j − µ̂_pps)²

12. Sampling with Unequal Probabilities of Selection:
Say we have PN: u_1, u_2, . . . , u_N with sampling probabilities p_1, p_2, . . . , p_N, where Σ_{j=1}^N p_j = 1. Then (a code sketch follows):

(a) Divide up the interval (0, 1) into subintervals so that the jth subinterval has length p_j.
(b) Generate a random number from unif[0, 1].
(c) If the random number falls into the jth interval, then u_j is selected.
(d) Repeat steps (a-c) n times to draw a sample of size n with selection probabilities p_1, p_2, . . . , p_N.
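A minimal sketch of steps (a)-(c), partitioning (0, 1) by cumulative probabilities and selecting with replacement; the units and probabilities are hypothetical:

```python
# Unequal-probability (with replacement) selection via the (0,1) interval.
import random

def unequal_prob_sample(units, probs, n):
    cum, total = [], 0.0
    for p in probs:                  # cumulative interval endpoints
        total += p
        cum.append(total)
    sample = []
    for _ in range(n):
        u = random.random()          # uniform draw on (0, 1)
        j = next((i for i, c in enumerate(cum) if u <= c), len(cum) - 1)
        sample.append(units[j])
    return sample

sample = unequal_prob_sample(["c1", "c2", "c3"], [0.2, 0.3, 0.5], n=4)
```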
13. Three Estimators for the Population Total τ:

(a) SRS Estimator: for y_1, y_2, . . . , y_n:

τ̂ = N ((1/n) Σ_{j=1}^n y_j) = N ȳ (unbiased)

V̂(τ̂) = N² ((N − n)/(Nn)) S²

S² = Σ_{j=1}^n (y_j − ȳ)²/(n − 1)

(b) Ratio Estimator: (SRS of clusters) (m_1, y_1), (m_2, y_2), . . . , (m_n, y_n):

τ̂_R = M · (Σ_{j=1}^n y_j / Σ_{j=1}^n m_j) (biased)

(c) Unequal Probability Sampling Estimator:

τ̂_p = (M/n) Σ_{j=1}^n y_j/m_j (unbiased)

Comparing the three estimators:

(a) SRS vs Ratio: If m_j and y_j are uncorrelated, then SRS is better (smaller variance). If they are highly correlated, then the ratio estimator is better. This comparison can be made after the SRS is taken.
(b) Ratio vs Unequal Probability Sampling: For cases where m_j and y_j are highly correlated, we can choose between the two based on:
i. Choose unequal probability sampling if the within-cluster variation does not change with m_j.
ii. Choose ratio estimation if the within-cluster variation changes with m_j.

9 Chapter 9: Two-Stage Cluster Sampling
1. Two-stage Cluster Sampling:
Say we have a finite population PM broken into clusters c_1, c_2, . . . , c_N.
In the previous chapter, we took an SRS from PN by selecting some of the clusters and observing every element of each selected cluster. But when the clusters are too large, we may only take a sample from each cluster. Two-stage cluster sampling proceeds as:

(a) Stage I: Take a sample of clusters (select only some of the clusters; an SRS from PN).
(b) Stage II: Take an SRS from each sampled cluster c_1, c_2, . . . , c_n.

Estimators for µ and τ of PM can be constructed using two-stage cluster samples.
If M is known, we can use an unbiased estimator.
If M is unknown, we can use the ratio estimator (biased).

10 Chapter 10: Estimating the Population
Size
Here, the parameter of interest of PN is the population size N

1. There are four main methods of estimation, the first two deal mainly
with moving populations such as animals:

(a) Direct sampling


(b) Inverse sampling
(c) Density estimation
(d) Presence or absence estimation

2. Direct Sampling: We perform direct sampling for an animal population as follows:

(a) First draw an SRS of size t from PN and tag all individuals in the sample before releasing them.
(b) After a period of time, to ensure a thorough mixing of tagged individuals with untagged ones, take a second SRS of size n.

Suppose there are s tagged individuals in the second sample, where s > 0. Then:

N̂ = n · t/s

which is:

N̂ = (sample 1 size) · (sample 2 size) / (tagged individuals in sample 2)

V̂(N̂) = t² n (n − s)/s³

E(N̂) = N + N(N − t)/(nt) (biased)

We can calculate a 100(1 − α)% confidence interval for N as:

N̂ ± 2 √V̂(N̂)

Or we can calculate the CI for N based on the CI for p:

p̂ ± z_{α/2} √(p̂(1 − p̂)/n) = p̂ ± ∆

which gives the CI:

(t/(p̂ + ∆), t/(p̂ − ∆))

where:

p̂ = s/n

A code sketch of these estimates follows.
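A small sketch of the direct-sampling (capture-recapture) estimate and its interval; t, n and s are hypothetical counts:

```python
# Capture-recapture: t tagged on release, s tagged recaptured out of n.
t, n, s = 200, 150, 12

N_hat = n * t / s                         # N-hat = n*t/s
v_N = t ** 2 * n * (n - s) / s ** 3       # V-hat(N-hat)
ci = (N_hat - 2 * v_N ** 0.5, N_hat + 2 * v_N ** 0.5)
print(round(N_hat), ci)                   # 2500 and a (wide) interval
```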
3. Sampling Without Replacement:
Previously, the second sample was a random sample taken with replacement from PN. If we sample without replacement for the second sample, we get a hypergeometric distribution of the form:

P(S = s) = C(t, s) C(N − t, n − s) / C(N, n), s = 0, 1, . . . , n

∴ s ∼ hypergeometric(n; N − t, t), and the estimator N̂ remains unchanged.
We can construct a confidence interval for N in this case as follows:
(a) First, construct a confidence interval for p, where p̂ = s/n:

p̂ ± z_{α/2} √((p̂ q̂/(n − 1)) ((N − n)/N))

(b) Next, construct a confidence interval for N. Assuming N ≫ n, so that (N − n)/N ≈ 1, proceed as before.

4. Inverse Sampling of PN:

(a) Draw an SRS of size t, tag and release:

PN = t + (N − t), with N unknown

(b) For a pre-chosen constant s > 0, draw sequentially at random with replacement from PN until s tagged individuals are observed.

Let p = t/N be the probability that a randomly selected individual is a tagged one.
Trial: selecting one individual at random from PN.
Sampling with replacement implies that p is constant for all trials.
Let n be the number of trials needed to obtain s successes (tagged individuals); then:

n ∼ NegBin(s; p = t/N)

P(n) = C(n − 1, s − 1) p^s (1 − p)^{n−s}

Solving for the respective MLEs:

p̂ = s/n

N̂ = (n/s) · t

Notice that N̂ is the same estimator as before but with a different distribution (here s is fixed).
By the negative binomial properties:

E(n) = s/p

V(n) = s(1 − p)/p²

∴ E(N̂) = N (unbiased)

V̂(N̂) = t² n (n − s)/(s²(s + 1))

5. Density Estimation:
For density estimation we calculate the population total as:

N = (Population Density) × (Area)

We estimate N as:

N̂ = λ̂ · A

where λ̂ is the estimated density.
Say for a given set of strips of land we are interested in estimating the animal abundance, with observations denoted:

(x_1, y_1), . . . , (x_n, y_n)

where x_j is the area of the jth strip and y_j is the number of animals observed in the jth strip. Then the estimated density for the jth strip is y_j/x_j, and the average density is:

λ̂ = (1/n) Σ_{j=1}^n y_j/x_j

The population total is then estimated by:

N̂ = (1/n) Σ_{j=1}^n y_j/p_j

where:

p_j = x_j/A

This is equivalent to equal probability sampling.

6. Presence/Absence Survey Estimation:

(a) Divide the area of interest into "squares" of equal size a.
(b) Draw n squares at random and inspect each selected square for the presence (1) or absence (0) of the individuals of interest (such as a certain species of tree).

Our data are of the form:

y_j = 1 if present in the jth square, 0 if absent, for j = 1, 2, . . . , n

Total count: Σ_{j=1}^n y_j.
Then let:

y_0 = n − Σ_{j=1}^n y_j

denote the number of zeros (empty squares).
Suppose the N individuals are randomly distributed over the area A; then:

λ = N/A

denotes the density, and the probability that a randomly selected square is empty is:

e^{−λa}

This implies that:

y_0/n ≈ e^{−λa} ⇒ λ̂ = −(1/a) ln(y_0/n)

N̂ = λ̂ × A = −(A/a) ln(y_0/n)

V̂(λ̂) = (1/(n a²)) (e^{λ̂a} − 1)

A code sketch of these estimates follows.
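A small sketch of the presence/absence estimates; the counts and areas are hypothetical:

```python
# Presence/absence density estimation: y0 empty squares out of n inspected.
import math

n, y0 = 100, 38          # squares inspected, squares with no individuals
a, A = 0.25, 500.0       # square area and total area, in the same units

lam_hat = -math.log(y0 / n) / a              # lambda-hat = -(1/a) ln(y0/n)
N_hat = lam_hat * A                          # N-hat = lambda-hat * A
v_lam = (math.exp(lam_hat * a) - 1) / (n * a ** 2)   # V-hat(lambda-hat)
```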
