STAT354 Study Guide
Nils DM
April 14, 2021
1 Introduction
1. Sampling: The selection of a subset (a statistical sample) of individ-
uals from within a statistical population to estimate characteristics of
the whole population.
6. Sample Collection Methods: There are three ways in which samples
are obtained:
2 Elements of the Sampling Problem
1. Element: An object on which a measurement is taken.
Ex. A voter being polled
(a) θ̂: An estimated value based on the sample
(b) B: A desired error bound, |θ̂ − θ| < B
(c) 1 − α: A desired probability that |θ̂ − θ| < B, where B = 2σ_θ̂
The objective is to have 1 − α be as close to 1 as possible. Usually we set 1 − α = 0.95
10. Probability Sampling Design: A sampling design based on some planned randomness. The main probability sampling designs are:
12. Source of Error in Surveys: There are two types of survey errors:
ii. The Respondent: Motivation to answer correctly/inability
to answer correctly.
iii. The Measurement Instruments: Issues or limitations such
as inaccurate or mis-calibrated instruments.
(a) Callbacks
(b) Rewards and incentives
(c) Trained interviewers/quality instruments
(d) Data checks
(e) Questionnaire construction
(i) Organization of fieldwork
(j) Organization of data management (at each stage)
(k) Data analysis, such as
i. boxplots
ii. histograms
iii. QQ plots
3 Basic Concepts of Statistics
1. Histograms: Visualize the shape of the data
2. Boxplots:
(a) Q1: the first quartile, 25% of data below this point (the bottom edge of the box).
(b) Q2: the median, 50% of data below this point (the middle line inside the box).
(c) Q3: the third quartile, 75% of data below this point (the top edge of the box).
(d) IQR = Q3 − Q1: the whiskers coming out of the box each extend up to 1.5 · IQR
(e) Mean:
    µ = E(Y) = Σ_y y p(y) = Σ_{j=1}^∞ y_j p(y_j)
(f) Variance:
    σ² = V(Y) = E(Y − µ)² = Σ_y (y − µ)² p(y) = Σ_{j=1}^∞ (y_j − µ)² p(y_j)
(g) Standard Deviation: σ = √(σ²)
(h) Estimating parameters through random sampling: When p(y_j) is not available, the exact values of the parameters are not available, but they can be estimated through a random sample from P_∞:
    µ̂ = ȳ = (1/n) Σ_{i=1}^n y_i
    σ̂² = s² = (1/(n − 1)) Σ_{j=1}^n (y_j − ȳ)²
    s = √(s²)
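The estimators above can be sketched directly; a minimal Python example with hypothetical data:

```python
# Minimal sketch of the sample-based estimators above (hypothetical data).

def sample_mean(y):
    # mu_hat = y_bar = (1/n) * sum(y_i)
    return sum(y) / len(y)

def sample_variance(y):
    # sigma_hat^2 = s^2 = (1/(n-1)) * sum((y_j - y_bar)^2)
    ybar = sample_mean(y)
    return sum((v - ybar) ** 2 for v in y) / (len(y) - 1)

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_mean(y))                # 5.0
print(sample_variance(y) ** 0.5)     # s = sqrt(s^2)
```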
(a) Total:
    τ = u_1 + u_2 + . . . + u_N = Σ_{i=1}^N u_i
(b) Mean:
    µ = (u_1 + u_2 + . . . + u_N)/N = τ/N
(c) Variance:
    σ² = (1/N) Σ_{j=1}^N (u_j − µ)²
(d) Standard Deviation: σ = √(σ²)
Selection probabilities for u_1, . . . , u_N:
    δ_1, δ_2, . . . , δ_N
where:
(a) δ_j ≥ 0
(b) Σ_j δ_j = 1
Then for N = 4 and n = 2, there are:
    C(4, 2) + C(4, 1) = 10
distinct samples, where C(4, 2) denotes the combinations with u_i ≠ u_j. We then obtain the following probabilities and means for these samples:

    Sample      Prob       Mean
    u_i, u_i    δ_i²       u_i
    u_i, u_j    2δ_iδ_j    (u_i + u_j)/2

Where the 2 is the result of the two possible orderings of the same sample when i ≠ j.
6. Estimating τ :
Let:
PN : u1 , u2 , . . . , uN
with selection probability for each ui :
δ1 , δ2 , . . . , δN
And let a sample from PN be denoted as:
{y1 , y2 , . . . , yn }
with corresponding probabilities:
    δ_1, δ_2, . . . , δ_n
We can estimate τ as:
    τ̂ = (1/n) Σ_{i=1}^n (y_i / δ_i)
The preferred derivation is, with δ_i = 1/N:
    τ̂ = (1/n) Σ_{i=1}^n y_i/(1/N) = N · (1/n) Σ_{i=1}^n y_i = N · ȳ
    E(τ̂) = N · E(ȳ) = N · (τ/N) = τ   ∴ τ̂ is unbiased
Ex. If we have P_3 = {1, 2, 3} with τ = 1 + 2 + 3 = 6, δ_1 = 0.2, δ_2 = 0.3, δ_3 = 0.5, and n = 2, we can estimate τ with:

    Sample      Prob       τ̂
    u_i, u_i    δ_i²       (1/n)(y_i/δ_i + y_i/δ_i)
    u_i, u_j    2δ_iδ_j    (1/n)(y_i/δ_i + y_j/δ_j)
    E(τ̂) = Σ_{i=1}^r p_i τ̂_i
    V(τ̂) = Σ_{i=1}^r (p_i τ̂_i²) − E(τ̂)²
Where: w_i = 1/π_i
Ex. If we have P_3 = {1, 2, 3} with τ = 1 + 2 + 3 = 6, δ_1 = δ_2 = δ_3 = 1/3, and n = 2, then π_i = 2/3 (each u_i appears in 2 of the 3 possible n = 2 samples), and the probability of each distinct sample (order does not matter) is p = 1/3.
We then have the following table:

    Sample    Prob    τ̂
    (1, 2)    1/3     u_1/π_1 + u_2/π_2 = 1/(2/3) + 2/(2/3)
    (1, 3)    1/3     1/(2/3) + 3/(2/3)
    (2, 3)    1/3     2/(2/3) + 3/(2/3)

    E(τ̂) = Σ_{i=1}^n τ̂_i p_i
    V(τ̂) = Σ_{i=1}^n (τ̂_i² p_i) − E(τ̂)²

τ̂, E(τ̂), V(τ̂) are all calculated the same way as above. The variance in this instance is lower than when δ_i = 1/3 for all i.
The variance of an estimator is lowered if δ_j is proportional to the size of y_j (δ_j increases as the value of each element y_j in the sample increases).
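The table above can be checked by enumeration; a small sketch (the values u_i stand in for the sampled y_i):

```python
# Enumeration check of the example above: P3 = {1, 2, 3}, tau = 6, n = 2,
# pi_i = 2/3, and each of the three distinct samples has probability 1/3.

from itertools import combinations

u = [1, 2, 3]
pi = 2 / 3                                   # inclusion probability of each u_i
samples = list(combinations(u, 2))           # (1,2), (1,3), (2,3)
p = 1 / len(samples)                         # 1/3 for each distinct sample

tau_hats = [sum(y / pi for y in s) for s in samples]
E_tau = sum(p * t for t in tau_hats)         # recovers tau = 6 (unbiased)
V_tau = sum(p * t ** 2 for t in tau_hats) - E_tau ** 2
print(tau_hats, E_tau, V_tau)
```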
Sampling Distributions
Let θb denote the estimator of a population parameter.
1. Properties of θ̂:
(a) E(θ̂) = θ (this means θ̂ is unbiased)
(b) V(θ̂) is small
We want both the bias and MSE to be small and also for:
    θ̂ ∼ N(θ, V̂(θ̂)) (approximately)
A (1 − α) confidence interval for θ is:
    (θ̂ − z_{α/2} s.e.(θ̂), θ̂ + z_{α/2} s.e.(θ̂))
Where:
    s.e.(θ̂) = √(V̂(θ̂))
A 95% CI ⇔ (1 − α) = 0.95 ⇔ α = 0.05 ⇔ z_{α/2} = z_{0.025} = 1.96
Remember:
    |θ̂ − θ| ≤ B = 2 s.e.(θ̂)
Where B is the bound on the error.
2. All three desirable properties of θ̂ depend on its distribution. Since θ̂ is a function of the sample, we call its distribution the sampling distribution of θ̂, which varies with sample size n.
3. For a random sample with equal probability selection and with replacement, an estimator for µ is:
    x̄ = (1/n) Σ_{i=1}^n y_i
Where:
(a) E(x̄) = µ
(b) V(x̄) = σ²/n
4. For a random sample with equal probability selection and without replacement, an estimator for µ is:
    x̄ = (1/n) Σ_{i=1}^n y_i
Where:
(a) E(x̄) = µ
(b) V(x̄) = (σ²/n) · (N − n)/(N − 1), σ² unknown
Where, for independent y_1, y_2:
    E(y_1 y_2) = µ_1 µ_2
(a) ρ = 0 ⇔ no correlation
(b) |ρ| = 1 ⇔ a perfect linear relationship between y_1 and y_2:
    y_2 = a + b y_1
4 Chapter 4: Simple Random Sampling
1. Simple Random Sampling (SRS): For a finite population:
PN = u1 , u2 , . . . , un
Where:
N
n
Denotes the total number of all distinct samples of size n.
A sampling procedure which draws a sample of size n from PN in such
a way that all Nn samples of size n have equal chance of being drawn
PN = u1 , u2 , . . . , un
τb = N µ
b = Ny
Variance of Population Mean is estimated by:
    V̂(ȳ) = (s²/n) · ((N − n)/N)
Variance of Population Total is estimated by:
    V̂(τ̂) = N² V̂(µ̂) = N² (s²/n)((N − n)/N)
Bound on Error of Estimation is estimated by:
    B = 2√(V̂(ȳ)) = 2√((s²/n)((N − n)/N))
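A minimal sketch of these SRS estimators (N and the data are hypothetical):

```python
# SRS estimators of mu and tau with the finite population correction.

def srs_estimates(y, N):
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    v_mean = (s2 / n) * (N - n) / N      # Vhat(ybar)
    v_total = N ** 2 * v_mean            # Vhat(tau_hat)
    B = 2 * v_mean ** 0.5                # bound on error for estimating mu
    return ybar, N * ybar, v_mean, v_total, B

ybar, tau_hat, v_mean, v_total, B = srs_estimates([4, 6, 8, 10], N=100)
print(ybar, tau_hat)                     # 7.0 700.0
```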
n N
4. Finite Population Correction Factor: If N ≫ n, then:
    (N − n)/N ≈ 1
The finite population correction factor is used when you sample without replacement from more than 5% of a finite population. It is needed because, under those circumstances, the standard error of the estimate will otherwise be too big.
5. Sample Size Determination: Suppose we want to estimate µ within a desired error bound B, where:
    B = 2√(V̂(ȳ)) = 2√((s²/n)((N − n)/N))
How do we achieve this bound? If we solve for n, we obtain:
    n = N σ² / ((N − 1)(B²/4) + σ²)
But since σ² is unknown, in practice we use:
    n = N s² / ((N − 1)(B²/4) + s²)
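The formula above can be sketched directly, rounding up to the next whole unit; the N, s², and B below are hypothetical:

```python
# Sample size for a desired bound B, using s^2 in place of the unknown sigma^2.

import math

def srs_sample_size(N, s2, B):
    # n = N*s^2 / ((N-1)*B^2/4 + s^2), rounded up to the next integer
    return math.ceil(N * s2 / ((N - 1) * B ** 2 / 4 + s2))

print(srs_sample_size(N=1000, s2=25.0, B=1.0))   # 91
```

Note that as N grows, n approaches 4s²/B², the infinite-population answer.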
P_N : u_1, u_2, . . . , u_N
Where:
    u_j = 1 if the jth member has a specific trait, 0 otherwise
Such a population P_N is fully characterized by a single parameter:
    p = Σ_{j=1}^N u_j / N
And where:
    q = 1 − p
is the proportion of 0's.
    E(u_j) = p,   0 ≤ p ≤ 1
    V(u_j) = p(1 − p)
11. Sample Proportion Estimation from SRS: Consider a finite population consisting of 0's and 1's:
    P_N : y_1, y_2, . . . , y_N
Where:
    y_j = 1 if the jth member has a specific trait, 0 otherwise
The sample proportion is calculated by:
    p̂ = (1/n) Σ_{i=1}^n y_i = ȳ
Where 0 ≤ p̂ ≤ 1.
    V̂(p̂) = (p̂ q̂/(n − 1)) · ((N − n)/N)
Where q̂ = 1 − p̂
15. Two Independent Samples: Consider:
    E(ȳ − x̄) = µ_y − µ_x (unbiased)
    V(ȳ − x̄) = V(ȳ) + V(x̄) (independence)
    V̂(x̄) = (s_x²/m) · ((N_x − m)/N_x)
    V̂(ȳ) = (s_y²/n) · ((N_y − n)/N_y)
    B = 2√(V̂(ȳ) + V̂(x̄))
For two sample proportions p̂_1, p̂_2 estimated from the same sample:
    E(p̂_1 − p̂_2) = p_1 − p_2
    V(p̂_1 − p̂_2) = V(p̂_1) + V(p̂_2) − 2 cov(p̂_1, p̂_2)
                 = p_1(1 − p_1)/n + p_2(1 − p_2)/n + 2 p_1 p_2/n
Where:
    cov(p̂_i, p̂_j) = − p_i p_j / n
5 Chapter 5: Stratified Random Sampling
1. Stratified Random Sampling: A type of sampling where we divide the population into non-overlapping strata and take a SRS within each stratum. It is advantageous:
(a) When the values of the variable of interest are more homogeneous within a stratum than across the population; better precision of estimation can be gained.
(b) When there is an administrative and/or cost advantage.
(c) When we want estimates for individual strata; stratified sampling allows us to estimate parameters for each stratum, so more information may be gained.
3. Examples of strata:
With,
    SRS sizes: n_1, n_2, . . . , n_L, where Σ_{j=1}^L n_j = n
5. Estimating µ and τ: With stratified random sampling having strata sample sizes n_1, . . . , n_L, let ȳ_1, ȳ_2, . . . , ȳ_L be the corresponding strata sample means. Then for the ith stratum:
    E(ȳ_i) = µ_i
    V(ȳ_i) = (σ_i²/n_i) · ((N_i − n_i)/(N_i − 1))
    V̂(ȳ_i) = (s_i²/n_i) · ((N_i − n_i)/N_i)
Where σ_i² is the ith stratum variance, and s_i² is the ith stratum sample variance.
Strata samples are independent ⇒ ȳ_i ⊥ ȳ_j ⇒ cov(ȳ_i, ȳ_j) = 0
    τ̂_st = Σ_{j=1}^L N_j ȳ_j
    V̂(τ̂_st) = Σ_{j=1}^L N_j² V̂(ȳ_j)
             = Σ_{j=1}^L N_j² ((N_j − n_j)/N_j)(s_j²/n_j)
    µ̂_st = ȳ_st = (1/N) τ̂_st = (1/N)[N_1 ȳ_1 + . . . + N_L ȳ_L]
    V̂(µ̂_st) = V̂(ȳ_st) = (1/N²) V̂(τ̂_st)
             = (1/N²) Σ_{i=1}^L N_i² ((N_i − n_i)/N_i)(s_i²/n_i)
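The stratified estimators above can be sketched as follows; each stratum supplies (N_j, n_j, ȳ_j, s_j²), and the numbers here are hypothetical:

```python
# Stratified estimates of mu and tau from per-stratum summaries.

def stratified_estimates(strata):
    # strata: list of (N_j, n_j, ybar_j, s2_j) tuples
    N = sum(Nj for Nj, nj, ybar, s2 in strata)
    tau_hat = sum(Nj * ybar for Nj, nj, ybar, s2 in strata)
    v_tau = sum(Nj ** 2 * ((Nj - nj) / Nj) * (s2 / nj)
                for Nj, nj, ybar, s2 in strata)
    return tau_hat / N, tau_hat, v_tau / N ** 2, v_tau

strata = [(100, 10, 5.0, 4.0), (200, 20, 8.0, 9.0)]
mu_hat, tau_hat, v_mu, v_tau = stratified_estimates(strata)
print(mu_hat, tau_hat)   # 7.0 2100.0
```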
8. Sample Size Determination: We will use the following notation:
(a) n_1, n_2, . . . , n_L: sample sizes for individual strata
(b) n = Σ_{j=1}^L n_j: the total sample size
(c) a_j = n_j/n: the allocation fraction for the jth stratum
We consider two sample size determination problems:
(a) For a given bound on error B and allocation fractions a_1, a_2, . . . , a_L, we want to find the sample size n that achieves the specified B
(b) For a given sample size n, we want to determine the "optimal" allocation fractions a_1, a_2, . . . , a_L
Since,
    B = 2√(V̂(ȳ_st)) ⇒ B²/4 = V̂(ȳ_st) = (1/N²) Σ_{j=1}^L (N_j² s_j²)/(n a_j) − (1/N²) Σ_{j=1}^L N_j s_j²
This implies:
    n = Σ_{j=1}^L (N_j² s_j² / a_j) / (N² D + Σ_{j=1}^L N_j s_j²)
Where: D = B²/4
Since we normally don’t know s2j ahead of time, we can estimate it in
the following ways:
(a) From a pilot sample (costly and not always practical)
(b) By calculating the range of the jth stratum
(c) By analysing existing data/previous studies
9. The n needed to achieve B = 2√(V̂(τ̂_st)) can be derived as follows:
To achieve a desired bound on error B under pre-determined allocation fractions a_1, a_2, . . . , a_L, the sample size n needed is given by the formula for n above, where:
    D = B²/4, for estimating µ
    D = B²/(4N²), for estimating τ
10. Determining the Optimal Allocation Fractions a_1, . . . , a_L:
Suppose n is given and (N_j, σ_j²) are known for j = 1, . . . , L. The total cost of sampling n units is:
    c = Σ_{j=1}^L c_j n_j
Assuming:
    N_i/(N_i − 1) ≈ 1
Then we have the optimal allocation fraction:
    a_j = n_j/n = (N_j σ_j/√c_j) / (N_1 σ_1/√c_1 + N_2 σ_2/√c_2 + . . . + N_L σ_L/√c_L)
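The allocation fraction above can be sketched directly; the (N_j, σ_j, c_j) values here are hypothetical:

```python
# Optimal (cost-weighted) allocation fractions.

def optimal_allocation(strata):
    # strata: list of (N_j, sigma_j, c_j) tuples
    # a_j = (N_j*sigma_j/sqrt(c_j)) / sum_k(N_k*sigma_k/sqrt(c_k))
    weights = [Nj * sj / cj ** 0.5 for Nj, sj, cj in strata]
    total = sum(weights)
    return [w / total for w in weights]

a = optimal_allocation([(100, 2.0, 1.0), (200, 4.0, 4.0)])
print(a)   # here a_1 = 1/3, a_2 = 2/3
```

Note how the cheaper, more variable, or larger stratum receives a larger share of the sample.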
Special Cases: Optimal allocation coincides with two well-known special allocations:
(a) Proportional allocation (equal costs and equal stratum variances):
    a_j = n_j/n = N_j/(N_1 + . . . + N_L) = w_j
(b) Neyman allocation (equal costs). Under Neyman allocation, the equation for the total sample size n becomes:
    n = (Σ_{i=1}^L N_i σ_i)² / (N² D + Σ_{i=1}^L N_i σ_i²)
Note that optimal allocation is not included here since it is clearly the best by definition.
The three are usually quite close; however, we see a big difference when the stratum variances σ_1², σ_2², . . . , σ_L² differ substantially.
V(Neyman) ≤ V(prop) is always true.
With effectively defined strata, stratified sampling provides more accurate estimates than SRS.
When is this untrue? If n_j is a considerable fraction of N_j, and hence the FPC matters, then the above statement may be false.
12. Setting up Strata: The efficiency of stratified estimators depends on the choice of strata. How do we divide a P_N into strata effective for stratified sampling?
(a) Case 1: Strata present themselves.
    Ex. Estimating average household income for Victoria, the obvious strata would be neighbourhoods.
(b) Case 2: Auxiliary (numerical) variable available.
    We can use the cumulative square root method for finding ≤ 6 strata.
    The idea:
    i. Split the data into starting strata (intervals)
    ii. Tabulate the frequency, the square root of the frequency, and the cumulative square root of frequency for each starting stratum.
    iii. If we want L = x strata, divide the total cumulative square root frequency by x, and set the partition points at t · (total/x) for t = 1, . . . , L − 1
    iv. The partitions calculated above define the new strata
    Strata defined by this method are approximately optimal for each fixed L.
    For an example see p. 12 of Week 5-6.
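The steps above can be sketched as follows; the frequency table is hypothetical, and each starting stratum is labelled with the final stratum it falls into:

```python
# Cumulative square-root-of-frequency rule: split the range of the
# cumulative sqrt(f) into L equal parts and assign each starting stratum
# to a final stratum.

def cum_sqrt_f_strata(freqs, L):
    cums, running = [], 0.0
    for f in freqs:
        running += f ** 0.5          # cumulative sqrt(frequency)
        cums.append(running)
    step = running / L               # partition points at t*step, t = 1..L-1
    return [min(int(c / step), L - 1) for c in cums]

print(cum_sqrt_f_strata([4, 4, 9, 9, 16, 16], L=2))   # [0, 0, 0, 1, 1, 1]
```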
13. Estimation of Population Proportion: Recall that the population proportion p is just the mean of:
    P_N : u_1, u_2, . . . , u_N
Where:
    u_j = 1 if the jth individual has the trait, 0 otherwise
So we can use the methods developed for estimating a mean µ to estimate p. Here:
(a) µ = p: overall population mean/proportion
(b) µ_j = p_j: jth stratum mean/proportion
(c) σ_j² = p_j q_j: jth stratum variance
(d) For population size N and jth stratum size N_j: w_j = N_j/N
(e) V(p̂_st):
    = Σ_{i=1}^L w_i² (p_i q_i/n_i) · ((N_i − n_i)/(N_i − 1))
(f) V̂(p̂_st):
    = Σ_{i=1}^L w_i² (s_i²/n_i) · ((N_i − n_i)/N_i) = Σ_{i=1}^L w_i² (p̂_i q̂_i/(n_i − 1)) · ((N_i − n_i)/N_i)
Since s_i² = n_i p̂_i q̂_i/(n_i − 1)
15. Optimal Allocation Fraction: For fixed cost, the optimal allocation is:
    n_j = n · (N_j √(p̂_j q̂_j/c_j)) / (N_1 √(p̂_1 q̂_1/c_1) + . . . + N_L √(p̂_L q̂_L/c_L))
(a) We are unable to collect stratified samples directly:
Ex. We want a sample of 50% male voters and 50% female voters:
i. Take a SRS telephone survey of randomly chosen households
(not stratified samples)
ii. After the SRS is taken, divide it into male/female voter sam-
ples and view them as SRS from the two strata respectively
(b) We need to correct an unbalanced SRS:
To do this, we calculate a weighted average such as:
    ȳ_st = (N_1/N) ȳ_1 + (N_2/N) ȳ_2,   w_i = N_i/N
Say n′ = n′_1 + . . . + n′_L; then:
    ŵ_j = n′_j / n′
    V̂(ȳ′_st) = Σ_{j=1}^L [ ŵ_j² s_j²/n_j + ŵ_j (ȳ_j − ȳ′_st)²/n′ ]
6 Chapter 6: Ratio, Regression and Difference Estimation
The last two chapters were concerned with the design of sample surveys (SRS vs stratified), where we exclusively used sample means to estimate µ.
We now consider methods of estimation that make use of
auxiliary/subsidiary variables. These can be combined with both SRS and
stratified sampling
1. The need for an auxiliary variable, and hence a ratio estimator, arises when N is either unavailable or too time-consuming to determine.
In the case where we have oranges with sugar content y_1, y_2, . . . , y_n, the average sugar content is estimated by:
    ȳ = (1/n) Σ_{i=1}^n y_i = µ̂_y
And therefore:
    τ̂_y = N µ̂_y = N ȳ
is the total sugar content for the data.
Since N is not known, we may exploit the fact that the weights x_1, . . . , x_n of the oranges may be much easier to measure than the number of oranges.
We then observe that:
    µ_y/µ_x = (N µ_y)/(N µ_x) = τ_y/τ_x
Where τ_x is the total weight of all oranges. Therefore:
    τ_y = (µ_y/µ_x) · τ_x, estimated by τ̂_y = (µ̂_y/µ̂_x) · τ_x = (ȳ/x̄) · τ_x
2. Calculating τ̂_y:
(a) Take an SRS and record both weight and sugar content.
(b) Use your SRS to calculate µ̂_y = ȳ and µ̂_x = x̄.
(c) Assuming τ_x is known, calculate τ̂_y as:
    τ̂_y = (ȳ/x̄) · τ_x
This is an example of a ratio estimator based on the auxiliary variable weight.
3. Ratio Estimator Notation:
We define:
    R = µ_y/µ_x = τ_y/τ_x
And denote:
    r = ȳ/x̄
to be the estimator of R. Then we define:
(a) V(τ̂_r) = V(r) τ_x²
(b) V(µ̂_r) = V(r) µ_x²
Then:
    µ_y = Σ_{j=1}^N y_j / N,   µ_x = Σ_{j=1}^N x_j / N
    τ_y = Σ_{j=1}^N y_j = N µ_y,   τ_x = Σ_{j=1}^N x_j = N µ_x
Then we define:
    R = µ_y/µ_x = τ_y/τ_x
As the population ratio.
Start by taking an SRS from P_N : {u_1, u_2, . . . , u_N}
Or:
    y_1, y_2, . . . , y_n ⇒ ȳ
    x_1, x_2, . . . , x_n ⇒ x̄
Where:
    V(r) = V(ȳ/x̄) = ((N − n)/(nN)) · (1/µ_x²) · S_r²
Where:
    S_r² = Σ_{j=1}^n (y_j − r x_j)² / (n − 1)
5. Ratio Estimator of τ_y:
    τ̂_y = r τ_x = (ȳ/x̄) τ_x
    V̂(τ̂_y) = N² ((N − n)/(nN)) S_r²
6. Ratio Estimator of µ_y:
    µ̂_y = r µ_x = (ȳ/x̄) µ_x
    V̂(µ̂_y) = ((N − n)/(nN)) S_r²
The ratio estimator for τ_y is better than the SRS estimator when it has a smaller variance: SRS is better when the variables are nearly uncorrelated, while the ratio estimator is better when there is a strong positive correlation between x and y.
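A minimal sketch of the ratio estimator of τ_y; the (x_i, y_i) data, τ_x, and N below are hypothetical:

```python
# Ratio estimator of tau_y with tau_x known; (x_i, y_i) pairs from a SRS.

def ratio_estimates(x, y, tau_x, N):
    n = len(y)
    r = sum(y) / sum(x)                                   # r = ybar/xbar
    Sr2 = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    tau_y_hat = r * tau_x
    v_tau = N ** 2 * ((N - n) / (n * N)) * Sr2
    return r, tau_y_hat, v_tau

x = [1.0, 2.0, 3.0, 4.0]           # weights
y = [2.1, 3.9, 6.0, 8.0]           # sugar contents, roughly y = 2x
r, tau_y_hat, v_tau = ratio_estimates(x, y, tau_x=100.0, N=40)
print(r, tau_y_hat)
```

Because y is nearly proportional to x, the residuals y_j − r x_j, and hence the variance, are small here.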
8. Sample Size Determination: We can estimate the following parameters with bound on error B as follows:
(a) Estimating R:
    B = 2√(V(r)) = 2√(((N − n)/(nN)) · (1/µ_x²) · σ²)
Where:
    σ² = (1/N) Σ_{j=1}^N (y_j − R x_j)²
(b) Estimating µ_y:
    B = 2 µ_x √(V(r))
    n = N σ² / (N (B²/4) + σ²), for µ_y
(c) Estimating τ_y:
    B = 2 τ_x √(V(r))
    n = N σ² / (N (B²/(4N²)) + σ²), for τ_y
9. Simulation Study of Ratio Estimators: We can compare ratio estimators under stratified sampling as follows:
(a) Separate Ratio Estimator:
Let µ̂_yrs be the weighted sum of the stratum ratio estimates µ̂_yj, where "rs" stands for "ratio separate". Then:
    µ̂_yrs = Σ_{j=1}^L w_j µ̂_yj = Σ_{j=1}^L w_j (ȳ_j/x̄_j) µ_xj
    V̂(µ̂_yrs) = Σ_{j=1}^L w_j² ((N_j − n_j)/(N_j n_j)) S_rj²
Where:
    S_rj² = Σ_{k=1}^{n_j} (y_jk − r_j x_jk)² / (n_j − 1)
(b) Combined Ratio Estimator:
Let:
    ȳ_st = Σ_{j=1}^L w_j ȳ_j
    x̄_st = Σ_{j=1}^L w_j x̄_j
    µ̂_yrc = (ȳ_st/x̄_st) µ_x
Where "rc" stands for "ratio combined". Then:
    V̂(µ̂_yrc) = Σ_{j=1}^L w_j² ((N_j − n_j)/(N_j n_j)) S_j²
Where:
    S_j² = Σ_{k=1}^{n_j} (y_jk − r_c x_jk)² / (n_j − 1)
and r_c = ȳ_st/x̄_st. We compute the weighted estimate r_c first, then apply ratio estimation with the estimated r_c.
Usually:
    V(µ̂_yrs) < V(µ̂_yrc)
But:
    |Bias(µ̂_yrs)| > |Bias(µ̂_yrc)|
Since each µ̂_yj in µ̂_yrs is biased and the bias accumulates when computing µ̂_yrs. If min{n_1, n_2, . . . , n_L} ≥ 20, then µ̂_yrs is preferred to µ̂_yrc.
(a) P_N : u_1, u_2, . . . , u_N
(b) u_j = (x_j, y_j)
(c) With means:
    µ_y = Σ_{j=1}^N y_j / N,   µ_x = Σ_{j=1}^N x_j / N
Suppose y = a + bx; then:
(a) µ_y = a + b µ_x
(b) ȳ = a + b x̄; data (x_1, y_1), . . . , (x_n, y_n)
Then we may calculate:
    µ̂_yL = ȳ + b̂ (µ_x − x̄)
Where L denotes linear regression, and where the least squares estimate of the slope is determined by:
    b̂ = Σ_{j=1}^n (x_j − x̄)(y_j − ȳ) / Σ_{j=1}^n (x_j − x̄)²
The variance of µ̂_yL may be calculated as:
    (1/n) ((N − n)/N) MSE
For fixed b, E(µ̂_yL) = E(ȳ) + b E(µ_x − x̄) = µ_y (unbiased)
But for b̂ = S_xy/S_x², E(µ̂_yL) = µ_y + O(1/n)
The difference estimator:
    µ̂_yD = ȳ + (µ_x − x̄)
          = µ_x + ȳ − x̄
          = µ_x + d̄
Where d̄ = ȳ − x̄ is the mean of the differences d_i = y_i − x_i.
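The regression and difference estimators can be sketched side by side; the data and µ_x below are hypothetical:

```python
# Regression and difference estimators of mu_y.

def regression_estimate(x, y, mu_x):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))        # least squares slope
    return ybar + b_hat * (mu_x - xbar)                  # mu_hat_yL

def difference_estimate(x, y, mu_x):
    return sum(y) / len(y) + (mu_x - sum(x) / len(x))    # mu_hat_yD (b = 1)

x = [1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0]          # exactly y = 1 + 2x
print(regression_estimate(x, y, mu_x=2.5))   # 6.0
print(difference_estimate(x, y, mu_x=2.5))   # 5.5
```

With a perfect linear relationship of slope 2, the regression estimate recovers a + b·µ_x exactly, while the difference estimate (which forces b = 1) does not.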
Note the following properties:
    y = a + bx → regression estimation
    a = 0 ⇒ y = bx → ratio estimation
    b = 1 ⇒ y = a + x → difference estimation
The relative efficiency of two estimators E_1, E_2 is:
    RE(E_1/E_2) = V̂(E_2)/V̂(E_1)
For estimating µ_y of P_N we now have:
(a) The sample mean: ȳ (unbiased)
(b) Ratio estimate: µ̂_y = r µ_x = (ȳ/x̄) µ_x, bias = O(1/n)
(c) Regression estimate: µ̂_yL = ȳ + b̂(µ_x − x̄), bias = O(1/n)
(d) Difference estimate: µ̂_yD = ȳ + (µ_x − x̄), unbiased
Suppose n is large and all four estimators are unbiased or nearly so. Which one is the most efficient?
14. The Rules of Efficiency:
Some things to keep in mind are:
(a) The ratio estimator will be more efficient than ȳ when:
    i. variation among the x_j's is small
    ii. variation among the y_j's is large
    iii. the correlation between x and y is high.
(b) µ̂_yL is always more efficient than ȳ (but it is biased)
(c) The regression estimator is strictly more efficient (RE > 1) than the ratio estimator unless b̂ = r, in which case RE = 1
7 Chapter 7: Systematic Sampling
1. Systematic Sampling: An alternative to SRS and stratified sam-
pling, systematic sampling selects ”equally spaced” elements from a
finite population PN with a natural order of elements.
Examples:
(a) Beginning with the second student, interview every 10th student
entering a building:
(b) Sampling every 30th item from an assembly line for quality control
2. Advantages vs SRS:
(a) Step 1: Divide P_N into n groups of K consecutive elements each, so that nK = N
(b) Step 2: Randomly select an element from the first group; say the jth element of the first group is selected.
(c) Step 3: Select the jth element of each remaining group k = 2, 3, . . . , n
5. There are Three Kinds of Populations:
(ordered) P_N : u_1 ≥ u_2 ≥ . . . ≥ u_N or u_1 ≤ u_2 ≤ . . . ≤ u_N
6. Estimating µ, τ and p:
Assume P_N : u_1, u_2, . . . , u_N is randomly ordered.
Let y_1, y_2, . . . , y_n be a 1-in-K systematic sample from P_N.
Then we may view this sample as a SRS from P_N.
Let the subscript sy denote systematic sampling; then we have:
µ:
    µ̂ = ȳ_sy = (1/n) Σ_{i=1}^n y_i
    V̂(ȳ_sy) = (s²/n) ((N − n)/N)
τ:
    τ̂ = N ȳ_sy
    V̂(τ̂) = V̂(N ȳ_sy) = N² (s²/n)((N − n)/N)
p:
    p̂_sy = ȳ_sy = (1/n) Σ_{i=1}^n y_i
    V̂(p̂_sy) = (p̂_sy q̂_sy/(n − 1)) ((N − n)/N)
7. Sample Size Determination:
To estimate µ with a bound on error B, we set:
    B = 2√(V(ȳ_sy))
and solve:
    n = N σ² / ((N − 1)D + σ²),   D = B²/4
Similarly, for estimating τ with a given bound B:
    n = N σ² / ((N − 1)D + σ²),   D = B²/(4N²)
For estimating p with a given B:
    n = N p q / ((N − 1)D + pq)
The K possible 1-in-K systematic samples can be arranged as columns 1, 2, 3, . . . , K of an n × K array. Notice that the row/column indices are opposite to the usual notation. The sample column means ȳ_1, ȳ_2, . . . , ȳ_K are determined by:
    ȳ_j = (1/n) Σ_{i=1}^n y_ij
Since the starting element is randomly chosen from row 1, samples 1, 2, . . . , K are equally likely to be chosen. Hence:
    E(ȳ_sy) = (1/K) ȳ_1 + (1/K) ȳ_2 + . . . + (1/K) ȳ_K = (1/N) Σ_{j=1}^N u_j = µ
    V(ȳ_sy) = (1/K) Σ_{j=1}^K (ȳ_j − µ)² = (σ²/n)[1 + (n − 1)ρ]
4. Additional Terminology:
Let:
    ȳ = (Σ_{j=1}^K ȳ_j)/K = µ
Using ANOVA notation, we have:
(a) MSB:
    MSB = (n/(K − 1)) Σ_{j=1}^K (ȳ_j − ȳ)²
(b) MSW:
    MSW = (1/(K(n − 1))) Σ_{j=1}^K Σ_{i=1}^n (y_ij − ȳ_j)²
(c) SST:
    SST = Σ_{j=1}^K Σ_{i=1}^n (y_ij − ȳ)²
Then we let:
    ρ = ((K − 1)n · MSB − SST) / ((n − 1) SST)
However, ρ cannot be estimated with one systematic sample. Therefore, we use:
    V̂(ȳ_sy) = (s²/n) ((N − n)/N)
Which will not underestimate V(ȳ_sy) unless P_N is periodic.
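The ANOVA quantities above can be sketched as follows; the K columns (systematic samples) of size n here are hypothetical:

```python
# rho from the ANOVA quantities above: K systematic samples (columns),
# each of size n.

def rho_from_columns(cols):
    K, n = len(cols), len(cols[0])
    col_means = [sum(c) / n for c in cols]
    grand = sum(col_means) / K
    MSB = n / (K - 1) * sum((m - grand) ** 2 for m in col_means)
    SST = sum((v - grand) ** 2 for c in cols for v in c)
    # rho = ((K-1)*n*MSB - SST) / ((n-1)*SST)
    return ((K - 1) * n * MSB - SST) / ((n - 1) * SST)

cols = [[1.0, 2.0], [1.0, 2.0], [5.0, 6.0]]   # elements alike within columns
print(rho_from_columns(cols))                 # close to 1
```

A ρ near 1 means elements within a systematic sample are very similar, which inflates V(ȳ_sy) by the factor [1 + (n − 1)ρ].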
8 Chapter 8: Cluster Sampling
1. Notation: For cluster sampling, we let P_M denote the total population of M elements and P_N denote the population of N clusters c_1, c_2, . . . , c_N, where:
(a) c_i ∩ c_j = ∅ (i ≠ j)
(b) ∪_j c_j = P_M
(c) M = Σ_{j=1}^N m_j, where m_j is the size of cluster c_j
3. Main advantages:
The main advantages of cluster sampling are:
4. Estimating µ, τ of P_M:
Let y_j be the total of all elements of P_M in the jth cluster c_j.
Then P_N = {u_1, u_2, . . . , u_N} where u_j = (m_j, y_j) = (size, total).
    τ = Σ_{j=1}^N y_j
    M = Σ_{j=1}^N m_j
    µ = τ/M = (Σ_{j=1}^N y_j)/(Σ_{j=1}^N m_j) = (Total of P_M)/(Total of P_N)   (Population Ratio)
    M̄ = M/N   (Average Cluster Size)
If we consider a SRS from P_N: (m_1, y_1), (m_2, y_2), . . . , (m_n, y_n), we can estimate µ with ȳ_c:
    ȳ_c = Σ_{j=1}^n y_j / Σ_{j=1}^n m_j
    V̂(ȳ_c) = (1/n) ((N − n)/N) (1/M̄²) S_c²
    S_c² = (1/(n − 1)) Σ_{j=1}^n (y_j − ȳ_c m_j)²
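The cluster estimator of µ above can be sketched directly; the (m_j, y_j) pairs, N, and M̄ below are hypothetical:

```python
# Cluster estimator of mu: SRS of n clusters from PN, each contributing
# (m_j, y_j) = (size, total).

def cluster_mean(pairs, N, M_bar):
    n = len(pairs)
    yc = sum(y for m, y in pairs) / sum(m for m, y in pairs)
    Sc2 = sum((y - yc * m) ** 2 for m, y in pairs) / (n - 1)
    v_yc = ((N - n) / (n * N * M_bar ** 2)) * Sc2
    return yc, v_yc

pairs = [(10, 50), (20, 100), (10, 50)]   # every cluster averages 5 per element
yc, v_yc = cluster_mean(pairs, N=30, M_bar=13.0)
print(yc, v_yc)   # 5.0 0.0
```

The variance is zero here because the cluster totals are exactly proportional to the cluster sizes, so every cluster tells the same story about µ.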
6. Cluster Sampling vs SRS:
Assume: m_1 = m_2 = . . . = m_N = m (common cluster size)
Then,
    M = Σ_{j=1}^N m_j = Nm   (Population Size)
    µ = (Σ_{j=1}^N y_j)/(Nm)   (Population Mean)
n clusters sampled ⇒ sample size = n · m
We estimate µ and τ of P_M with n clusters:
    µ̂ = ȳ_c = Σ_{j=1}^n y_j / Σ_{j=1}^n m_j = (1/(n·m)) Σ_{j=1}^n y_j
    E(µ̂) = µ   (unbiased)
    τ̂ = M ȳ_c = N · ȳ   (where ȳ = (1/n) Σ_{j=1}^n y_j is the mean of the sampled cluster totals)
    E(τ̂) = τ   (unbiased)
    V̂(ȳ_c) = ((N − n)/(N n m²)) S_c²
    S_c² = Σ_{j=1}^n (y_j − ȳ_c m)² / (n − 1)
    V̂(τ̂) = N² V̂(ȳ) = N² ((N − n)/(nN)) S_y²
    S_y² = Σ_{i=1}^n (y_i − ȳ)² / (n − 1)
But with a SRS of size n · m the estimator of µ, ȳ, is unbiased and:
    V(ȳ) = ((N − n)/N) · (S²/(nm))
Note that with a common cluster size:
    ȳ_c = (1/n) Σ_{i=1}^n ȳ_i = (1/(nm)) Σ_{i=1}^n Σ_{j=1}^m y_ij
7. ANOVA Notation:
However, since σ_c², the between-cluster variance, is unknown, we have to estimate it, along with M̄, from a prior survey, pilot study, etc.
Then:
    2√(V(ȳ_c)) = B ⇒ n = N σ_c² / (N D + σ_c²)
Where:
    D = B² M̄² / 4, when estimating µ
    D = B²/(4N²), when estimating τ
Then, with a_i the number of elements in the ith sampled cluster possessing the trait:
    p̂ = Σ_{i=1}^n a_i / Σ_{i=1}^n m_i
    S_p² = Σ_{i=1}^n (a_i − p̂ m_i)² / (n − 1)
The sample size determination for a given B is similar to the µ case.
10. How to draw a sample of clusters (equal probability sampling):
We have two options:
Under PPS (probability proportional to size) we have:
    τ̂_pps = (1/n) Σ_{j=1}^n y_j/(m_j/M)
    µ̂_pps = (1/n) Σ_{j=1}^n ȳ_j
(both τ̂_pps and µ̂_pps are unbiased; here ȳ_j = y_j/m_j is the jth sampled cluster mean)
    V̂(µ̂_pps) = (1/(n(n − 1))) Σ_{i=1}^n (ȳ_i − µ̂_pps)²
    V̂(τ̂_pps) = (M²/(n(n − 1))) Σ_{i=1}^n (ȳ_i − µ̂_pps)²
Then:
(a) Divide up the units along the interval (0, 1) so that each p_j is proportionate to its length along the interval
(b) Generate a random number from unif[0, 1]
(c) If the random number falls into the jth interval, then u_j is selected.
(d) Repeat steps (b)-(c) n times to draw a sample of size n with selection probabilities p_1, p_2, . . . , p_N
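The interval method above can be sketched as follows (sampling with replacement; the selection probabilities p_j here are hypothetical):

```python
# Interval method for PPS sampling: partition (0, 1) proportionally to
# the p_j and map each uniform draw to the sub-interval it lands in.

import random

def pps_sample(p, n, rng=random.random):
    cum, total = [], 0.0
    for pj in p:
        total += pj
        cum.append(total)          # cumulative breakpoints
    picks = []
    for _ in range(n):
        u = rng()                  # one uniform draw per selection
        picks.append(next(j for j, c in enumerate(cum) if u <= c))
    return picks

random.seed(1)
print(pps_sample([0.2, 0.3, 0.5], n=5))   # indices drawn with probs 0.2/0.3/0.5
```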
13. Three Estimators for Population Total τ:
(a) SRS Estimator: for y_1, y_2, . . . , y_n
    τ̂ = N ((1/n) Σ_{j=1}^n y_j) = N ȳ   (unbiased)
    V̂(τ̂) = N² ((N − n)/(Nn)) S²
    S² = Σ_{j=1}^n (y_j − ȳ)² / (n − 1)
(b) Ratio Estimator: for a SRS of clusters
    (m_1, y_1), (m_2, y_2), . . . , (m_n, y_n)
    τ̂_R = M · (Σ_{j=1}^n y_j / Σ_{j=1}^n m_j)   (biased)
9 Chapter 9: Two-Stage Cluster Sampling
1. Two-stage Cluster Sampling:
Say we have a finite population PM broken into clusters c1 , c2 , . . . , cN
In the previous chapter, we take a SRS from PN by selecting some of
the clusters c1 , c2 , . . . cn .
But when the clusters are too large, we may only take a sample from
each cluster. Two-stage cluster sampling happens as:
(a) Stage I: Take a sample of clusters (select only some of the clusters;
SRS from PN )
(b) Stage II: take a SRS from each sampled cluster c1 , c2 , . . . cn
10 Chapter 10: Estimating the Population
Size
Here, the parameter of interest of PN is the population size N
1. There are four main methods of estimation, the first two deal mainly
with moving populations such as animals:
(a) First draw a SRS of size t from P_N and tag all individuals in the sample before releasing them.
(b) After a period of time to ensure a thorough mixing of tagged
individuals with untagged ones, take a second SRS sample of size
n.
We can calculate 100(1 − α)% confidence intervals for N̂ as:
    N̂ ± 2√(V̂(N̂))
4. Inverse Sampling of P_N:
    P_N = t + (N − t), N is unknown
(b) For a pre-chosen constant s > 0, draw sequentially at random with replacement from P_N until s tagged individuals are observed.
Let p = t/N = the probability a randomly selected individual is a tagged one.
Trial: selecting one individual at random from P_N.
Sampling with replacement implies that p is constant for all trials.
Let n be the number of trials needed to obtain s successes (tagged individuals); then:
    n ∼ NegBin(s; p = t/N)
    P(n) = C(n − 1, s − 1) p^s (1 − p)^{n−s}
5. Density Estimation:
For density estimation we calculate the population total from an estimated density λ̂ over a region of total area A. We estimate N as:
    N̂ = λ̂ · A
From n sampled strips:
    (x_1, y_1), . . . , (x_n, y_n)
Where:
    x_j is the area of the jth strip,
    y_j is the number of animals observed in the jth strip.
Then the estimated density for the jth strip is:
    y_j / x_j
Where:
    p_j = x_j / A
This is equivalent to an equal probability sampling.
6. Presence/Absence Survey Estimation:
Total count:
    Σ_{j=1}^n y_j
Then let:
    y = n − Σ_{j=1}^n y_j
be the number of plots where the species is absent. For a plot of area a,
    P(plot is empty) = e^{−λa}
so y/n estimates e^{−λa}, and hence λ̂ = −(1/a) ln(y/n).