Sur15 3 Sol
$$R \xrightarrow{\;\times X\;} Y$$
Since Var (ū) = Var (x̄ + ȳ) = Var (x̄) + Var (ȳ) + 2Cov(x̄, ȳ), (1) is
proved. (2) can be proved in a similar way.
Proof:
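(a) Recall that $E(\bar{y}) = \bar{Y}$, $E(\bar{x}) = \bar{X}$ and $\mathrm{Var}(\bar{x}) = O(n^{-1})$ (of order $n^{-1}$). Thus, for a large sample,
$$E(r) = E\!\left(\frac{\bar{y}}{\bar{x}}\right) \approx \frac{E(\bar{y})}{\bar{X}} = R.$$
(b) Note that
$$r - R = \frac{\bar{y}}{\bar{x}} - R \approx \frac{\bar{y} - R\bar{x}}{\bar{X}}.$$
Thus, for a large sample,
$$\mathrm{Var}(r) = E[(r - R)^2] \approx \frac{1}{\bar{X}^2} E[(\bar{y} - R\bar{x})^2] = \frac{E(\bar{d}^2)}{\bar{X}^2} = \frac{\mathrm{Var}(\bar{d})}{\bar{X}^2},$$
where $\bar{d} = \bar{y} - R\bar{x}$ has mean $E(\bar{d}) = \bar{Y} - R\bar{X} = 0$.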
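2. Ratio: $x$ positively related to $y$. Estimator: $\hat{\bar{Y}}_r = \dfrac{\bar{y}}{\bar{x}}\bar{X}$, with $\mathrm{var}(\hat{\bar{Y}}_r) = \left(1 - \dfrac{n}{N}\right)\dfrac{s_r^2}{n}$.
[Figure: scatter plot of $y_i$ against $x_i$ with the fitted line $y = rx$ through the origin ($a = 0$, $b = r$). The solid vertical segments are the residuals $z_i = y_i - r x_i$; the axes are marked at $\bar{X}$, $r\bar{X}$ and $\bar{Y}$. Here $s_r^2 = \frac{1}{n-1}\sum_i z_i^2 = s_y^2 - 2r\hat{\rho}s_x s_y + r^2 s_x^2 < s_y^2$.]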
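Calculation of $s_r^2$:
$$\begin{aligned}
s_r^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - r x_i)^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n} [(y_i - \bar{y}) - r(x_i - \bar{x})]^2 \quad \text{since } \bar{y} - r\bar{x} = \bar{y} - \frac{\bar{y}}{\bar{x}}\bar{x} = 0 \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n} (y_i - \bar{y})^2 - 2r\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) + r^2\sum_{i=1}^{n} (x_i - \bar{x})^2\right] \\
&= s_y^2 - 2r s_{xy} + r^2 s_x^2 = s_y^2 - 2r\hat{\rho}s_x s_y + r^2 s_x^2,
\end{aligned}$$
or
$$s_r^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - 2r\sum_{i=1}^{n} x_i y_i + r^2\sum_{i=1}^{n} x_i^2\right).$$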
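The three equivalent forms of $s_r^2$ can be checked numerically. The following is a minimal sketch that uses the $(x_i, y_i)$ pairs of Example (7-11) from these notes; the variable names are illustrative only.

```python
# (x_i, y_i) pairs copied from the Example (7-11) table in these notes
x = [50, 35, 12, 10, 15, 30, 9, 25, 100, 250, 50, 50, 150, 100, 40]
y = [56, 48, 22, 14, 18, 26, 11, 30, 165, 409, 73, 70, 95, 55, 83]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
r = ybar / xbar  # ratio of means

# Form 1: residuals about the line through the origin, z_i = y_i - r * x_i
s2r_resid = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)

# Form 2: sample moments, s_y^2 - 2 r s_xy + r^2 s_x^2
s2y = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
s2x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
s2r_moments = s2y - 2 * r * sxy + r ** 2 * s2x

# Form 3: raw sums of squares and cross-products
sum_y2 = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
s2r_raw = (sum_y2 - 2 * r * sum_xy + r ** 2 * sum_x2) / (n - 1)

print(s2r_resid, s2r_moments, s2r_raw, s2y)
```

All three forms agree up to floating-point error, and for these data $s_r^2$ is much smaller than $s_y^2$.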
Remark:
1. If $X_i$ and $Y_i$ are positively related, we have $s_r^2 < s_y^2$ (precisely, when condition (4) holds). Hence $X_i$ can be
used as an auxiliary variable which provides additional information
and hence improves the precision of the estimate of $\bar{Y}$.
2. If $\bar{X}$ is unknown and is replaced by $\bar{x}$, the ordinary estimator $\bar{y}$ results.
3. When ratio estimation is used, estimates of variance and sample size
are quite sensitive to data points that do not fit the ideal pattern,
called influential observations. It is important to plot the data and
look for these unusual data points before proceeding with an analysis.
4. The ‘ratio of means’ $\hat{R} = \bar{y}/\bar{x}$ is biased, but is almost unbiased
if $n$ is large. Another ratio estimator is the ‘mean of ratios’
$$\hat{R}^* = \bar{r}^* = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{x_i}, \quad \text{where } r_i^* = \frac{y_i}{x_i},$$
which is unbiased for $R^* = \dfrac{1}{N}\sum_{i=1}^{N} \dfrac{y_i}{x_i}$.
However, $\hat{R}^*$ gives equal weight to each cluster, and clusters may vary greatly
in size. Unlike $\hat{R}^*$, $\hat{R}$ is weighted by the cluster size, which is an
advantage over $\hat{R}^*$.
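The estimator $r$ of $R$ is generally biased, so $\hat{Y}_r$ and $\hat{\bar{Y}}_r$ are also
biased for $Y$ and $\bar{Y}$ respectively.
Bias:
$$\mathrm{Cov}(r, \bar{x}) = E(r\bar{x}) - E(r)E(\bar{x}) = E\!\left(\frac{\bar{y}}{\bar{x}}\,\bar{x}\right) - E(r)E(\bar{x}) = E(\bar{y}) - E(r)E(\bar{x}),$$
so
$$E(r) = \frac{E(\bar{y})}{E(\bar{x})} - \frac{\mathrm{Cov}(r, \bar{x})}{E(\bar{x})} = R - \frac{\rho_{r,\bar{x}}\,\sigma_r\,\sigma_{\bar{x}}}{\bar{X}}.$$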
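The bias identity $E(r) - R = -\mathrm{Cov}(r, \bar{x})/\bar{X}$ holds exactly under simple random sampling, which can be confirmed by enumerating every possible sample. The sketch below assumes a small hypothetical population (the values of `X` and `Y` are made up for illustration and do not come from the notes).

```python
from itertools import combinations

# Hypothetical population of N = 5 units (illustrative values only)
X = [2.0, 3.0, 5.0, 7.0, 8.0]
Y = [3.0, 5.0, 6.0, 10.0, 13.0]
N, n = len(X), 3

R = sum(Y) / sum(X)   # population ratio
X_mean = sum(X) / N   # population mean of x

# Under SRS without replacement every subset of size n is equally likely
samples = list(combinations(range(N), n))
xbars = [sum(X[i] for i in s) / n for s in samples]
ybars = [sum(Y[i] for i in s) / n for s in samples]
ratios = [yb / xb for xb, yb in zip(xbars, ybars)]


def mean(values):
    return sum(values) / len(values)


E_r = mean(ratios)
E_xbar = mean(xbars)
cov_r_xbar = mean([r * xb for r, xb in zip(ratios, xbars)]) - E_r * E_xbar

bias_exact = E_r - R                   # bias of r by direct enumeration
bias_formula = -cov_r_xbar / X_mean    # bias from the covariance identity

print(bias_exact, bias_formula)
```

Because the expectations are taken over all $\binom{N}{n}$ samples, the two bias values coincide up to rounding rather than only approximately.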
SydU STAT3014 (2015) Second semester Dr. J. Chan 39
STAT3014/3914 Applied Stat.-Sampling C3-Ratio & reg est.
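Efficiency:
The ratio estimator is more efficient than the ordinary estimator, that is
$\mathrm{var}(\hat{\bar{Y}}) > \mathrm{var}(\hat{\bar{Y}}_r)$, if
$$\hat{\rho} > \frac{cv(x)}{2\,cv(y)} \qquad (4)$$
where $cv(y)$ is the sample cv for $Y$, defined as
$$cv(y) = \frac{s_y}{\bar{y}}.$$
To see this, note that each step below is reversible:
$$\begin{aligned}
\mathrm{var}(\hat{\bar{Y}}) - \mathrm{var}(\hat{\bar{Y}}_r) > 0 &\Rightarrow \left(1 - \frac{n}{N}\right)\frac{1}{n}\,[s_y^2 - s_r^2] > 0 \\
&\Rightarrow [s_y^2 - (s_y^2 - 2r\hat{\rho}s_x s_y + r^2 s_x^2)] > 0 \\
&\Rightarrow r s_x(2\hat{\rho}s_y - r s_x) > 0 \\
&\Rightarrow 2\hat{\rho}s_y - r s_x > 0 \quad \text{since } r > 0 \text{ and } s_x > 0 \\
&\Rightarrow \hat{\rho} > \frac{\bar{y}}{\bar{x}}\,\frac{s_x}{2 s_y} = \frac{cv(x)}{2\,cv(y)} \quad \text{since } r = \frac{\bar{y}}{\bar{x}},
\end{aligned}$$
and the two variance estimates are equal when $\hat{\rho} = \dfrac{cv(x)}{2\,cv(y)}$.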
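Condition (4) can be evaluated directly from summary statistics. The sketch below assumes the sums quoted for Example (7-11) elsewhere in these notes ($n = 15$, $\sum x_i = 926$, $\sum y_i = 1175$, and so on); the variable names are illustrative.

```python
import math

# Summary sums quoted for Example (7-11) in these notes
n = 15
sum_x, sum_y = 926, 1175
sum_x2, sum_y2, sum_xy = 117400, 231815, 155753

xbar, ybar = sum_x / n, sum_y / n
s2x = (sum_x2 - n * xbar ** 2) / (n - 1)
s2y = (sum_y2 - n * ybar ** 2) / (n - 1)
sxy = (sum_xy - n * xbar * ybar) / (n - 1)

rho = sxy / math.sqrt(s2x * s2y)      # sample correlation
cv_x = math.sqrt(s2x) / xbar          # sample cv of x
cv_y = math.sqrt(s2y) / ybar          # sample cv of y
threshold = cv_x / (2 * cv_y)         # right-hand side of condition (4)

r = ybar / xbar
s2r = s2y - 2 * r * sxy + r ** 2 * s2x

ratio_is_more_efficient = rho > threshold
print(rho, threshold, ratio_is_more_efficient, s2r < s2y)
```

For these data the correlation is well above the threshold, which agrees with $s_r^2 < s_y^2$.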
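$$\begin{aligned}
se(\hat{Y}_r) &= N\sqrt{\left(1 - \frac{n}{N}\right)\frac{1}{n}\,\frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - 2r\sum_{i=1}^{n} x_i y_i + r^2\sum_{i=1}^{n} x_i^2\right)} \\
&= 300\sqrt{\left(1 - \frac{15}{300}\right)\frac{1}{15 \times 14}\left[231815 - 2\cdot\frac{1175}{926}\cdot 155753 + \left(\frac{1175}{926}\right)^2\cdot 117400\right]} \\
&= 3226.66
\end{aligned}$$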
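The arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the sums that appear in the calculation; the variable names are illustrative.

```python
import math

# Quantities appearing in the standard-error calculation above
N, n = 300, 15
sum_x, sum_y = 926, 1175
sum_x2, sum_y2, sum_xy = 117400, 231815, 155753

r = sum_y / sum_x  # ratio estimate r = ybar / xbar

# s_r^2 from raw sums of squares and cross-products
s2r = (sum_y2 - 2 * r * sum_xy + r ** 2 * sum_x2) / (n - 1)

# standard error of the ratio estimate of the total
se_total_ratio = N * math.sqrt((1 - n / N) * s2r / n)
print(round(se_total_ratio, 2))
```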
and $A = \bar{Y} - B\bar{X}$.
Note: $\mathrm{Cov}(X, Y) = S_{xy} = SS_{xy}/(N - 1)$, $\mathrm{Var}(X) = S_x^2 = SS_{xx}/(N - 1)$,
$\mathrm{cov}(X, Y) = s_{xy} = ss_{xy}/(n - 1)$ and $\mathrm{var}(X) = s_x^2 = ss_{xx}/(n - 1)$.
The regression estimator of the population mean $\bar{Y}$ is then obtained by substituting
$x = \bar{X}$ into (5):
$$\hat{\bar{Y}}_{reg} = \bar{y} + b(\bar{X} - \bar{x})$$
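where
$$b = \frac{ss_{xy}}{ss_{xx}} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{s_{xy}}{s_x^2}. \qquad (6)$$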
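Since
$$\hat{\bar{Y}}_{reg} = \bar{y} + b(\bar{X} - \bar{x}) \simeq \bar{y} + B(\bar{X} - \bar{x}) = \bar{z}',$$
the sample mean of the variable $z_i' = y_i + B(\bar{X} - x_i)$, we have
$$E(\hat{\bar{Y}}_{reg}) \simeq E[\bar{y} + B(\bar{X} - \bar{x})] = E(\bar{y}) + B[\bar{X} - E(\bar{x})] = \bar{Y} \quad \text{(approximately unbiased)}$$
and, using $B = \rho S_y / S_x$,
$$\begin{aligned}
\mathrm{Var}(\hat{\bar{Y}}_{reg}) &\simeq \mathrm{Var}(\bar{z}') = \mathrm{Var}[\bar{y} + B(\bar{X} - \bar{x})] = \mathrm{Var}(\bar{y} - B\bar{x}) \\
&= \mathrm{Var}(\bar{y}) + B^2\,\mathrm{Var}(\bar{x}) - 2B\,\mathrm{Cov}(\bar{y}, \bar{x}) \\
&= \left(1 - \frac{n}{N}\right)\frac{S_y^2}{n} + \rho^2\frac{S_y^2}{S_x^2}\left(1 - \frac{n}{N}\right)\frac{S_x^2}{n} - 2\rho\frac{S_y}{S_x}\left(1 - \frac{n}{N}\right)\frac{\rho S_x S_y}{n} \\
&= \left(1 - \frac{n}{N}\right)\frac{S_y^2}{n}\,(1 - \rho^2).
\end{aligned}$$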
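Hence
$$\mathrm{var}(\hat{\bar{Y}}_{reg}) = \left(1 - \frac{n}{N}\right)\frac{s_{reg}^2}{n} = \left(1 - \frac{n}{N}\right)\frac{s_y^2(1 - \hat{\rho}^2)}{n}$$
where $s_{reg}^2$ is the sample variance of $z_i' = y_i + b(\bar{X} - x_i)$.
The regression estimator for the population total $Y$ is
$$\hat{Y}_{reg} = N[\bar{y} + b(\bar{X} - \bar{x})]$$
and its variance estimate is
$$\mathrm{var}(\hat{Y}_{reg}) = N^2\left(1 - \frac{n}{N}\right)\frac{s_{reg}^2}{n} = N^2\left(1 - \frac{n}{N}\right)\frac{s_y^2(1 - \hat{\rho}^2)}{n}.$$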
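Bias:
$$\begin{aligned}
\text{Bias in } \hat{\bar{Y}}_{reg} &= E(\hat{\bar{Y}}_{reg}) - \bar{Y} = E(\bar{y}) + E[b(\bar{X} - \bar{x})] - \bar{Y} \\
&= E[b(\bar{X} - \bar{x})] = -\mathrm{Cov}(b, \bar{x}).
\end{aligned}$$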
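Efficiency:
1. The regression estimator is at least as efficient as the ordinary
estimator, that is $\mathrm{var}(\hat{\bar{Y}}) \ge \mathrm{var}(\hat{\bar{Y}}_{reg})$, since
$$\begin{aligned}
\mathrm{var}(\hat{\bar{Y}}) - \mathrm{var}(\hat{\bar{Y}}_{reg}) &= \left(1 - \frac{n}{N}\right)\frac{1}{n}\,[s_y^2 - s_{reg}^2] \\
&= \left(1 - \frac{n}{N}\right)\frac{1}{n}\,s_y^2\hat{\rho}^2 \ge 0
\end{aligned}$$
where the equality holds when $\hat{\rho} = 0$, i.e. there is no linear association
between $Y$ and $X$.
2. The regression estimator is at least as efficient as the ratio estimator,
that is $\mathrm{var}(\hat{\bar{Y}}_r) \ge \mathrm{var}(\hat{\bar{Y}}_{reg})$, with equality only when
$$b = r = \frac{\bar{y}}{\bar{x}},$$
in which case they are equivalent and the regression of $y$ on $x$ is
linear through the origin and the variance of $y$ is proportional to $x$.
$$\begin{aligned}
\mathrm{var}(\hat{\bar{Y}}_r) - \mathrm{var}(\hat{\bar{Y}}_{reg}) &= \left(1 - \frac{n}{N}\right)\frac{1}{n}\,[s_r^2 - s_{reg}^2] \\
&= \left(1 - \frac{n}{N}\right)\frac{1}{n}\,[s_y^2 - 2r\hat{\rho}s_x s_y + r^2 s_x^2 - s_y^2(1 - \hat{\rho}^2)] \\
&= \left(1 - \frac{n}{N}\right)\frac{1}{n}\,(r^2 s_x^2 - 2r\hat{\rho}s_x s_y + s_y^2\hat{\rho}^2) \\
&= \left(1 - \frac{n}{N}\right)\frac{1}{n}\,(r s_x - \hat{\rho}s_y)^2 \ge 0.
\end{aligned}$$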
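The closed form for the gap between the ratio and regression variance estimates can be checked numerically. The sketch below assumes the Example (7-11) summary sums quoted in these notes and compares the direct difference with $(1 - n/N)\,n^{-1}(r s_x - \hat{\rho}s_y)^2$; variable names are illustrative.

```python
import math

# Summary sums quoted for Example (7-11) in these notes
N, n = 300, 15
sum_x, sum_y = 926, 1175
sum_x2, sum_y2, sum_xy = 117400, 231815, 155753

xbar, ybar = sum_x / n, sum_y / n
s2x = (sum_x2 - n * xbar ** 2) / (n - 1)
s2y = (sum_y2 - n * ybar ** 2) / (n - 1)
sxy = (sum_xy - n * xbar * ybar) / (n - 1)
sx, sy = math.sqrt(s2x), math.sqrt(s2y)
rho = sxy / (sx * sy)
r = ybar / xbar

fpc_over_n = (1 - n / N) / n  # common factor (1 - n/N) / n

var_ordinary = fpc_over_n * s2y
var_ratio = fpc_over_n * (s2y - 2 * r * rho * sx * sy + r ** 2 * s2x)
var_reg = fpc_over_n * s2y * (1 - rho ** 2)

gap_direct = var_ratio - var_reg
gap_formula = fpc_over_n * (r * sx - rho * sy) ** 2
print(var_ordinary, var_ratio, var_reg, gap_direct, gap_formula)
```

The gap is a perfect square, so it can never be negative; it vanishes only when $r s_x = \hat{\rho}s_y$, i.e. when $r = b$.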
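Example: (7-11) Estimate the total sales using the regression estimator.
Solution: To obtain the regression estimate of this year's total sales (in thousands), first compute
$$ss_{xy} = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = 155753 - 15 \times \frac{926}{15} \times \frac{1175}{15} = 83216.33,$$
$$ss_{xx} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = 117400 - 15 \times \left(\frac{926}{15}\right)^2 = 60234.93,$$
$$ss_{yy} = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = 231815 - 15 \times \left(\frac{1175}{15}\right)^2 = 139773.33.$$
We have
$$b = \frac{ss_{xy}}{ss_{xx}} = \frac{83216.33}{60234.93} = 1.3815$$
and
$$\hat{\rho} = \frac{ss_{xy}}{\sqrt{ss_{xx}\,ss_{yy}}} = \frac{83216.33}{\sqrt{60234.93 \times 139773.33}} = 0.9069.$$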
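It follows that
$$\begin{aligned}
\hat{Y}_{reg} &= N[\bar{y} + b(\bar{X} - \bar{x})] \\
&= 300\left[\frac{1175}{15} + 1.3815\left(\frac{21300}{300} - \frac{926}{15}\right)\right] = 27340.65
\end{aligned}$$
as compared with $\hat{Y} = 23500$ and $\hat{Y}_r = 27027.54$. The s.e. estimate is
$$\begin{aligned}
se(\hat{Y}_{reg}) &= N\sqrt{\left(1 - \frac{n}{N}\right)\frac{s_y^2(1 - \hat{\rho}^2)}{n}} \\
&= 300\sqrt{\left(1 - \frac{15}{300}\right)\frac{9983.81\,(1 - 0.9069^2)}{15}} = 3178.52
\end{aligned}$$
which is $< se(\hat{Y}_r) = 3226.66 \ll se(\hat{Y}) = 7543.72$. This shows that
dropping the zero-intercept assumption improves the precision of the estimate slightly.
Note that the y-intercept estimate is
$$a = \bar{y} - b\bar{x} = \frac{1175}{15} - 1.3815 \times \frac{926}{15} \approx -6.9531$$
which is quite close to zero.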
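The whole worked example can be reproduced from the summary sums. The following is a minimal sketch with illustrative variable names; because it carries $b$ and $\hat{\rho}$ at full precision instead of four decimals, the standard error differs from the hand calculation in the first decimal place.

```python
import math

# Quantities used in the worked example above
N, n = 300, 15
X_total = 21300
sum_x, sum_y = 926, 1175
sum_x2, sum_y2, sum_xy = 117400, 231815, 155753

xbar, ybar = sum_x / n, sum_y / n
ss_xy = sum_xy - n * xbar * ybar
ss_xx = sum_x2 - n * xbar ** 2
ss_yy = sum_y2 - n * ybar ** 2

b = ss_xy / ss_xx                        # slope, equation (6)
rho = ss_xy / math.sqrt(ss_xx * ss_yy)   # sample correlation
a = ybar - b * xbar                      # y-intercept estimate

# Regression estimate of the total and its standard error
Y_reg = N * (ybar + b * (X_total / N - xbar))
s2y = ss_yy / (n - 1)
se_reg = N * math.sqrt((1 - n / N) * s2y * (1 - rho ** 2) / n)

# Ordinary estimate of the total, for comparison
Y_ordinary = N * ybar
se_ordinary = N * math.sqrt((1 - n / N) * s2y / n)

print(round(b, 4), round(rho, 4), round(Y_reg, 2), round(se_reg, 2))
```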
Read Tutorial 11 Q2c,d, & 3c,d.
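$$\hat{R}_{hr} = \bar{r}^* + \frac{N - 1}{N\bar{X}}\,\frac{n(\bar{y} - \bar{r}^*\bar{x})}{n - 1},$$
for the population mean:
$$\hat{\bar{Y}}_{hr} = \bar{X}\bar{r}^* + \frac{N - 1}{N}\,\frac{n(\bar{y} - \bar{r}^*\bar{x})}{n - 1},$$
and for the population total:
$$\hat{Y}_{hr} = X\bar{r}^* + (N - 1)\,\frac{n(\bar{y} - \bar{r}^*\bar{x})}{n - 1}.$$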
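Remarks:
1. So far, we have $\hat{R} = \dfrac{\bar{y}}{\bar{x}}$, biased for $R$; $\hat{R}^* = \dfrac{1}{n}\sum_{i=1}^{n}\dfrac{y_i}{x_i}$, biased for $R$ and
unbiased for $R^* = \dfrac{1}{N}\sum_{i=1}^{N}\dfrac{y_i}{x_i}$; and $\hat{R}_{hr}$, unbiased for $R$. Finally, could
we just use
$$\hat{R}_o = \bar{y}/\bar{X}\,?$$
This is the ordinary estimator: since
$$E\!\left(\frac{\bar{y}}{\bar{X}}\right) = \frac{E(\bar{y})}{\bar{X}} = \frac{\bar{Y}}{\bar{X}} = R,$$
it is unbiased for $R$, but it does not use the information from the sample $\{x_i\}$.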
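$$s_y^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right),$$
$$s_r^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - 2r\sum_{i=1}^{n} x_i y_i + r^2\sum_{i=1}^{n} x_i^2\right) = s_y^2 - 2r\hat{\rho}s_x s_y + r^2 s_x^2, \quad r = \frac{\bar{y}}{\bar{x}}.$$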
Example: (7-11)
Solution: The ratios and their summary are given below:

  i   x_i   y_i   r_i* = y_i/x_i   |    i    x_i   y_i   r_i* = y_i/x_i
  1    50    56       1.120        |    9    100   165       1.650
  2    35    48       1.371        |   10    250   409       1.636
  3    12    22       1.833        |   11     50    73       1.460
  4    10    14       1.400        |   12     50    70       1.400
  5    15    18       1.200        |   13    150    95       0.633
  6    30    26       0.867        |   14    100    55       0.550
  7     9    11       1.222        |   15     40    83       2.075
  8    25    30       1.200        | Total                  19.618

We have $\bar{r}^* = \dfrac{1}{n}\sum_{i=1}^{n} r_i^* = \dfrac{19.618}{15} = 1.3079$, $\bar{x} = 61.7333$ and $\bar{y} = 78.3333$.
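As an illustration, the mean of ratios and the bias-corrected estimators $\hat{R}_{hr}$, $\hat{\bar{Y}}_{hr}$ and $\hat{Y}_{hr}$ can be computed from the tabulated data. This is a sketch only: it assumes $N = 300$ and $X = 21300$ as used for this example elsewhere in the notes, and the variable names are illustrative.

```python
# (x_i, y_i) pairs from the table above
x = [50, 35, 12, 10, 15, 30, 9, 25, 100, 250, 50, 50, 150, 100, 40]
y = [56, 48, 22, 14, 18, 26, 11, 30, 165, 409, 73, 70, 95, 55, 83]

# Assumed from the regression example in these notes: N = 300, X = 21300
N, X_total = 300, 21300
X_mean = X_total / N

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Mean of ratios, r-bar-star
r_bar_star = sum(yi / xi for xi, yi in zip(x, y)) / n

# Common correction term n (ybar - r-bar-star * xbar) / (n - 1)
correction = n * (ybar - r_bar_star * xbar) / (n - 1)

R_hr = r_bar_star + (N - 1) / (N * X_mean) * correction
Y_mean_hr = X_mean * r_bar_star + (N - 1) / N * correction
Y_total_hr = X_total * r_bar_star + (N - 1) * correction

print(round(r_bar_star, 4), round(R_hr, 4), round(Y_total_hr, 2))
```

The three estimators are consistent with one another: $\hat{\bar{Y}}_{hr} = \bar{X}\hat{R}_{hr}$ and $\hat{Y}_{hr} = N\hat{\bar{Y}}_{hr}$.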
Example: In a survey of family size ($x_1$), weekly income ($x_2$) and weekly
expenditure on food ($y$), we want to estimate the average weekly expenditure
on food per family in the most efficient way. A simple random
sample of 27 families yields the following data:
$$\sum_i x_{1i} = 109, \quad \sum_i x_{2i} = 16277, \quad \sum_i y_i = 2831, \quad \hat{\rho}_{x_1,y} = 0.925, \quad \hat{\rho}_{x_2,y} = 0.573.$$
(a) Estimate the standard errors of the ratio estimators for $\bar{Y}$ using $x_1$
and using $x_2$. Compare the standard errors with the s.e. for the
simple estimate ignoring the covariates. Which estimator has the
smallest estimated s.e.?
(b) Calculate the best available estimate of the average weekly expenditure
on food per family and give an approximate 95% confidence
interval for this average.
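Solution:
(a) The standard errors of the ratio estimators for $\bar{Y}$ using $x_1$ and using
$x_2$ are obtained from
$$r_1 = \frac{\sum_i y_i}{\sum_i x_{1i}} = \frac{2831}{109} = 25.97,$$
$$\begin{aligned}
s_{r_1}^2 &= s_y^2 - 2r_1 s_{x_1 y} + r_1^2 s_{x_1}^2 \\
&= 547.8234 - 2(25.97)(26.5057) + 25.97^2(1.4986) \\
&= 181.896,
\end{aligned}$$
$$r_2 = \frac{\sum_i y_i}{\sum_i x_{2i}} = \frac{2831}{16277} = 0.1739,$$
$$s_{r_2}^2 = s_y^2 - 2r_2 s_{x_2 y} + r_2^2 s_{x_2}^2$$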
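The value of $s_{r_1}^2$ depends on how $r_1$ is rounded. The short sketch below, using only the summary statistics quoted in the solution, suggests that the printed 181.896 corresponds to the unrounded $r_1 = 2831/109$ and not to 25.97; the variable names are illustrative.

```python
# Summary statistics quoted in the solution above
sum_y, sum_x1 = 2831, 109
s2y, sx1y, s2x1 = 547.8234, 26.5057, 1.4986

r1 = sum_y / sum_x1  # unrounded ratio, about 25.9725

# s_{r1}^2 = s_y^2 - 2 r_1 s_{x1 y} + r_1^2 s_{x1}^2
s2r1 = s2y - 2 * r1 * sx1y + r1 ** 2 * s2x1

# Same formula with r1 rounded to two decimals, for comparison
r1_rounded = round(r1, 2)
s2r1_rounded = s2y - 2 * r1_rounded * sx1y + r1_rounded ** 2 * s2x1

print(round(s2r1, 3), round(s2r1_rounded, 3))
```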
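if we define
$$(Y_i', X_i') = \begin{cases} (Y_i, X_i) & \text{if } i \in C_l, \\ (0, 0) & \text{if } i \notin C_l. \end{cases}$$
Note: $X' = X_l$, i.e. the sum of $X_i'$ over the whole population equals the
sum of $X_i$ over $C_l$. Hence the natural estimator of the ratio and its variance
estimate are
$$r' = \frac{\sum_{i=1}^{n} y_i'}{\sum_{i=1}^{n} x_i'} = \frac{\sum_{i \in C_l} y_i}{\sum_{i \in C_l} x_i} = r_l \quad \text{and} \quad \mathrm{var}(r_l) \approx \frac{1}{(\bar{X}')^2}\left(1 - \frac{n}{N}\right)\frac{s_{rl}'^2}{n}$$
where
$$(x_i', y_i') = \begin{cases} (x_i, y_i) & \text{if } i \in C_l, \\ (0, 0) & \text{if } i \notin C_l, \end{cases}$$
$\bar{X}' = \dfrac{X'}{N}$ can be estimated by $\bar{x}' = \dfrac{1}{n}\sum_{i=1}^{n} x_i' = \dfrac{1}{n}\sum_{i \in C_l} x_i$, and
$$S_{rl}'^2 = \frac{1}{N - 1}\sum_{i=1}^{N} (Y_i' - R'X_i')^2$$
can be estimated by
$$s_{rl}'^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (y_i' - r'x_i')^2 = \frac{1}{n - 1}\sum_{i \in C_l} (y_i - r_l x_i)^2.$$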
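Similarly, the ratio estimator of the total in $C_l$ and its variance estimate are
$$\hat{Y}_{rl} = X_l r_l \quad \text{and} \quad \mathrm{var}(\hat{Y}_{rl}) \approx N^2\left(1 - \frac{n}{N}\right)\frac{s_{rl}'^2}{n}$$
since
$$\mathrm{var}(\hat{Y}_{rl}) = X_l^2\,\mathrm{var}(r_l) = \frac{X_l^2}{\bar{X}'^2}\left(1 - \frac{n}{N}\right)\frac{s_{rl}'^2}{n} = X'^2\,\frac{N^2}{X'^2}\left(1 - \frac{n}{N}\right)\frac{s_{rl}'^2}{n}.$$
Note that these estimators correspond to method 1 in Section 1.5 for
poststratification, and $n_l$ does not come into any of these calculations.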
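The zero-extension device can be illustrated with a small sketch. The sample below is hypothetical (made-up values and domain flags, with an assumed population size `N = 200`); it only demonstrates that working with $(x_i', y_i')$ over the whole sample gives the same ratio and the same $s_{rl}'^2$ as summing over the domain $C_l$ alone.

```python
# Hypothetical SRS of n = 8 units from a population of N = 200 (illustrative only)
N = 200
x = [4.0, 7.0, 3.0, 9.0, 5.0, 6.0, 8.0, 2.0]
y = [9.0, 15.0, 5.0, 20.0, 11.0, 10.0, 18.0, 3.0]
in_domain = [True, False, True, True, False, True, False, True]  # i in C_l ?
n = len(x)

# Zero-extended variables: (x_i', y_i') = (x_i, y_i) inside C_l, (0, 0) outside
x0 = [xi if d else 0.0 for xi, d in zip(x, in_domain)]
y0 = [yi if d else 0.0 for yi, d in zip(y, in_domain)]

# Ratio from the zero-extended sample and directly from the domain units
r_prime = sum(y0) / sum(x0)
r_l = (sum(yi for yi, d in zip(y, in_domain) if d)
       / sum(xi for xi, d in zip(x, in_domain) if d))

# s'_rl^2 over all n units, and as a sum over C_l only (same divisor n - 1)
s2_all = sum((yi - r_prime * xi) ** 2 for xi, yi in zip(x0, y0)) / (n - 1)
s2_dom = sum((yi - r_l * xi) ** 2
             for xi, yi, d in zip(x, y, in_domain) if d) / (n - 1)

# Variance estimate of r_l, with X-bar' estimated by x-bar'
xbar_prime = sum(x0) / n
var_r_l = (1 - n / N) * s2_all / n / xbar_prime ** 2

print(r_l, s2_all, var_r_l)
```

Units outside $C_l$ contribute zero residuals, which is why the divisor stays at $n - 1$ and the domain sample size never appears.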
Read Tutorial 12 Q1b,c.
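1. Take the ratios $r_l = \bar{y}_l/\bar{x}_l$ first, then sum $\hat{Y}_l = X_l r_l$ over the strata to obtain
$$\hat{R}_s = \sum_{l=1}^{L} \hat{Y}_l / X = \sum_{l=1}^{L} \frac{X_l}{X}\,r_l = \frac{1}{\bar{X}}\sum_{l=1}^{L} W_l\bar{X}_l r_l$$
since
$$\frac{X_l}{X} = \frac{N}{X}\cdot\frac{N_l}{N}\cdot\frac{X_l}{N_l} = W_l\bar{X}_l\,\frac{1}{\bar{X}} \quad \text{and} \quad r_l = \frac{\bar{y}_l}{\bar{x}_l}.$$
Then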
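$$E(\hat{R}_s) = \sum_{l=1}^{L} \frac{X_l}{X}\,E(r_l) \approx \sum_{l=1}^{L} \frac{X_l}{X}\,\frac{Y_l}{X_l} = \frac{1}{X}\sum_{l=1}^{L} Y_l = \frac{Y}{X} = R,$$
since $E(r_l) \approx R_l = \dfrac{Y_l}{X_l}$, and
$$\mathrm{var}(\hat{R}_s) = \frac{1}{\bar{X}^2}\sum_{l=1}^{L} W_l^2\left(1 - \frac{n_l}{N_l}\right)\frac{s_{srl}^2}{n_l}$$
where
$$s_{srl}^2 = s_{y_l}^2 - 2r_l s_{x_l y_l} + r_l^2 s_{x_l}^2 = \frac{1}{n_l - 1}\left[\sum_{i=1}^{n_l} y_{li}^2 - 2r_l\sum_{i=1}^{n_l} x_{li}y_{li} + r_l^2\sum_{i=1}^{n_l} x_{li}^2\right].$$
Similarly, the separate ratio estimate for the mean is
$$\hat{\bar{Y}}_{st,s} = \sum_{l=1}^{L} W_l\bar{X}_l r_l \quad \text{and} \quad \mathrm{var}(\hat{\bar{Y}}_{st,s}) = \sum_{l=1}^{L} W_l^2\left(1 - \frac{n_l}{N_l}\right)\frac{s_{srl}^2}{n_l}.$$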
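A minimal sketch of the separate ratio estimator of the mean and its variance estimate follows. The function name `separate_ratio_mean`, the dictionary layout and the two strata of data are all hypothetical choices made for illustration; they are not taken from the notes.

```python
def separate_ratio_mean(strata):
    """Separate ratio estimate of the population mean and its variance estimate.

    Each stratum is a dict holding the stratum size "N", the known stratum
    mean of x "X_mean", and the sampled values "x" and "y".
    """
    N = sum(s["N"] for s in strata)
    estimate, variance = 0.0, 0.0
    for s in strata:
        x, y = s["x"], s["y"]
        n_l = len(x)
        W_l = s["N"] / N
        r_l = sum(y) / sum(x)  # stratum ratio ybar_l / xbar_l
        # s_srl^2: residual variance about the stratum's own ratio line
        s2_srl = sum((yi - r_l * xi) ** 2 for xi, yi in zip(x, y)) / (n_l - 1)
        estimate += W_l * s["X_mean"] * r_l
        variance += W_l ** 2 * (1 - n_l / s["N"]) * s2_srl / n_l
    return estimate, variance


# Hypothetical two-stratum sample (illustrative values only)
strata = [
    {"N": 60, "X_mean": 10.0, "x": [8.0, 11.0, 12.0, 9.0], "y": [15.0, 24.0, 23.0, 19.0]},
    {"N": 40, "X_mean": 20.0, "x": [18.0, 22.0, 21.0], "y": [52.0, 70.0, 61.0]},
]
est, var = separate_ratio_mean(strata)
print(est, var ** 0.5)
```

Each stratum uses its own ratio $r_l$, so this form needs every stratum mean $\bar{X}_l$ to be known.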
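Bias:
For large stratum sample sizes, $r_l$ will be approximately unbiased for $R_l$
and $\mathrm{var}(r_l)$ will approximate $\mathrm{Var}(r_l)$ reasonably well.
For moderate and small samples, bias is important, and we should consider
it here. We know that in a single stratum
$$\frac{|\text{bias } r_l|}{\sigma_{r_l}} \le \frac{\sigma_{\bar{x}_l}}{\bar{X}_l} = cv(\bar{x}_l).$$
Consider the bias of $\hat{R}_s$:
$$\begin{aligned}
|\text{bias}(\hat{R}_s)| &= |E(\hat{R}_s - R)| = \left|E\!\left(\sum_{l=1}^{L} \frac{X_l}{X}(r_l - R_l)\right)\right| = \left|\sum_{l=1}^{L} \frac{X_l}{X}\,E(r_l - R_l)\right| \\
&\le \sum_{l=1}^{L} \frac{X_l}{X}\,|\text{bias } r_l| \le \max_l |\text{bias } r_l| \sum_{l=1}^{L} \frac{X_l}{X} \\
&\le \max_l |\text{bias } r_l| \le \max_l \frac{\sigma_{\bar{x}_l}\sigma_{r_l}}{\bar{X}_l}.
\end{aligned}$$
Hence
$$\frac{|\text{bias}(\hat{R}_s)|}{s.e.(\hat{R}_s)} \le \frac{\max_l\,\sigma_{r_l}\sigma_{\bar{x}_l}/\bar{X}_l}{s.e.(\hat{R}_s)} \le \sqrt{L}\,\frac{\max_l \sigma_{r_l}}{\min_l \sigma_{r_l}}\,\max_l \frac{\sigma_{\bar{x}_l}}{\bar{X}_l}$$
since, writing $p_l = X_l/X$,
$$\begin{aligned}
s.e.(\hat{R}_s) &= \sqrt{\sum_{l=1}^{L} \left(\frac{X_l}{X}\right)^2 \mathrm{Var}(r_l)} \ge \min_l \sigma_{r_l}\sqrt{\sum_{l=1}^{L} p_l^2} \\
&\ge \min_l \sigma_{r_l}\sqrt{\sum_{l=1}^{L} \left(\frac{1}{L}\right)^2} \ge \min_l \sigma_{r_l}\sqrt{\frac{L}{L^2}} \ge \frac{1}{\sqrt{L}}\min_l \sigma_{r_l}.
\end{aligned}$$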
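Theorem:
$$\mathrm{Var}(\hat{R}_c) \approx \frac{1}{\bar{X}^2}\sum_{l=1}^{L} W_l^2\left(1 - \frac{n_l}{N_l}\right)\frac{1}{n_l}\,\frac{\sum_{i=1}^{N_l} [Y_{li} - \bar{Y}_l - R(X_{li} - \bar{X}_l)]^2}{N_l - 1}.$$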
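Proof: First
$$\begin{aligned}
\hat{R}_c - R &= \frac{\bar{y}_{st}}{\bar{x}_{st}} - R = \frac{1}{\bar{x}_{st}}(\bar{y}_{st} - R\bar{x}_{st}) = \frac{1}{\bar{x}_{st}}\sum_{l=1}^{L} W_l(\bar{y}_l - R\bar{x}_l) \\
&= \frac{1}{\bar{x}_{st}}\sum_{l=1}^{L} W_l\bar{d}_l = \frac{1}{\bar{x}_{st}}\,\bar{d}_{st} \approx \frac{1}{\bar{X}}\,\bar{d}_{st}
\end{aligned}$$
where $d_{li} = y_{li} - Rx_{li}$, $i = 1, \cdots, n_l$, estimates $D_{li} = Y_{li} - RX_{li}$ and
$\bar{d}_l = \dfrac{1}{n_l}\sum_{i=1}^{n_l} d_{li}$. Note that typically $\bar{D}_l \ne 0$. Hence
$$\mathrm{Var}(\hat{R}_c) \approx \frac{1}{\bar{X}^2}\,\mathrm{Var}(\bar{d}_{st}) \approx \frac{1}{\bar{X}^2}\sum_{l=1}^{L} W_l^2\left(1 - \frac{n_l}{N_l}\right)\frac{S_{crl}^2}{n_l}$$
where
$$\begin{aligned}
S_{crl}^2 &= \frac{1}{N_l - 1}\sum_{i=1}^{N_l} (D_{li} - \bar{D}_l)^2 = \frac{1}{N_l - 1}\sum_{i=1}^{N_l} [Y_{li} - \bar{Y}_l - R(X_{li} - \bar{X}_l)]^2 \\
&= S_{y_l}^2 - 2RS_{x_l y_l} + R^2 S_{x_l}^2,
\end{aligned}$$
and this can be estimated by
$$s_{crl}^2 = \frac{1}{n_l - 1}\sum_{i=1}^{n_l} [y_{li} - \bar{y}_l - r_c(x_{li} - \bar{x}_l)]^2 = s_{y_l}^2 - 2r_c s_{x_l y_l} + r_c^2 s_{x_l}^2
$$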
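as compared to
$$s_{srl}^2 = \frac{1}{n_l - 1}\left[\sum_{i=1}^{n_l} y_{li}^2 - 2r_l\sum_{i=1}^{n_l} x_{li}y_{li} + r_l^2\sum_{i=1}^{n_l} x_{li}^2\right] = s_{y_l}^2 - 2r_l s_{x_l y_l} + r_l^2 s_{x_l}^2.$$
Similarly, the combined ratio estimate for the mean and its variance estimate are
$$\hat{\bar{Y}}_{st,c} = \bar{X}\hat{R}_c \quad \text{and} \quad \mathrm{var}(\hat{\bar{Y}}_{st,c}) = \sum_{l=1}^{L} W_l^2\left(1 - \frac{n_l}{N_l}\right)\frac{s_{crl}^2}{n_l}.$$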
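A minimal sketch of the combined ratio estimator of the mean follows, taking $r_c = \bar{y}_{st}/\bar{x}_{st}$ as in the proof above. The function name `combined_ratio_mean`, the dictionary layout, the data and the value of `X_mean` are hypothetical and serve only as an illustration.

```python
def combined_ratio_mean(strata, X_mean):
    """Combined ratio estimate of the population mean and its variance estimate.

    Each stratum is a dict with the stratum size "N" and sampled values "x", "y".
    X_mean is the known overall population mean of x.
    """
    N = sum(s["N"] for s in strata)
    # Stratified sample means of y and x
    ybar_st = sum(s["N"] / N * sum(s["y"]) / len(s["y"]) for s in strata)
    xbar_st = sum(s["N"] / N * sum(s["x"]) / len(s["x"]) for s in strata)
    r_c = ybar_st / xbar_st  # one common ratio for all strata

    variance = 0.0
    for s in strata:
        x, y = s["x"], s["y"]
        n_l = len(x)
        W_l = s["N"] / N
        xbar_l, ybar_l = sum(x) / n_l, sum(y) / n_l
        # s_crl^2: deviations are centred at the stratum means, slope is r_c
        s2_crl = sum(((yi - ybar_l) - r_c * (xi - xbar_l)) ** 2
                     for xi, yi in zip(x, y)) / (n_l - 1)
        variance += W_l ** 2 * (1 - n_l / s["N"]) * s2_crl / n_l
    return X_mean * r_c, variance


# Hypothetical two-stratum sample (illustrative values only)
strata = [
    {"N": 60, "x": [8.0, 11.0, 12.0, 9.0], "y": [15.0, 24.0, 23.0, 19.0]},
    {"N": 40, "x": [18.0, 22.0, 21.0], "y": [52.0, 70.0, 61.0]},
]
est, var = combined_ratio_mean(strata, X_mean=14.0)
print(est, var ** 0.5)
```

Unlike $s_{srl}^2$, the residuals here must be centred at the stratum means, because $\bar{y}_l - r_c\bar{x}_l$ is generally not zero when a single ratio $r_c$ is used for every stratum.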