Sur15 3 Sol

Uploaded by

Suhail Wani
STAT3014/3914 Applied Stat.-Sampling C3-Ratio & reg est.

3 Ratio and regression estimators

3.1 Motivating examples


Frequently, we are interested in measuring the ratio of a matched pair
of variables. This occurs when the sampling unit comprises a group or
cluster of individuals, and our interest is in the population mean per
individual.
For example, to estimate average income/adult in the population in a
household survey, we record for the ith household (i = 1, · · · , n) the
number of adults who live there, xi, and the household income, yi.
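Then the parameter, average income per adult in the population,

$$R = \frac{\text{total household income}}{\text{total no. of adults}} = \frac{\sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} X_i},$$

can be estimated by the ratio estimator

$$\hat{R} = r = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}.$$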
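As a small illustration of the ratio estimator defined above, the following sketch computes r as the ratio of sample totals. The function name `ratio_estimate` and the four-household data are hypothetical, chosen only for illustration.

```python
def ratio_estimate(x, y):
    """Ratio of means r = ybar / xbar, equivalently sum(y) / sum(x)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(y) / sum(x)


# Hypothetical SRS of n = 4 households (made-up numbers, not from the notes).
adults = [2, 1, 3, 2]        # x_i: number of adults in household i
income = [90, 40, 120, 70]   # y_i: household income in $1000s

r = ratio_estimate(adults, income)   # estimated income per adult
print(r)
```

Because r is a ratio of totals, large households automatically carry more weight than small ones.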

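Relationship between estimates

$$\underset{\text{Ratio}}{R} \;\xrightarrow{\;\times \bar{X}\;}\; \underset{\text{Mean}}{\bar{Y}} \;\xrightarrow{\;\times N\;}\; \underset{\text{Total}}{Y}, \qquad R \;\xrightarrow{\;\times X\;}\; Y.$$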

SydU STAT3014 (2015) Second semester Dr. J. Chan 34



3.2 Two characteristics per unit in SRS

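Theorem: If $X_i$ and $Y_i$ are a pair of numerical characteristics defined on every unit of the population, and $\bar{y}$ and $\bar{x}$ are the corresponding means from a SRS without replacement of size $n$, then

$$\mathrm{Cov}(\bar{x},\bar{y}) = \left(1-\frac{n}{N}\right)\frac{1}{n}\left[\frac{\sum_{i=1}^{N}(Y_i-\bar{Y})(X_i-\bar{X})}{N-1}\right] = \left(1-\frac{n}{N}\right)\frac{S_{xy}}{n} \tag{1}$$

and

$$E\left[\frac{\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x})}{n-1}\right] = \frac{\sum_{i=1}^{N}(Y_i-\bar{Y})(X_i-\bar{X})}{N-1}. \tag{2}$$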
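Proof. Consider $U_i = X_i + Y_i$, with corresponding sample values $u_i = x_i + y_i$. Clearly

$$\begin{aligned}
\mathrm{Var}(\bar{u}) &= \left(1-\frac{n}{N}\right)\frac{S_U^2}{n} = \left(1-\frac{n}{N}\right)\frac{1}{n}\left[\frac{\sum_{i=1}^{N}(X_i-\bar{X}+Y_i-\bar{Y})^2}{N-1}\right] \\
&= \left(1-\frac{n}{N}\right)\frac{1}{n}\left[\frac{\sum_{i=1}^{N}(X_i-\bar{X})^2+\sum_{i=1}^{N}(Y_i-\bar{Y})^2+2\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{N-1}\right] \\
&= \mathrm{Var}(\bar{x})+\mathrm{Var}(\bar{y})+\frac{2}{n}\left(1-\frac{n}{N}\right)\left[\frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{N-1}\right].
\end{aligned}$$

Since $\mathrm{Var}(\bar{u}) = \mathrm{Var}(\bar{x}+\bar{y}) = \mathrm{Var}(\bar{x})+\mathrm{Var}(\bar{y})+2\,\mathrm{Cov}(\bar{x},\bar{y})$, (1) is proved. (2) can be proved in a similar way.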
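Results (1) and (2) can be checked numerically by enumerating every possible SRS from a tiny population. The sketch below does this for a made-up population of N = 5 units and samples of size n = 3; the population values and helper names are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical population (values are made up for the check).
X = [1.0, 3.0, 4.0, 7.0, 10.0]
Y = [2.0, 5.0, 9.0, 12.0, 22.0]
N, n = len(X), 3


def mean(v):
    return sum(v) / len(v)


def cov(u, v, ddof=1):
    """Covariance of two lists with divisor len(u) - ddof."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - ddof)


S_xy = cov(X, Y)  # population covariance with divisor N - 1

# Under SRS without replacement all C(N, n) samples are equally likely.
samples = list(combinations(range(N), n))
xbars = [mean([X[i] for i in s]) for s in samples]
ybars = [mean([Y[i] for i in s]) for s in samples]
s_xys = [cov([X[i] for i in s], [Y[i] for i in s]) for s in samples]

cov_means = cov(xbars, ybars, ddof=0)   # exact Cov(xbar, ybar) over the design
rhs_1 = (1 - n / N) * S_xy / n          # right-hand side of (1)
exp_sxy = mean(s_xys)                   # exact E(s_xy), left-hand side of (2)

print(cov_means, rhs_1)
print(exp_sxy, S_xy)
```

Each printed pair should agree up to floating-point rounding, since (1) and (2) are exact identities under SRS without replacement.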

Theorem: For large samples,

(a) $E(r) - R \approx 0$, i.e. $r$ is approximately unbiased;

(b) $$\mathrm{Var}(r) \approx \frac{1}{\bar{X}^2}\left(1-\frac{n}{N}\right)\frac{1}{n}\left[\frac{\sum_{i=1}^{N}(Y_i-RX_i)^2}{N-1}\right] = \frac{1}{\bar{X}^2}\left(1-\frac{n}{N}\right)\frac{S_r^2}{n}.$$


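Proof:
(a) Recall $E(\bar{y}) = \bar{Y}$, $E(\bar{x}) = \bar{X}$ and $\mathrm{Var}(\bar{x}) = O(n^{-1})$ (order of $n^{-1}$). Thus for large samples $\bar{x}$ is close to $\bar{X}$ and

$$E(r) = E\left(\frac{\bar{y}}{\bar{x}}\right) \approx \frac{E(\bar{y})}{\bar{X}} = R.$$

(b) Note that

$$r - R = \frac{\bar{y}}{\bar{x}} - R = \frac{\bar{y}-R\bar{x}}{\bar{x}} \approx \frac{\bar{y}-R\bar{x}}{\bar{X}}.$$

Thus, for large samples,

$$\mathrm{Var}(r) \approx E[(r-R)^2] \approx \frac{1}{\bar{X}^2}E[(\bar{y}-R\bar{x})^2] = \frac{E(\bar{d}^2)}{\bar{X}^2} = \frac{\mathrm{Var}(\bar{d})}{\bar{X}^2}$$

where $\bar{d} = \bar{y}-R\bar{x}$ is the sample mean of $d_i = y_i - Rx_i$, $i = 1,\dots,n$, drawn from the population of $D_i = Y_i - RX_i$, $i = 1,\dots,N$, with

$$E(\bar{d}) = E(\bar{y}-R\bar{x}) = E(\bar{y}) - RE(\bar{x}) = \bar{Y} - R\bar{X} = \bar{Y} - \frac{\bar{Y}}{\bar{X}}\bar{X} = 0.$$

For a SRS of $d_i$,

$$\mathrm{Var}(\bar{d}) = \left(1-\frac{n}{N}\right)\frac{S_r^2}{n}$$

where, since $\bar{D} = 0$,

$$S_r^2 = \frac{1}{N-1}\sum_{i=1}^{N}(D_i-\bar{D})^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2.$$

Hence

$$\mathrm{Var}(r) \approx \frac{1}{\bar{X}^2}\left(1-\frac{n}{N}\right)\frac{1}{n}\left[\frac{1}{N-1}\sum_{i=1}^{N}(Y_i-RX_i)^2\right] = \frac{1}{\bar{X}^2}\left(1-\frac{n}{N}\right)\frac{S_r^2}{n},$$

which is estimated by

$$\mathrm{var}(r) \approx \frac{1}{\bar{X}^2}\left(1-\frac{n}{N}\right)\frac{1}{n}\left[\frac{1}{n-1}\sum_{i=1}^{n}(y_i-rx_i)^2\right] = \frac{1}{\bar{X}^2}\left(1-\frac{n}{N}\right)\frac{s_r^2}{n}.$$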

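1. Ordinary: $x$ not related to $y$. $\hat{\bar{Y}} = \bar{y}$ and $\mathrm{var}(\hat{\bar{Y}}) = \left(1-\frac{n}{N}\right)\frac{s_y^2}{n}$.

[Figure: scatter plot of $y$ against $x$ with a horizontal line at $\bar{y}$. Solid lines mark the deviations $y_i - \bar{y}$, with $s_y^2 = \sum_i (y_i-\bar{y})^2/(n-1)$.]

2. Ratio: $x$ positively related to $y$. $\hat{\bar{Y}}_r = \bar{y}\,\bar{X}/\bar{x}$ and $\mathrm{var}(\hat{\bar{Y}}_r) = \left(1-\frac{n}{N}\right)\frac{s_r^2}{n}$.

[Figure: scatter plot of $y$ against $x$ with the line $y = rx$ through the origin ($a = 0$, $b = r$), which takes the value $r\bar{X}$ at $x = \bar{X}$. Solid lines mark the deviations $z_i = y_i - rx_i$, with $s_r^2 = \sum_i z_i^2/(n-1) = s_y^2 - 2r\hat{\rho}s_xs_y + r^2s_x^2 < s_y^2$.]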

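Calculation of $s_r^2$:

$$\begin{aligned}
s_r^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(y_i-rx_i)^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n}[(y_i-\bar{y})-r(x_i-\bar{x})]^2 \quad\text{since } \bar{y}-r\bar{x} = \bar{y}-\frac{\bar{y}}{\bar{x}}\bar{x} = 0 \\
&= \frac{1}{n-1}\left[\sum_{i=1}^{n}(y_i-\bar{y})^2 - 2r\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) + r^2\sum_{i=1}^{n}(x_i-\bar{x})^2\right] \\
&= s_y^2 - 2r\,s_{xy} + r^2s_x^2 = s_y^2 - 2r\hat{\rho}s_xs_y + r^2s_x^2,
\end{aligned}$$

or

$$s_r^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n}y_i^2 - 2r\sum_{i=1}^{n}x_iy_i + r^2\sum_{i=1}^{n}x_i^2\right).$$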
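The three expressions for $s_r^2$ above are algebraically identical, which the following sketch checks numerically. It uses the 15 branch values from the 7-11 example in Section 3.3; the variable names are chosen here for illustration.

```python
# Branch data from the 7-11 example (last year's sales x, this year's sales y).
x = [50, 35, 12, 10, 15, 30, 9, 25, 100, 250, 50, 50, 150, 100, 40]
y = [56, 48, 22, 14, 18, 26, 11, 30, 165, 409, 73, 70, 95, 55, 83]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
r = ybar / xbar

# (i) Directly from the residuals y_i - r x_i.
s2r_resid = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)

# (ii) From sample variances and covariance: s_y^2 - 2 r s_xy + r^2 s_x^2.
s2x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
s2y = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
s2r_moments = s2y - 2 * r * sxy + r ** 2 * s2x

# (iii) From raw sums of squares and cross-products.
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
s2r_sums = (sum_y2 - 2 * r * sum_xy + r ** 2 * sum_x2) / (n - 1)

print(s2r_resid, s2r_moments, s2r_sums)
```

Form (iii) is the one most convenient for hand calculation when only the summary sums are available.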

Remark:
1. If $X_i$ and $Y_i$ are positively related, we typically have $s_r^2 < s_y^2$. Hence $X_i$ can be used as an auxiliary variable which provides additional information and improves the precision of the estimate of $\bar{Y}$.
2. When $\bar{X}$ is unknown and is replaced by $\bar{x}$, the ratio estimator $\bar{y}\,\bar{X}/\bar{x}$ reduces to the ordinary estimator $\bar{y}$.
3. When ratio estimation is used, estimates of variance and sample size are quite sensitive to data points that do not fit the ideal pattern, called influential observations. It is important to plot the data and look for these unusual data points before proceeding with an analysis.
4. The 'ratio of means' $\hat{R} = \bar{y}/\bar{x}$ is biased, but is almost unbiased if $n$ is large. Another ratio estimator is the 'mean of ratios'

$$\hat{R}^* = \bar{r}^* = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}, \quad\text{where } r_i^* = \frac{y_i}{x_i},$$

which is unbiased for $R^* = \frac{1}{N}\sum_{i=1}^{N}\frac{Y_i}{X_i}$. However $\hat{R}^*$ gives equal weight to each cluster, even though clusters may vary greatly in size. Unlike $\hat{R}^*$, $\hat{R}$ is weighted by the cluster size, which is an advantage over $\hat{R}^*$.


3.3 Ratio estimate for population mean and total


The ratio estimator of the population total $Y$ is

$$\hat{Y}_r = \frac{\bar{y}}{\bar{x}}X = rX.$$

Similarly, the ratio estimator of the population mean is

$$\hat{\bar{Y}}_r = \frac{\bar{y}}{\bar{x}}\bar{X} = r\bar{X}.$$

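These ratio estimates use the extra information in $x_i$, $i = 1,\dots,n$, and the true total $X$ or mean $\bar{X}$, thus improving the precision of the ratio estimates over the ordinary estimates $\hat{Y} = N\bar{y}$ and $\hat{\bar{Y}} = \bar{y}$ respectively. From the previous result,

(a) $E(\hat{\bar{Y}}_r) = \bar{X}E(r) \approx \bar{X}R = \bar{Y}$. Similarly $E(\hat{Y}_r) = XE(r) \approx XR = Y$.

(b) Since

$$\mathrm{Var}(\hat{\bar{Y}}_r) \approx \left(1-\frac{n}{N}\right)\frac{S_r^2}{n} \quad\text{and}\quad \mathrm{Var}(\hat{Y}_r) \approx N^2\left(1-\frac{n}{N}\right)\frac{S_r^2}{n},$$

the variance estimates are

$$\mathrm{var}(\hat{\bar{Y}}_r) = \left(1-\frac{n}{N}\right)\frac{s_r^2}{n} \quad\text{and}\quad \mathrm{var}(\hat{Y}_r) = N^2\left(1-\frac{n}{N}\right)\frac{s_r^2}{n}.$$

The estimator $r$ for $R$ is generally biased, so $\hat{Y}_r$ and $\hat{\bar{Y}}_r$ are also biased for $Y$ and $\bar{Y}$ respectively.

Bias:

$$\mathrm{Cov}(r,\bar{x}) = E(r\bar{x}) - E(r)E(\bar{x}) = E\left(\frac{\bar{y}}{\bar{x}}\,\bar{x}\right) - E(r)E(\bar{x}) = E(\bar{y}) - E(r)E(\bar{x}),$$

so

$$E(r) = \frac{E(\bar{y})}{E(\bar{x})} - \frac{\mathrm{Cov}(r,\bar{x})}{E(\bar{x})} = R - \frac{\rho_{r,\bar{x}}\,\sigma_r\,\sigma_{\bar{x}}}{\bar{X}}.$$

Therefore for any ratio estimate,

$$\frac{|\text{bias } r|}{\sigma_r} = \frac{|R-E(r)|}{\sigma_r} = \frac{|\rho_{r,\bar{x}}|\,\sigma_{\bar{x}}}{\bar{X}} \le \frac{\sigma_{\bar{x}}}{\bar{X}} = \mathrm{cv}(\bar{x}) \tag{3}$$

since $|\rho_{r,\bar{x}}| \le 1$. Thus if $\mathrm{cv}(\bar{x})$ is small, the bias of $\hat{R} = r$ is small relative to the standard error of $r$. But if $n$ is small, the bias can be large.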

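Efficiency:
The ratio estimator is more efficient than the ordinary estimator, that is $\mathrm{var}(\hat{\bar{Y}}) > \mathrm{var}(\hat{\bar{Y}}_r)$, if

$$\hat{\rho} > \frac{\mathrm{cv}(x)}{2\,\mathrm{cv}(y)} \tag{4}$$

where $\mathrm{cv}(y)$ is the sample cv for $Y$, defined as $\mathrm{cv}(y) = s_y/\bar{y}$, and similarly $\mathrm{cv}(x) = s_x/\bar{x}$. To see this, note that

$$\begin{aligned}
\mathrm{var}(\hat{\bar{Y}}) - \mathrm{var}(\hat{\bar{Y}}_r) > 0 &\iff \left(1-\frac{n}{N}\right)\frac{1}{n}\,[s_y^2 - s_r^2] > 0 \\
&\iff s_y^2 - (s_y^2 - 2r\hat{\rho}s_xs_y + r^2s_x^2) > 0 \\
&\iff rs_x(2\hat{\rho}s_y - rs_x) > 0 \\
&\iff 2\hat{\rho}s_y - rs_x > 0 \quad\text{since } r > 0 \text{ and } s_x > 0 \\
&\iff \hat{\rho} > \frac{\bar{y}}{\bar{x}}\,\frac{s_x}{2s_y} = \frac{\mathrm{cv}(x)}{2\,\mathrm{cv}(y)} \quad\text{since } r = \frac{\bar{y}}{\bar{x}},
\end{aligned}$$

and the two variance estimates are equal when $\hat{\rho} = \dfrac{\mathrm{cv}(x)}{2\,\mathrm{cv}(y)}$.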
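Condition (4) is equivalent to $s_r^2 < s_y^2$ whenever $r > 0$ and $s_x > 0$. The sketch below checks both forms on two small made-up samples, one where $y$ is nearly proportional to $x$ and one where $y$ is almost unrelated to $x$. The function names and data are hypothetical.

```python
import math


def sample_stats(x, y):
    """Return xbar, ybar, s_x^2, s_y^2 and s_xy for paired samples."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s2x = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    s2y = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    return xbar, ybar, s2x, s2y, sxy


def condition_4_holds(x, y):
    """True if rho_hat > cv(x) / (2 cv(y)), i.e. condition (4)."""
    xbar, ybar, s2x, s2y, sxy = sample_stats(x, y)
    rho_hat = sxy / math.sqrt(s2x * s2y)
    cv_x = math.sqrt(s2x) / xbar
    cv_y = math.sqrt(s2y) / ybar
    return rho_hat > cv_x / (2 * cv_y)


def ratio_variance_smaller(x, y):
    """True if s_r^2 < s_y^2, compared directly."""
    xbar, ybar, s2x, s2y, sxy = sample_stats(x, y)
    r = ybar / xbar
    s2r = s2y - 2 * r * sxy + r ** 2 * s2x
    return s2r < s2y


x = [1, 2, 3, 4, 5]
y_strong = [2.1, 3.9, 6.2, 7.8, 10.1]   # nearly proportional to x
y_weak = [5, 4, 6, 5, 5]                # almost unrelated to x

for y in (y_strong, y_weak):
    print(condition_4_holds(x, y), ratio_variance_smaller(x, y))
```

For the second sample the ratio estimator is worse than the ordinary one: forcing a line through the origin inflates the residuals when $y$ barely depends on $x$.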

Example: (7-11) The manager of 7-11 is interested in estimating the total sales in thousands for all of its 300 branches. From last year's records, the total sales in thousands for all 300 branches is 21300. A careful check of this year's records for a SRS of 15 branches gives the following results:

Branch   Last year sale x   This year sale y
1        50                 56
2        35                 48
3        12                 22
4        10                 14
5        15                 18
6        30                 26
7        9                  11
8        25                 30
9        100                165
10       250                409
11       50                 73
12       50                 70
13       150                95
14       100                55
15       40                 83

$$\sum_{i=1}^{n}x_i = 926,\quad \sum_{i=1}^{n}x_i^2 = 117400,\quad \sum_{i=1}^{n}y_i = 1175,\quad \sum_{i=1}^{n}y_i^2 = 231815,\quad \sum_{i=1}^{n}x_iy_i = 155753,\quad s_y^2 = 9983.81.$$

The ordinary estimate of the total sales this year in thousands is

$$\hat{Y} = N\bar{y} = 300\left(\frac{1175}{15}\right) = 23500$$

with

$$\mathrm{se}(\hat{Y}) = N\sqrt{\left(1-\frac{n}{N}\right)\frac{s_y^2}{n}} = 300\sqrt{\left(1-\frac{15}{300}\right)\frac{9983.81}{15}} = 7543.72.$$

The ratio estimate and its se for the total sales this year in thousands are

$$\hat{Y}_r = Xr = 21300\left(\frac{1175}{926}\right) = 27027.54,$$

$$\begin{aligned}
\mathrm{se}(\hat{Y}_r) &= N\sqrt{\left(1-\frac{n}{N}\right)\frac{1}{n}\,\frac{1}{n-1}\left(\sum_{i=1}^{n}y_i^2 - 2r\sum_{i=1}^{n}x_iy_i + r^2\sum_{i=1}^{n}x_i^2\right)} \\
&= 300\sqrt{\left(1-\frac{15}{300}\right)\frac{1}{15\times 14}\left[231815 - 2\cdot\frac{1175}{926}\cdot 155753 + \left(\frac{1175}{926}\right)^2\cdot 117400\right]} \\
&= 3226.66,
\end{aligned}$$

which is much smaller than $\mathrm{se}(\hat{Y}) = 7543.72$ thousands.


Read Tutorial 11 Q2a,b, Q3a,b.
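The arithmetic in the 7-11 example can be reproduced from the summary sums alone. The following sketch does so; the variable names are chosen here for illustration, and all numeric inputs are the values quoted in the example.

```python
import math

# Summary values quoted in the 7-11 example.
N, n = 300, 15
X_total = 21300.0                      # last year's total sales (known)
sum_x, sum_y = 926.0, 1175.0
sum_x2, sum_y2, sum_xy = 117400.0, 231815.0, 155753.0
fpc = 1 - n / N                        # finite population correction

# Ordinary (expansion) estimate of the total and its standard error.
ybar = sum_y / n
s2y = (sum_y2 - n * ybar ** 2) / (n - 1)
Y_ord = N * ybar
se_ord = N * math.sqrt(fpc * s2y / n)

# Ratio estimate of the total and its standard error.
r = sum_y / sum_x
s2r = (sum_y2 - 2 * r * sum_xy + r ** 2 * sum_x2) / (n - 1)
Y_ratio = X_total * r
se_ratio = N * math.sqrt(fpc * s2r / n)

print(round(Y_ord, 2), round(se_ord, 2))
print(round(Y_ratio, 2), round(se_ratio, 2))
```

The ratio of the two standard errors (roughly 0.43) shows how much last year's sales explain about this year's.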

3.4 Regression estimator

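Since $\hat{\bar{Y}}_r = \bar{X}\hat{R} = \bar{X}\,\bar{y}/\bar{x}$, the line $y = mx$ with slope $m = \bar{y}/\bar{x}$ passes through the origin $(0,0)$ and $(\bar{X}, \hat{\bar{Y}}_r)$. However, the linear relationship between $X$ and $Y$ may not pass through the origin. A more general estimator, the regression estimator, fits a regression line

$$y = A + Bx = \bar{y} - B\bar{x} + Bx = \bar{y} + B(x-\bar{x}) \tag{5}$$

to the sample data, where the least squares value of $B$ is

$$B = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{N}(Y_i-\bar{Y})(X_i-\bar{X})}{\sum_{i=1}^{N}(X_i-\bar{X})^2} = \frac{\sum_{i=1}^{N}X_iY_i - N\bar{X}\bar{Y}}{\sum_{i=1}^{N}X_i^2 - N\bar{X}^2} = \frac{S_{xy}}{S_x^2} = \frac{\rho S_y}{S_x}$$

and $A = \bar{y} - B\bar{x}$.

Note: $\mathrm{Cov}(X,Y) = S_{xy} = SS_{xy}/(N-1)$, $\mathrm{Var}(X) = S_x^2 = SS_{xx}/(N-1)$, $\mathrm{cov}(X,Y) = s_{xy} = ss_{xy}/(n-1)$ and $\mathrm{var}(X) = s_x^2 = ss_{xx}/(n-1)$.

The regression estimator of the population mean $\bar{Y}$ is obtained by substituting $x = \bar{X}$ into (5) and replacing $B$ by its sample estimate $b$:

$$\hat{\bar{Y}}_{reg} = \bar{y} + b(\bar{X}-\bar{x})$$

where

$$b = \frac{ss_{xy}}{ss_{xx}} = \frac{\sum_{i=1}^{n}(y_i-\bar{y})(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}x_iy_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n}x_i^2 - n\bar{x}^2} = \frac{s_{xy}}{s_x^2}. \tag{6}$$

Since

$$\hat{\bar{Y}}_{reg} = \bar{y} + b(\bar{X}-\bar{x}) \simeq \bar{y} + B(\bar{X}-\bar{x}) = \bar{z}',$$

the sample mean of the variable $z_i' = y_i + B(\bar{X}-x_i)$, we have

$$E(\hat{\bar{Y}}_{reg}) \simeq E[\bar{y} + B(\bar{X}-\bar{x})] = E(\bar{y}) + B[\bar{X}-E(\bar{x})] = \bar{Y} \quad\text{(approximately unbiased)}$$

and

$$\begin{aligned}
\mathrm{Var}(\hat{\bar{Y}}_{reg}) &\simeq \mathrm{Var}(\bar{z}') = \mathrm{Var}[\bar{y} + B(\bar{X}-\bar{x})] = \mathrm{Var}(\bar{y}-B\bar{x}) \\
&= \mathrm{Var}(\bar{y}) + B^2\,\mathrm{Var}(\bar{x}) - 2B\,\mathrm{Cov}(\bar{y},\bar{x}) \\
&= \left(1-\frac{n}{N}\right)\frac{S_y^2}{n} + \rho^2\frac{S_y^2}{S_x^2}\left(1-\frac{n}{N}\right)\frac{S_x^2}{n} - 2\rho\frac{S_y}{S_x}\left(1-\frac{n}{N}\right)\frac{\rho S_xS_y}{n} \\
&= \left(1-\frac{n}{N}\right)\frac{S_y^2}{n}\,(1-\rho^2).
\end{aligned}$$

Hence

$$\mathrm{var}(\hat{\bar{Y}}_{reg}) = \left(1-\frac{n}{N}\right)\frac{s_{reg}^2}{n} = \left(1-\frac{n}{N}\right)\frac{s_y^2(1-\hat{\rho}^2)}{n}$$

where $s_{reg}^2$ is the sample variance of $z_i' = y_i + b(\bar{X}-x_i)$.

The regression estimator for the population total $Y$ is

$$\hat{Y}_{reg} = N[\bar{y} + b(\bar{X}-\bar{x})]$$

and its variance estimate is

$$\mathrm{var}(\hat{Y}_{reg}) = N^2\left(1-\frac{n}{N}\right)\frac{s_{reg}^2}{n} = N^2\left(1-\frac{n}{N}\right)\frac{s_y^2(1-\hat{\rho}^2)}{n}.$$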

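Bias:

$$\text{Bias in } \hat{\bar{Y}}_{reg} = E(\hat{\bar{Y}}_{reg}) - \bar{Y} = E(\bar{y}) + E[b(\bar{X}-\bar{x})] - \bar{Y} = E[b(\bar{X}-\bar{x})] = -\mathrm{Cov}(b,\bar{x}).$$

Efficiency:
1. The regression estimator is at least as efficient as the ordinary estimator, that is $\mathrm{var}(\hat{\bar{Y}}) \ge \mathrm{var}(\hat{\bar{Y}}_{reg})$, since

$$\mathrm{var}(\hat{\bar{Y}}) - \mathrm{var}(\hat{\bar{Y}}_{reg}) = \left(1-\frac{n}{N}\right)\frac{1}{n}\,[s_y^2 - s_{reg}^2] = \left(1-\frac{n}{N}\right)\frac{1}{n}\,s_y^2\hat{\rho}^2 \ge 0,$$

where equality holds when $\hat{\rho} = 0$, i.e. there is no linear association between $Y$ and $X$ in the sample.

2. The regression estimator is at least as efficient as the ratio estimator, that is $\mathrm{var}(\hat{\bar{Y}}_r) \ge \mathrm{var}(\hat{\bar{Y}}_{reg})$, with equality only when

$$b = r = \frac{\bar{y}}{\bar{x}},$$

in which case they are equivalent and the regression of $y$ on $x$ is linear through the origin and the variance of $y$ is proportional to $x$. Indeed,

$$\begin{aligned}
\mathrm{var}(\hat{\bar{Y}}_r) - \mathrm{var}(\hat{\bar{Y}}_{reg}) &= \left(1-\frac{n}{N}\right)\frac{1}{n}\,[s_r^2 - s_{reg}^2] \\
&= \left(1-\frac{n}{N}\right)\frac{1}{n}\,[s_y^2 - 2r\hat{\rho}s_xs_y + r^2s_x^2 - s_y^2(1-\hat{\rho}^2)] \\
&= \left(1-\frac{n}{N}\right)\frac{1}{n}\,(r^2s_x^2 - 2r\hat{\rho}s_xs_y + s_y^2\hat{\rho}^2) \\
&= \left(1-\frac{n}{N}\right)\frac{1}{n}\,(rs_x - \hat{\rho}s_y)^2 \\
&= \left(1-\frac{n}{N}\right)\frac{1}{n}\,(rs_x - bs_x)^2 \\
&= \left(1-\frac{n}{N}\right)\frac{s_x^2}{n}\,(r-b)^2 \ge 0,
\end{aligned}$$

which is strictly positive if and only if $(r-b)^2 > 0$, where

$$\hat{\rho}s_y = \frac{s_{xy}}{s_xs_y}\,s_y = \frac{s_{xy}}{s_x} = \frac{s_{xy}}{s_x^2}\,s_x = bs_x.$$

3. Since $\hat{\bar{Y}}_{reg} = \bar{y} + b(\bar{X}-\bar{x})$, the regression estimator adjusts $\bar{y}$ up or down by an amount $b(\bar{X}-\bar{x})$.

(a) When the slope $b = 0$, the regression estimator $\hat{\bar{Y}}_{reg} = \bar{y}$ becomes the ordinary estimator $\hat{\bar{Y}}$.

(b) When the y-intercept $a = \bar{y} - b\bar{x} = 0 \iff b = \bar{y}/\bar{x} = r$, the slope $b$ becomes the ratio estimate $r$ and the regression estimator

$$\hat{\bar{Y}}_{reg} = \bar{y} + \frac{\bar{y}}{\bar{x}}(\bar{X}-\bar{x}) = \bar{y} + \frac{\bar{y}}{\bar{x}}\bar{X} - \bar{y} = \frac{\bar{y}}{\bar{x}}\bar{X} = \bar{X}r = \hat{\bar{Y}}_r$$

becomes the ratio estimator $\hat{\bar{Y}}_r$.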

Example: (7-11) Estimate the total sales using the regression estimator.
Solution: For the regression estimate of the total sales this year in thousands, first compute

$$ss_{xy} = \sum_{i=1}^{n}x_iy_i - n\bar{x}\bar{y} = 155753 - 15\times\frac{926}{15}\times\frac{1175}{15} = 83216.33,$$

$$ss_{xx} = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2 = 117400 - 15\times\left(\frac{926}{15}\right)^2 = 60234.93,$$

$$ss_{yy} = \sum_{i=1}^{n}y_i^2 - n\bar{y}^2 = 231815 - 15\times\left(\frac{1175}{15}\right)^2 = 139773.33.$$

We have

$$b = \frac{ss_{xy}}{ss_{xx}} = \frac{83216.33}{60234.93} = 1.3815$$

and

$$\hat{\rho} = \frac{ss_{xy}}{\sqrt{ss_{xx}\,ss_{yy}}} = \frac{83216.33}{\sqrt{60234.93\times 139773.33}} = 0.9069.$$

It follows that

$$\hat{Y}_{reg} = N[\bar{y} + b(\bar{X}-\bar{x})] = 300\left[\frac{1175}{15} + 1.3815\left(\frac{21300}{300} - \frac{926}{15}\right)\right] = 27340.65$$

as compared with $\hat{Y} = 23500$ and $\hat{Y}_r = 27027.54$. The s.e. estimate is

$$\mathrm{se}(\hat{Y}_{reg}) = N\sqrt{\left(1-\frac{n}{N}\right)\frac{s_y^2(1-\hat{\rho}^2)}{n}} = 300\sqrt{\left(1-\frac{15}{300}\right)\frac{9983.81\,(1-0.9069^2)}{15}} = 3178.52,$$

which is $< \mathrm{se}(\hat{Y}_r) = 3226.66 \ll \mathrm{se}(\hat{Y}) = 7543.72$. This shows that dropping the zero y-intercept assumption improves the estimate slightly. Note that the y-intercept estimate is

$$a = \bar{y} - b\bar{x} = \frac{1175}{15} - 1.3815\times\frac{926}{15} \approx -6.9531,$$

which is quite close to zero.
Read Tutorial 11 Q2c,d, & 3c,d.
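The regression estimate for the 7-11 data can likewise be reproduced from the summary sums. This is a minimal sketch with variable names chosen here; because it carries $\hat{\rho}$ at full precision rather than rounded to 0.9069, the standard error it prints may differ slightly in the last digits from the hand calculation.

```python
import math

# Summary values quoted in the 7-11 example.
N, n = 300, 15
X_total = 21300.0
sum_x, sum_y = 926.0, 1175.0
sum_x2, sum_y2, sum_xy = 117400.0, 231815.0, 155753.0

xbar, ybar = sum_x / n, sum_y / n
ss_xy = sum_xy - n * xbar * ybar
ss_xx = sum_x2 - n * xbar ** 2
ss_yy = sum_y2 - n * ybar ** 2

b = ss_xy / ss_xx                         # least squares slope, eq. (6)
a = ybar - b * xbar                       # fitted y-intercept
rho_hat = ss_xy / math.sqrt(ss_xx * ss_yy)

# Regression estimate of the total and its standard error.
Y_reg = N * (ybar + b * (X_total / N - xbar))
s2y = ss_yy / (n - 1)
se_reg = N * math.sqrt((1 - n / N) * s2y * (1 - rho_hat ** 2) / n)

print(round(b, 4), round(rho_hat, 4), round(a, 4))
print(round(Y_reg, 2), round(se_reg, 2))
```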


3.5 The Hartley - Ross Estimator


Since the ratio estimator $r$ for $R$ is biased, the following leads to an unbiased estimator of $R$.

Theorem: Let $Z = f(X,Y)$ be a fixed function of two variables. Define $Z_i = f(X_i,Y_i)$ and $z_i = f(x_i,y_i)$. Then

$$E\left[\bar{z} + \frac{N-1}{N\bar{X}}\,\frac{\sum_{i=1}^{n}z_i(x_i-\bar{x})}{n-1}\right] = \frac{\sum_{i=1}^{N}Z_iX_i}{N\bar{X}}. \tag{7}$$

Proof: Using (2), the LHS is

$$\begin{aligned}
E(\bar{z}) + \frac{N-1}{N\bar{X}}\,E\left[\frac{\sum_{i=1}^{n}z_i(x_i-\bar{x})}{n-1}\right]
&= \bar{Z} + \frac{N-1}{N\bar{X}}\,\frac{\sum_{i=1}^{N}Z_i(X_i-\bar{X})}{N-1} \\
&= \bar{Z} + \frac{\sum_{i=1}^{N}Z_iX_i - \bar{X}\sum_{i=1}^{N}Z_i}{N\bar{X}} = \frac{\sum_{i=1}^{N}Z_iX_i}{N\bar{X}}.
\end{aligned}$$

For the problem of estimating $R$ from the sample $(x_i,y_i)$, $i = 1,\dots,n$, we assume $X_i > 0$, $i = 1,\dots,N$, and define the function

$$z_i = f(x_i,y_i) = y_i/x_i = r_i^*, \quad i = 1,\dots,n,$$

and $Z_i = Y_i/X_i$, $i = 1,2,\dots,N$, so from (7)

$$E\left[\bar{r}^* + \frac{N-1}{N\bar{X}}\,\frac{n(\bar{y}-\bar{x}\bar{r}^*)}{n-1}\right] = \frac{\sum_{i=1}^{N}\frac{Y_i}{X_i}X_i}{N\bar{X}} = \frac{\bar{Y}}{\bar{X}} = R$$

since

$$\sum_{i=1}^{n}z_i(x_i-\bar{x}) = \sum_{i=1}^{n}\frac{y_i}{x_i}(x_i-\bar{x}) = \sum_{i=1}^{n}y_i - \bar{x}\sum_{i=1}^{n}\frac{y_i}{x_i} = n(\bar{y}-\bar{x}\bar{r}^*).$$

Thus the Hartley-Ross estimator, an unbiased estimator of $R$, is

$$\hat{R}_{hr} = \bar{r}^* + \frac{N-1}{N\bar{X}}\,\frac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1},$$

for which we need to know $\bar{X}$ (or $X = N\bar{X}$). This estimator consists of a mean-of-ratios estimate plus an adjustment for unbiasedness.

The Hartley-Ross estimators for the mean and total are

$$\text{for the population mean:}\quad \hat{\bar{Y}}_{hr} = \bar{X}\bar{r}^* + \frac{N-1}{N}\,\frac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1}$$

and

$$\text{for the population total:}\quad \hat{Y}_{hr} = X\bar{r}^* + (N-1)\,\frac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1}.$$
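The unbiasedness of the Hartley-Ross estimator can be verified exactly on a tiny population by averaging the estimator over every possible SRS. The sketch below does this for a made-up population of N = 5 units with n = 3; the population values and function names are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical population with all X_i > 0 (values are made up for the check).
X = [2.0, 3.0, 5.0, 8.0, 12.0]
Y = [3.0, 7.0, 6.0, 15.0, 20.0]
N, n = len(X), 3
Xbar = sum(X) / N
R = sum(Y) / sum(X)          # target ratio


def hartley_ross(xs, ys):
    """Hartley-Ross estimate of R from one sample (uses known N and Xbar)."""
    m = len(xs)
    rbar = sum(yi / xi for xi, yi in zip(xs, ys)) / m   # mean of ratios
    xbar, ybar = sum(xs) / m, sum(ys) / m
    return rbar + (N - 1) / (N * Xbar) * m * (ybar - rbar * xbar) / (m - 1)


def ratio_of_means(xs, ys):
    return sum(ys) / sum(xs)


# All C(N, n) samples are equally likely under SRS without replacement,
# so a plain average over them is the exact design expectation.
samples = list(combinations(range(N), n))
exp_hr = sum(hartley_ross([X[i] for i in s], [Y[i] for i in s])
             for s in samples) / len(samples)
exp_r = sum(ratio_of_means([X[i] for i in s], [Y[i] for i in s])
            for s in samples) / len(samples)

print(R, exp_hr, exp_r)
```

The first two printed values should agree up to rounding, while the expectation of the ratio of means generally differs from R in small samples.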

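Remarks:
1. So far, we have $\hat{R} = \bar{y}/\bar{x}$, which is biased for $R$; $\hat{R}^* = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}$, which is biased for $R$ and unbiased for $R^* = \frac{1}{N}\sum_{i=1}^{N}\frac{Y_i}{X_i}$; and $\hat{R}_{hr}$, which is unbiased for $R$. Finally, could we just use

$$\hat{R}_o = \bar{y}/\bar{X}\,?$$

This is the ordinary estimator, with

$$E\left(\frac{\bar{y}}{\bar{X}}\right) = \frac{E(\bar{y})}{\bar{X}} = \frac{\bar{Y}}{\bar{X}} = R,$$

which does not use the information from the sample $\{x_i\}$ but is unbiased for $R$.

2. For small samples we might expect the Hartley-Ross estimator to be better. There is no general result on the comparison of the variances of

$$r = \frac{\bar{y}}{\bar{x}}, \quad r_o = \frac{\bar{y}}{\bar{X}} \quad\text{and}\quad r_{hr} = \bar{r}^* + \frac{N-1}{N\bar{X}}\,\frac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1}$$

for all sample sizes.
See Cochran (2nd Ed) Theorem 6.3 §6.15.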

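Summary of estimators and variance estimates based on 1 SRS

Ratio $R$
  Ordinary: $\dfrac{\bar{y}}{\bar{X}}$, with variance estimate $\dfrac{1}{\bar{X}^2}\left(1-\dfrac{n}{N}\right)\dfrac{s_y^2}{n}$
  Ratio: $\dfrac{\bar{y}}{\bar{x}}$, with variance estimate $\dfrac{1}{\bar{X}^2}\left(1-\dfrac{n}{N}\right)\dfrac{s_r^2}{n}$
  Regression: -
  Hartley-Ross: $\bar{r}^* + \dfrac{N-1}{N\bar{X}}\,\dfrac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1}$, variance estimate not given

Mean $\bar{Y}$
  Ordinary: $\bar{y}$, with variance estimate $\left(1-\dfrac{n}{N}\right)\dfrac{s_y^2}{n}$
  Ratio: $\dfrac{\bar{y}}{\bar{x}}\bar{X}$, with variance estimate $\left(1-\dfrac{n}{N}\right)\dfrac{s_r^2}{n}$; $\mathrm{var}(\hat{\bar{Y}}_r) < \mathrm{var}(\hat{\bar{Y}})$ if $\hat{\rho} > \dfrac{\bar{y}s_x}{2\bar{x}s_y}$
  Regression: $\bar{y} + \dfrac{s_{xy}}{s_x^2}(\bar{X}-\bar{x})$, with variance estimate $\left(1-\dfrac{n}{N}\right)\dfrac{s_y^2(1-\hat{\rho}^2)}{n}$; $\mathrm{var}(\hat{\bar{Y}}_{reg}) < \mathrm{var}(\hat{\bar{Y}}_r)$, equal if $b = r = \dfrac{\bar{y}}{\bar{x}}$
  Hartley-Ross: $\bar{X}\bar{r}^* + \dfrac{N-1}{N}\,\dfrac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1}$, variance estimate not given

Total $Y$
  Ordinary: $N\bar{y}$, with variance estimate $N^2\left(1-\dfrac{n}{N}\right)\dfrac{s_y^2}{n}$
  Ratio: $\dfrac{\bar{y}}{\bar{x}}X$, with variance estimate $N^2\left(1-\dfrac{n}{N}\right)\dfrac{s_r^2}{n}$
  Regression: $N\bar{y} + \dfrac{s_{xy}}{s_x^2}(X-N\bar{x})$, with variance estimate $N^2\left(1-\dfrac{n}{N}\right)\dfrac{s_y^2(1-\hat{\rho}^2)}{n}$
  Hartley-Ross: $X\bar{r}^* + (N-1)\,\dfrac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1}$, variance estimate not given

Here

$$s_y^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n}y_i^2 - n\bar{y}^2\right),$$

$$s_r^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n}y_i^2 - 2r\sum_{i=1}^{n}x_iy_i + r^2\sum_{i=1}^{n}x_i^2\right) = s_y^2 - 2r\hat{\rho}s_xs_y + r^2s_x^2, \quad r = \frac{\bar{y}}{\bar{x}}.$$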

Example: (7-11)
Solution: The ratios $r_i^* = y_i/x_i$ and their total are given below:

i      x_i    y_i    r_i* = y_i/x_i
1      50     56     1.120
2      35     48     1.371
3      12     22     1.833
4      10     14     1.400
5      15     18     1.200
6      30     26     0.867
7      9      11     1.222
8      25     30     1.200
9      100    165    1.650
10     250    409    1.636
11     50     73     1.460
12     50     70     1.400
13     150    95     0.633
14     100    55     0.550
15     40     83     2.075
Total                19.618

We have $\bar{r}^* = \frac{1}{n}\sum_{i=1}^{n}r_i^* = \frac{19.618}{15} = 1.3079$, $\bar{x} = 61.7333$ and $\bar{y} = 78.3333$.

The Hartley-Ross estimate of the total sales this year in thousands is

$$\begin{aligned}
\hat{Y}_{hr} &= X\bar{r}^* + (N-1)\,\frac{n(\bar{y}-\bar{r}^*\bar{x})}{n-1} \\
&= 21300(1.3079) + (300-1)\,\frac{15[78.3333 - 1.3079(61.7333)]}{15-1} \\
&= 27086.9.
\end{aligned}$$

Read Tutorial 12 Q1(a).
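The Hartley-Ross calculation above can be reproduced directly from the branch data. This is a minimal sketch with variable names chosen here for illustration.

```python
# Branch data from the 7-11 example.
x = [50, 35, 12, 10, 15, 30, 9, 25, 100, 250, 50, 50, 150, 100, 40]
y = [56, 48, 22, 14, 18, 26, 11, 30, 165, 409, 73, 70, 95, 55, 83]
N, n = 300, len(x)
X_total = 21300.0

xbar, ybar = sum(x) / n, sum(y) / n
rbar_star = sum(yi / xi for xi, yi in zip(x, y)) / n   # mean of ratios

# Hartley-Ross estimate of the population total.
Y_hr = X_total * rbar_star + (N - 1) * n * (ybar - rbar_star * xbar) / (n - 1)

print(round(rbar_star, 4), round(Y_hr, 1))
```

The adjustment term is negative here because the mean of ratios exceeds the ratio of means, i.e. the smaller branches tend to have the larger individual ratios.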


Example: In a survey of family size ($x_1$), weekly income ($x_2$) and weekly expenditure on food ($y$), we want to estimate the average weekly expenditure on food per family in the most efficient way. A simple random sample of 27 families yields the following data:

$$\sum_i x_{1i} = 109,\quad \sum_i x_{2i} = 16277,\quad \sum_i y_i = 2831,\quad \hat{\rho}_{x_1,y} = 0.925,\quad \hat{\rho}_{x_2,y} = 0.573.$$

The sample covariance matrix for $y$, $x_1$ and $x_2$ is

$$\begin{pmatrix}
547.8234 & 26.5057 & 1796.5541 \\
26.5057 & 1.4986 & 80.1595 \\
1796.5541 & 80.1595 & 17967.0541
\end{pmatrix}.$$

From the census data $\bar{X}_1 = 3.91$ and $\bar{X}_2 = 542$.

(a) Estimate the standard errors of the ratio estimators for $\bar{Y}$ using $x_1$ and using $x_2$. Compare the standard errors with the s.e. for the simple estimate ignoring the covariates. Which estimator has the smallest estimated s.e.?
(b) Calculate the best available estimate of the average weekly expenditure on food per family and give an approximate 95% confidence interval for this average.

Solution:
(a) For the ratio estimators of $\bar{Y}$ using $x_1$ and using $x_2$ (carrying the unrounded ratios through the arithmetic),

$$r_1 = \frac{\sum_i y_i}{\sum_i x_{1i}} = \frac{2831}{109} = 25.97,$$

$$s_{r_1}^2 = s_y^2 - 2r_1s_{x_1y} + r_1^2s_{x_1}^2 = 547.8234 - 2(25.97)(26.5057) + 25.97^2(1.4986) = 181.896,$$

$$r_2 = \frac{\sum_i y_i}{\sum_i x_{2i}} = \frac{2831}{16277} = 0.1739,$$

$$s_{r_2}^2 = s_y^2 - 2r_2s_{x_2y} + r_2^2s_{x_2}^2 = 547.8234 - 2(0.1739)(1796.5541) + 0.1739^2(17967.0541) = 466.396.$$

The standard errors are

$$\mathrm{se}(\hat{\bar{Y}}_{r_1}) = \sqrt{\frac{s_{r_1}^2}{n}} = \sqrt{\frac{181.896}{27}} = 2.5956,$$

$$\mathrm{se}(\hat{\bar{Y}}_{r_2}) = \sqrt{\frac{s_{r_2}^2}{n}} = \sqrt{\frac{466.396}{27}} = 4.1562,$$

$$\mathrm{se}(\hat{\bar{Y}}) = \sqrt{\frac{s_y^2}{n}} = \sqrt{\frac{547.8234}{27}} = 4.5044.$$

The first ratio estimator $\hat{\bar{Y}}_{r_1}$ has the lowest s.e. due to the higher correlation $\hat{\rho}_{y,x_1} = 0.925$. The second ratio estimator gives only a marginal improvement as the correlation $\hat{\rho}_{y,x_2} = 0.573$ is weaker, but it still satisfies condition (4):

$$\hat{\rho}_{x_2,y} = 0.573 > \frac{\bar{y}\,s_{x_2}}{2\bar{x}_2\,s_y} = \frac{2831\sqrt{17967.0541}}{2\cdot 16277\sqrt{547.8234}} = 0.4980.$$

Note that the fpc is ignored because the population size $N$ is unknown.

(b) The estimate of the average weekly expenditure on food per family is

$$\hat{\bar{Y}}_{r_1} = r_1\bar{X}_1 = 25.97(3.91) = 101.5524,$$

$$\text{95\% CI for } \bar{Y} = 101.5524 \pm 1.96(2.5956) = (96.4651,\ 106.6397).$$
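The standard error of a ratio estimator of the mean can be computed directly from entries of the sample covariance matrix. The sketch below does this for the family-size covariate $x_1$ and forms the confidence interval in part (b); the helper name `ratio_mean_se` is hypothetical, and the fpc is ignored as in the example.

```python
import math

n = 27
sum_y, sum_x1 = 2831.0, 109.0
# Entries of the sample covariance matrix quoted in the example.
s_yy, s_x1y, s_x1x1 = 547.8234, 26.5057, 1.4986
X1bar = 3.91                      # census mean family size


def ratio_mean_se(r, s_yy, s_xy, s_xx, n):
    """s.e. of the ratio estimator of the mean, with the fpc ignored."""
    s2_r = s_yy - 2 * r * s_xy + r ** 2 * s_xx
    return math.sqrt(s2_r / n)


r1 = sum_y / sum_x1
se_r1 = ratio_mean_se(r1, s_yy, s_x1y, s_x1x1, n)
se_plain = math.sqrt(s_yy / n)    # simple estimate ignoring covariates

y_r1 = r1 * X1bar                 # ratio estimate of the mean
ci = (y_r1 - 1.96 * se_r1, y_r1 + 1.96 * se_r1)

print(round(se_r1, 4), round(se_plain, 4))
print(round(y_r1, 4), tuple(round(c, 2) for c in ci))
```

The same function can be called with the $x_2$ row of the covariance matrix to compare the two auxiliary variables.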


3.6 Ratio estimate for subpopulation in poststratification


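For some subpopulation $C_l$, we want to estimate

$$R_l = \sum_{i\in C_l}Y_i \Big/ \sum_{i\in C_l}X_i = \sum_{i=1}^{N}Y_i' \Big/ \sum_{i=1}^{N}X_i' = R'$$

if we define

$$(Y_i', X_i') = \begin{cases}(Y_i, X_i) & \text{if } i\in C_l, \\ (0,0) & \text{if } i\notin C_l.\end{cases}$$

Note: $X' = X_l$, i.e. the sum of $X_i'$ over the whole population equals the sum of $X_i$ over $C_l$. Hence the natural estimator of the ratio and its variance estimate are

$$r' = \frac{\sum_{i=1}^{n}y_i'}{\sum_{i=1}^{n}x_i'} = \frac{\sum_{i\in C_l}y_i}{\sum_{i\in C_l}x_i} = r_l \quad\text{and}\quad \mathrm{var}(r_l) \approx \frac{1}{(\bar{X}')^2}\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n}$$

where

$$(x_i', y_i') = \begin{cases}(x_i, y_i) & \text{if } i\in C_l, \\ (0,0) & \text{if } i\notin C_l,\end{cases}$$

$\bar{X}' = \dfrac{X'}{N}$ can be estimated by $\bar{x}' = \dfrac{1}{n}\sum_{i=1}^{n}x_i' = \dfrac{1}{n}\sum_{i\in C_l}x_i$, and

$$S_{rl}'^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i' - R'X_i')^2$$

can be estimated by

$$s_{rl}'^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i' - r'x_i')^2 = \frac{1}{n-1}\sum_{i\in C_l}(y_i - r_lx_i)^2.$$

The ratio estimator of the mean in $C_l$ and its variance estimate are

$$\hat{\bar{Y}}_{rl} = \bar{X}_lr_l \quad\text{and}\quad \mathrm{var}(\hat{\bar{Y}}_{rl}) \approx \frac{1}{W_l^2}\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n}$$

since, with $\bar{X}_l = X'/N_l$ and $\bar{X}' = X'/N$,

$$\mathrm{var}(\hat{\bar{Y}}_{rl}) = \bar{X}_l^2\,\mathrm{var}(r_l) = \frac{\bar{X}_l^2}{\bar{X}'^2}\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n} = \frac{X'^2}{N_l^2}\,\frac{N^2}{X'^2}\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n} = \frac{1}{W_l^2}\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n}.$$

Similarly, the ratio estimator of the total in $C_l$ and its variance estimate are

$$\hat{Y}_{rl} = X_lr_l \quad\text{and}\quad \mathrm{var}(\hat{Y}_{rl}) \approx N^2\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n}$$

since

$$\mathrm{var}(\hat{Y}_{rl}) = X_l^2\,\mathrm{var}(r_l) = \frac{X_l^2}{\bar{X}'^2}\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n} = X'^2\,\frac{N^2}{X'^2}\left(1-\frac{n}{N}\right)\frac{s_{rl}'^2}{n}.$$

Note that these estimators correspond to method 1 in Section 1.5 for poststratification, and $n_l$ does not come into any of these calculations.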
Read Tutorial 12 Q1b,c.
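The zeroing device used in this section translates directly into code: values outside $C_l$ are set to $(0,0)$ and the ordinary SRS ratio formulas are applied to the full sample of size $n$. The sketch below uses a made-up sample of 6 units from a population of 60; all data and names are hypothetical.

```python
# Hypothetical SRS of n = 6 units from N = 60; in_domain flags membership of C_l.
N = 60
x = [4, 2, 5, 3, 6, 1]
y = [8, 5, 11, 5, 13, 3]
in_domain = [True, False, True, True, False, True]
n = len(x)

# Zero out units that fall outside the subpopulation.
x_prime = [xi if d else 0 for xi, d in zip(x, in_domain)]
y_prime = [yi if d else 0 for yi, d in zip(y, in_domain)]

# Ratio estimate for the subpopulation.
r_l = sum(y_prime) / sum(x_prime)

# Variance estimate: note the divisors n and n - 1, not the domain sample size.
xbar_prime = sum(x_prime) / n
s2_rl = sum((yp - r_l * xp) ** 2 for xp, yp in zip(x_prime, y_prime)) / (n - 1)
var_r_l = (1 - n / N) * s2_rl / n / xbar_prime ** 2

print(r_l, s2_rl, var_r_l)
```

Units outside the domain contribute zero residuals, so only the domain members affect the sum of squares, while the divisors still reflect the full sample size.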


3.7 Ratio Estimation for Stratified SRS


In a stratified SRS, a SRS of a specified sample size $n_l$ is taken in each of the $L$ strata with known size $N_l$, e.g. the 6 states of Australia. There are two types of ratio estimates, depending on the order of taking the ratio and summing over strata.

1. Take ratios $r_l = \bar{y}_l/\bar{x}_l$ first and sum over $\hat{Y}_l = X_lr_l$ to obtain $\hat{R}_s = \sum_{l=1}^{L}\hat{Y}_l/X$.

2. Sum over $\hat{Y}_l$ and $\hat{X}_l$ first to obtain $\hat{Y}$ and $\hat{X}$, and then take the ratio $\hat{R}_c = \hat{Y}/\hat{X}$.
b

3.7.1 The ‘Separate’ Ratio Estimate


Suppose the stratum totals $X_l$, $l = 1,\dots,L$, are known, so $X = \sum_{l=1}^{L}X_l$ is known also. Then

$$\hat{R}_s = \frac{\hat{Y}}{X} = \frac{1}{X}\sum_{l=1}^{L}\hat{Y}_l = \frac{1}{X}\sum_{l=1}^{L}X_lr_l = \frac{1}{\bar{X}}\sum_{l=1}^{L}W_l\bar{X}_lr_l$$

since $\dfrac{X_l}{X} = \dfrac{N}{X}\,\dfrac{N_l}{N}\,\dfrac{X_l}{N_l} = \dfrac{1}{\bar{X}}W_l\bar{X}_l$ and $r_l = \dfrac{\bar{y}_l}{\bar{x}_l}$. Then

$$E(\hat{R}_s) = \sum_{l=1}^{L}\frac{X_l}{X}E(r_l) \approx \sum_{l=1}^{L}\frac{X_l}{X}\,\frac{Y_l}{X_l} = \frac{1}{X}\sum_{l=1}^{L}Y_l = \frac{Y}{X} = R,$$

since $E(r_l) \approx R_l = \dfrac{Y_l}{X_l}$, and

$$\mathrm{var}(\hat{R}_s) = \frac{1}{\bar{X}^2}\sum_{l=1}^{L}W_l^2\left(1-\frac{n_l}{N_l}\right)\frac{s_{srl}^2}{n_l}$$

where

$$s_{srl}^2 = s_{y_l}^2 - 2r_ls_{x_ly_l} + r_l^2s_{x_l}^2 = \frac{1}{n_l-1}\left[\sum_{i=1}^{n_l}y_{li}^2 - 2r_l\sum_{i=1}^{n_l}x_{li}y_{li} + r_l^2\sum_{i=1}^{n_l}x_{li}^2\right].$$

Similarly the separate ratio estimate for the mean is

$$\hat{\bar{Y}}_{st,s} = \sum_{l=1}^{L}W_l\bar{X}_lr_l \quad\text{and}\quad \mathrm{var}(\hat{\bar{Y}}_{st,s}) = \sum_{l=1}^{L}W_l^2\left(1-\frac{n_l}{N_l}\right)\frac{s_{srl}^2}{n_l}.$$

Bias:
For large stratum sample sizes, $r_l$ will be approximately unbiased for $R_l$ and $\mathrm{var}(r_l)$ will approximate $\mathrm{Var}(r_l)$ reasonably well.
For moderate and small samples, bias is important, and we should consider it here. We know that in a single stratum

$$\frac{|\text{bias } r_l|}{\sigma_{r_l}} \le \frac{\sigma_{\bar{x}_l}}{\bar{X}_l} = \mathrm{cv}(\bar{x}_l).$$

Consider the bias of $\hat{R}_s$:

$$\begin{aligned}
|\text{bias}(\hat{R}_s)| = |E(\hat{R}_s - R)| &= \left|E\left(\sum_{l=1}^{L}\frac{X_l}{X}(r_l-R_l)\right)\right| = \left|\sum_{l=1}^{L}\frac{X_l}{X}E(r_l-R_l)\right| \\
&\le \sum_{l=1}^{L}\frac{X_l}{X}\,|\text{bias } r_l| \le \max_l|\text{bias } r_l|\sum_{l=1}^{L}\frac{X_l}{X} \\
&= \max_l|\text{bias } r_l| \le \max_l\left(\frac{\sigma_{\bar{x}_l}\sigma_{r_l}}{\bar{X}_l}\right).
\end{aligned}$$

Hence

$$\frac{|\text{bias}(\hat{R}_s)|}{\text{s.e.}(\hat{R}_s)} \le \frac{\max_l\left(\sigma_{r_l}\sigma_{\bar{x}_l}/\bar{X}_l\right)}{\text{s.e.}(\hat{R}_s)} \le \sqrt{L}\left(\frac{\max_l\sigma_{r_l}}{\min_l\sigma_{r_l}}\right)\max_l\left(\frac{\sigma_{\bar{x}_l}}{\bar{X}_l}\right)$$

since

$$\text{s.e.}(\hat{R}_s) = \sqrt{\sum_{l=1}^{L}\left(\frac{X_l}{X}\right)^2\mathrm{var}(r_l)} \ge \min_l\sigma_{r_l}\sqrt{\sum_{l=1}^{L}p_l^2} \ge \min_l\sigma_{r_l}\sqrt{\sum_{l=1}^{L}\left(\frac{1}{L}\right)^2} = \min_l\sigma_{r_l}\sqrt{\frac{L}{L^2}} = \frac{1}{\sqrt{L}}\min_l\sigma_{r_l}$$

where $X_l = p_lX$. The sum of squares of unequal proportions is at least as large as that of equal proportions. This is due to the convexity of the function $f(p) = p^2$. For example, when $L = 2$ with cases $(1-p, p)$ and $(\frac{1}{2}, \frac{1}{2})$,

$$(1-p)^2 + p^2 - 2\left(\frac{1}{2}\right)^2 = 2p^2 - 2p + \frac{1}{2} = \frac{1}{2}(2p-1)^2 \ge 0.$$

Therefore the ratio on the LHS can be of order $\sqrt{L}$ times as large as the $\sigma_{\bar{x}_l}/\bar{X}_l$ bound on the individual relative biases. Even if the biases are individually small, the overall bias can be large.

3.7.2 The ‘Combined’ Ratio Estimate


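It is defined as

$$\hat{R}_c = \frac{\sum_{l=1}^{L}W_l\bar{y}_l}{\sum_{l=1}^{L}W_l\bar{x}_l} = \frac{\bar{y}_{st}}{\bar{x}_{st}} = \frac{\hat{\bar{Y}}}{\hat{\bar{X}}} = \frac{\hat{Y}}{\hat{X}}$$

and in contrast to $\hat{R}_s$, it does not require knowledge of the individual $X_l$'s. Note that

$$E(\hat{R}_c) = E\left(\frac{\bar{y}_{st}}{\bar{x}_{st}}\right) \approx E\left(\frac{\bar{y}_{st}}{\bar{X}}\right) = \frac{1}{\bar{X}}\sum_{l=1}^{L}W_lE(\bar{y}_l) = \frac{\bar{Y}}{\bar{X}} = \frac{Y}{X} = R \quad\text{(approximately unbiased)}.$$

Theorem:

$$\mathrm{Var}(\hat{R}_c) \approx \frac{1}{\bar{X}^2}\sum_{l=1}^{L}W_l^2\left(1-\frac{n_l}{N_l}\right)\frac{1}{n_l}\left(\frac{\sum_{i=1}^{N_l}[Y_{li}-\bar{Y}_l-R(X_{li}-\bar{X}_l)]^2}{N_l-1}\right).$$

Proof: First

$$\begin{aligned}
\hat{R}_c - R = \frac{\bar{y}_{st}}{\bar{x}_{st}} - R &= \frac{1}{\bar{x}_{st}}(\bar{y}_{st}-R\bar{x}_{st}) = \frac{1}{\bar{x}_{st}}\sum_{l=1}^{L}W_l(\bar{y}_l-R\bar{x}_l) \\
&= \frac{1}{\bar{x}_{st}}\sum_{l=1}^{L}W_l\bar{d}_l = \frac{1}{\bar{x}_{st}}\bar{d}_{st} \approx \frac{1}{\bar{X}}\bar{d}_{st}
\end{aligned}$$

where $d_{li} = y_{li} - Rx_{li}$, $i = 1,\dots,n_l$, estimates $D_{li} = Y_{li} - RX_{li}$ and $\bar{d}_l = \dfrac{1}{n_l}\sum_{i=1}^{n_l}d_{li}$. Note that typically $\bar{D}_l \ne 0$. Hence

$$\mathrm{Var}(\hat{R}_c) \approx \frac{1}{\bar{X}^2}\mathrm{Var}(\bar{d}_{st}) = \frac{1}{\bar{X}^2}\sum_{l=1}^{L}W_l^2\left(1-\frac{n_l}{N_l}\right)\frac{S_{crl}^2}{n_l}$$

where

$$S_{crl}^2 = \frac{1}{N_l-1}\sum_{i=1}^{N_l}(D_{li}-\bar{D}_l)^2 = \frac{1}{N_l-1}\sum_{i=1}^{N_l}[Y_{li}-\bar{Y}_l-R(X_{li}-\bar{X}_l)]^2 = S_{y_l}^2 - 2RS_{x_ly_l} + R^2S_{x_l}^2,$$

and this can be estimated by

$$s_{crl}^2 = \frac{1}{n_l-1}\sum_{i=1}^{n_l}[y_{li}-\bar{y}_l-r_c(x_{li}-\bar{x}_l)]^2 = s_{y_l}^2 - 2r_cs_{x_ly_l} + r_c^2s_{x_l}^2$$

as compared to

$$s_{srl}^2 = \frac{1}{n_l-1}\left[\sum_{i=1}^{n_l}y_{li}^2 - 2r_l\sum_{i=1}^{n_l}x_{li}y_{li} + r_l^2\sum_{i=1}^{n_l}x_{li}^2\right] = s_{y_l}^2 - 2r_ls_{x_ly_l} + r_l^2s_{x_l}^2$$

for the separate ratio estimator.

There is less risk of bias in $\hat{R}_c$ than in $\hat{R}_s$. We can show that

$$\frac{|E(\hat{R}_c - R)|}{\text{s.e.}(\hat{R}_c)} \le \max_l\left(\frac{\sigma_{\bar{x}_l}}{\bar{X}_l}\right)$$

in contrast to

$$\frac{|E(\hat{R}_s - R)|}{\text{s.e.}(\hat{R}_s)} \le \sqrt{L}\left(\frac{\max_l\sigma_{r_l}}{\min_l\sigma_{r_l}}\right)\max_l\left(\frac{\sigma_{\bar{x}_l}}{\bar{X}_l}\right)$$

for the separate ratio estimator $\hat{R}_s$.

Similarly the combined ratio estimate for the mean and its variance estimate are

$$\hat{\bar{Y}}_{st,c} = \bar{X}\hat{R}_c \quad\text{and}\quad \mathrm{var}(\hat{\bar{Y}}_{st,c}) = \sum_{l=1}^{L}W_l^2\left(1-\frac{n_l}{N_l}\right)\frac{s_{crl}^2}{n_l},$$

and for the total are

$$\hat{Y}_{st,c} = X\hat{R}_c \quad\text{and}\quad \mathrm{var}(\hat{Y}_{st,c}) = N^2\sum_{l=1}^{L}W_l^2\left(1-\frac{n_l}{N_l}\right)\frac{s_{crl}^2}{n_l}.$$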

Read Tutorial 12 Q2.
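The difference between the separate and combined ratio estimates is only the order of dividing and summing, which the following sketch makes concrete. The two strata, their sizes, known means $\bar{X}_l$ and sample values are all made up for illustration, and `var_ratio` is a hypothetical helper that implements the variance estimate formulas of Sections 3.7.1 and 3.7.2.

```python
# Two hypothetical strata: size N_l, known mean Xbar_l, and a small SRS.
strata = [
    {"N": 40, "Xbar": 10.0, "x": [8, 12, 9, 15], "y": [16, 25, 19, 30]},
    {"N": 60, "Xbar": 20.0, "x": [18, 22, 21, 15], "y": [55, 65, 62, 46]},
]
N = sum(s["N"] for s in strata)
Xbar = sum(s["N"] / N * s["Xbar"] for s in strata)   # overall mean of X


def mean(v):
    return sum(v) / len(v)


# Separate estimate: ratio within each stratum, then weighted sum.
R_s = sum(s["N"] / N * s["Xbar"] * mean(s["y"]) / mean(s["x"])
          for s in strata) / Xbar

# Combined estimate: weighted sums first, then a single ratio.
ybar_st = sum(s["N"] / N * mean(s["y"]) for s in strata)
xbar_st = sum(s["N"] / N * mean(s["x"]) for s in strata)
R_c = ybar_st / xbar_st


def var_ratio(slope_for):
    """Variance estimate (1/Xbar^2) sum W_l^2 (1 - n_l/N_l) s_l^2 / n_l,
    where s_l^2 uses residuals about the slope returned by slope_for(stratum)."""
    total = 0.0
    for s in strata:
        n_l, W_l = len(s["x"]), s["N"] / N
        xb, yb = mean(s["x"]), mean(s["y"])
        k = slope_for(s)
        s2 = sum(((yi - yb) - k * (xi - xb)) ** 2
                 for xi, yi in zip(s["x"], s["y"])) / (n_l - 1)
        total += W_l ** 2 * (1 - n_l / s["N"]) * s2 / n_l
    return total / Xbar ** 2


var_R_s = var_ratio(lambda s: mean(s["y"]) / mean(s["x"]))  # stratum ratios r_l
var_R_c = var_ratio(lambda s: R_c)                          # common ratio r_c

print(R_s, var_R_s)
print(R_c, var_R_c)
```

The separate estimate needs every $\bar{X}_l$, whereas the combined estimate itself uses only the stratum weights. In this sketch $\bar{X}$ enters the combined calculation only through the variance estimate.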

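Estimators and variance estimates for stratified SRS (Ch.2)

Ordinary/naive estimator, with $s_{y_l}^2 = \dfrac{1}{n_l-1}\left(\sum_{i=1}^{n_l}y_{li}^2 - n_l\bar{y}_l^2\right)$ and $W_l = \dfrac{N_l}{N}$
  Ratio $R$: $\hat{R}_{st} = \dfrac{1}{\bar{X}}\sum_{l=1}^{L}W_l\bar{y}_l$, with $\mathrm{var}(\hat{R}_{st}) = \dfrac{1}{\bar{X}^2}\sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{y_l}^2}{n_l}$
  Mean $\bar{Y}$: $\hat{\bar{Y}}_{st} = \sum_{l=1}^{L}W_l\bar{y}_l$, with $\mathrm{var}(\hat{\bar{Y}}_{st}) = \sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{y_l}^2}{n_l}$
  Total $Y$: $\hat{Y}_{st} = N\sum_{l=1}^{L}W_l\bar{y}_l$, with $\mathrm{var}(\hat{Y}_{st}) = N^2\sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{y_l}^2}{n_l}$

Separate ratio estimator, with $s_{sr,l}^2 = s_{y_l}^2 - 2r_l\hat{\rho}_ls_{x_l}s_{y_l} + r_l^2s_{x_l}^2$ and $r_l = \dfrac{\bar{y}_l}{\bar{x}_l}$
  Ratio $R$: $\hat{R}_{st,sr} = \dfrac{1}{\bar{X}}\sum_{l=1}^{L}W_l\bar{X}_lr_l$, with $\mathrm{var}(\hat{R}_{st,sr}) = \dfrac{1}{\bar{X}^2}\sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{sr,l}^2}{n_l}$
  Mean $\bar{Y}$: $\hat{\bar{Y}}_{st,sr} = \sum_{l=1}^{L}W_l\bar{X}_lr_l$, with $\mathrm{var}(\hat{\bar{Y}}_{st,sr}) = \sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{sr,l}^2}{n_l}$
  Total $Y$: $\hat{Y}_{st,sr} = N\sum_{l=1}^{L}W_l\bar{X}_lr_l$, with $\mathrm{var}(\hat{Y}_{st,sr}) = N^2\sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{sr,l}^2}{n_l}$

Combined ratio estimator, with $s_{cr,l}^2 = s_{y_l}^2 - 2r_c\hat{\rho}_ls_{x_l}s_{y_l} + r_c^2s_{x_l}^2$ and $r_c = r_{st,cr} = \dfrac{\sum_{l=1}^{L}W_l\bar{y}_l}{\sum_{l=1}^{L}W_l\bar{x}_l}$
  Ratio $R$: $\hat{R}_{st,cr} = \dfrac{\sum_{l=1}^{L}W_l\bar{y}_l}{\sum_{l=1}^{L}W_l\bar{x}_l} = r_{st,cr}$, with $\mathrm{var}(\hat{R}_{st,cr}) = \dfrac{1}{\bar{X}^2}\sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{cr,l}^2}{n_l}$
  Mean $\bar{Y}$: $\hat{\bar{Y}}_{st,cr} = \bar{X}r_{st,cr}$, with $\mathrm{var}(\hat{\bar{Y}}_{st,cr}) = \sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{cr,l}^2}{n_l}$
  Total $Y$: $\hat{Y}_{st,cr} = N\bar{X}r_{st,cr}$, with $\mathrm{var}(\hat{Y}_{st,cr}) = N^2\sum_{l=1}^{L}W_l^2\left(1-\dfrac{n_l}{N_l}\right)\dfrac{s_{cr,l}^2}{n_l}$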