
Appendix A: Probability Concepts Used in Sampling

I recollect nothing that passed that day, except Johnson’s quickness, who, when Dr. Beattie observed,
as something remarkable which had happened to him, that he had chanced to see both No. 1, and
No. 1000, of the hackney-coaches, the first and the last; “Why, Sir, (said Johnson,) there is an equal
chance for one’s seeing those two numbers as any other two." He was clearly right; yet the seeing of
the two extremes, each of which is in some degree more conspicuous than the rest, could not but strike
one in a stronger manner than the sight of any other two numbers.’’

—James Boswell, The Life of Samuel Johnson

The essence of probability sampling is that we can calculate the probability with which
any subset of observations in the population will be selected as the sample. Most of
the randomization theory results used in this book depend on probability concepts
for their proof. In this appendix we present a brief review of some of the basic ideas
used. The reader should consult a more comprehensive reference on probability, such
as Ross (2006) or Durrett (1994), for more detail and for derivations and proofs.
Because all work in randomization theory concerns discrete random variables, only
results for discrete random variables are given in this section. We use the results
in Sections A.1–A.3 in Chapters 2–4, and the results in Sections A.3–A.4 in Chapters 5
and 6.

Copyright © 2019. CRC Press LLC. All rights reserved.

A.1 Probability
Consider performing an experiment in which you can write out all of the outcomes
that could possibly happen, but you do not know exactly which one of those outcomes
will occur. You might flip a coin, or draw a card from a deck, or pick three names out
of a hat containing 20 names. Probabilities are assigned to the different outcomes and
to sets composed of outcomes (called events), in accordance with the likelihood that
the events will occur. Let Ω be the sample space, the list of all possible outcomes. For
flipping a coin, Ω = {heads, tails}. Probabilities in finite sample spaces have three
basic properties:

1 P(Ω) = 1.
2 For any event A, 0 ≤ P(A) ≤ 1.
3 If the events A_1, . . . , A_k are disjoint, then P(A_1 ∪ · · · ∪ A_k) = P(A_1) + · · · + P(A_k).

Lohr, Sharon L.. Sampling : Design and Analysis, CRC Press LLC, 2019. ProQuest Ebook Central, http://ebookcentral.proquest.com/lib/pitt-ebooks/detail.action?docID=5748873.

In sampling, we have a population of N units and use a probability sampling
scheme to select n of those units. We can think of those N units as balls in a box,
labelled 1 through N, and we draw n balls from the box. For illustration, suppose
N = 5 and n = 2. Then we draw two labelled balls out of the box:

[Figure: a box containing five balls labelled 1 through 5.]

If we take a simple random sample (SRS) of one ball, each ball has an equal probability
1/N of being chosen as the sample.

A.1.1 Simple Random Sampling with Replacement


In a simple random sample with replacement (SRSWR), we put a ball back after it is
chosen, so the same population is used on successive draws from the population. For
the box with N = 5, there are 25 possible samples (a, b) in Ω, where a represents the
first ball chosen and b represents the second ball chosen:

(1, 1) (2, 1) (3, 1) (4, 1) (5, 1)


(1, 2) (2, 2) (3, 2) (4, 2) (5, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5)

Since we are taking a random sample, each of the possible samples has the same
probability, 1/25, of being the one chosen. When we take a sample, though, we usually
do not care whether we chose unit 4 first and unit 5 second, or the other way around.
Instead, we are interested in the probability that our sample consists of units 4 and 5
in either order, which we write as S = {4, 5}. By the third property in the definition
of a probability,
P({4, 5}) = P[(4, 5) ∪ (5, 4)] = P[(4, 5)] + P[(5, 4)] = 2/25.

Suppose we want to find P(unit 2 is in the sample). We can either count that nine
of the outcomes above contain 2, so the probability is 9/25, or we can use the addition
formula:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B). (A.1)
Here, let A = {unit 2 is chosen on the first draw} and let B = {unit 2 is chosen on the
second draw}. Then,
P(unit 2 is in the sample) = P(A) + P(B) − P(A ∩ B) = 1/5 + 1/5 − 1/25 = 9/25.
Note that, for this example,
P(A ∩ B) = P(A) × P(B).
That occurs in this situation because events A and B are independent, that is, whatever
happens on the first draw has no effect on the probabilities of what will happen on the
second draw. Independence of the draws occurs in finite population sampling when
we sample with replacement.
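These with-replacement calculations can be verified by enumerating all 25 ordered samples; a short sketch in Python (the variable names below are ours, not the book's):

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]

# All 25 ordered with-replacement samples (a, b), each with probability 1/25.
samples = list(product(population, repeat=2))
assert len(samples) == 25

p = Fraction(1, len(samples))

# P(unit 2 is in the sample): count the samples containing unit 2.
p_2_in_sample = sum(p for s in samples if 2 in s)
assert p_2_in_sample == Fraction(9, 25)

# Independence of draws: P(A ∩ B) = P(A) P(B) for
# A = {unit 2 on first draw} and B = {unit 2 on second draw}.
p_A = sum(p for s in samples if s[0] == 2)     # 1/5
p_B = sum(p for s in samples if s[1] == 2)     # 1/5
p_AB = sum(p for s in samples if s == (2, 2))  # 1/25
assert p_AB == p_A * p_B
```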

A.1.2 Simple Random Sampling without Replacement


Most of the time, we sample without replacement because it is more efficient—if
Heather is already in the sample, why should we use resources by sampling her again?
If we plan to take an SRS (recall that SRS refers to a simple random sample without
replacement) of size n = 2 from our population of N = 5 balls, the ten possible samples
(ignoring the ordering) are

{1, 2} {1, 3} {1, 4} {1, 5} {2, 3}


{2, 4} {2, 5} {3, 4} {3, 5} {4, 5}

Since there are ten possible samples and we are sampling with equal probabilities,
the probability that a given sample will be chosen is 1/10.
In general, there are

(N choose n) = N!/[n!(N − n)!]     (A.2)

possible samples of size n that can be drawn without replacement and with equal
probabilities from a population of size N, where

k! = k(k − 1)(k − 2) · · · 1 and 0! = 1.

For our example, there are

(5 choose 2) = 5!/[2!(5 − 2)!] = (5 × 4 × 3 × 2 × 1)/[(2 × 1)(3 × 2 × 1)] = 10

possible samples of size 2, as we found when we listed them.
Note that in sampling without replacement, successive draws are not independent.
For this example,
P(2 chosen on first draw, 4 chosen on second draw) = 1/20.

But P(2 chosen on first draw) = 1/5, and P(4 chosen on second draw) = 1/5, so
P(2 chosen on first draw, 4 chosen on second draw) ≠ P(2 chosen on first draw) ×
P(4 chosen on second draw).
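The lack of independence can likewise be checked by listing all 20 ordered without-replacement samples; a brief sketch under the same setup (names are ours):

```python
from fractions import Fraction
from itertools import permutations

population = [1, 2, 3, 4, 5]

# All 20 ordered without-replacement samples, each with probability 1/20.
ordered = list(permutations(population, 2))
assert len(ordered) == 20

p = Fraction(1, len(ordered))
p_2_then_4 = sum(p for s in ordered if s == (2, 4))
assert p_2_then_4 == Fraction(1, 20)

# The draws are dependent: 1/20 differs from (1/5)(1/5) = 1/25.
p_2_first = sum(p for s in ordered if s[0] == 2)   # 1/5
p_4_second = sum(p for s in ordered if s[1] == 4)  # 1/5
assert p_2_then_4 != p_2_first * p_4_second
```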

EXAMPLE A.1 Players of the Arizona State Lottery game “Fantasy 5” choose 5 numbers without
replacement from the numbers 1 through 35. If the 5 numbers you choose match the 5
official winning numbers, you win $50,000. What is the probability you win $50,000?
You could select a total of

(35 choose 5) = 35!/(5! 30!) = 324,632

possible sets of 5 numbers. But only

(5 choose 5) = 1

of those sets will match the official winning numbers, so your probability of winning
$50,000 is 1/324,632.
Cash prizes are also given if you match three or four of the numbers. To match
four, you must select four numbers out of the set of five winning numbers, and the
remaining number out of the set of 30 non-winning numbers, so the probability is

P(match exactly 4 balls) = [(5 choose 4) × (30 choose 1)]/(35 choose 5) = 150/324,632. ■
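Both counts follow from the binomial coefficient in (A.2); Python's math.comb reproduces them exactly (a quick check, not part of the text):

```python
from math import comb

# Total number of possible selections of 5 numbers from 35.
total = comb(35, 5)
assert total == 324632

# Probability of matching all 5 winning numbers.
p_win = 1 / total

# P(match exactly 4): 4 of the 5 winners and 1 of the 30 non-winners.
p_match4 = comb(5, 4) * comb(30, 1) / total
assert comb(5, 4) * comb(30, 1) == 150
```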

EXERCISE A.1 What is the probability you match exactly 3 of the numbers? That you match at least
one of the numbers? ■

EXERCISE A.2 Calculating the sampling distribution in Example 2.4


A box has eight balls; three of the balls contain the number 7. You select an SRS
(without replacement) of size 4. What is the probability that your sample contains no
7s? Exactly one 7? Exactly two 7s? ■

A.2 Random Variables and Expected Value
A random variable is a function that assigns a number to each outcome in the sample
space. Which number the random variable will actually assume is only determined
after we conduct the experiment and depends on a random process: Before we conduct
the experiment, we only know probabilities with which the different outcomes can
occur. The set of possible values of a random variable, along with the probability
with which each value occurs, is called the probability distribution of the random
variable. Random variables are denoted by capital letters in this book to distinguish

them from the fixed values yi . If X is a random variable, then P(X = x) is the
probability that the random variable X takes on the value x. The quantity x is sometimes
called a realization of the random variable X; x is one of the values that could occur
if we performed the experiment.

EXAMPLE A.2 In the game “Fantasy 5,” let X be the amount of money you will win from your
selection of numbers. You win $50,000 if you match all 5 winning numbers, $500
if you match 4, $5 if you match 3, and nothing if you match fewer than 3. Then the
probability distribution of X is given in the following table:

x           0                 5               500             50,000
P(X = x)    320,131/324,632   4350/324,632    150/324,632     1/324,632 ■

If you played “Fantasy 5” many, many times, what would you expect your average
winnings per game to be? The answer is the expected value of X, defined by

E(X) = EX = Σ_x x P(X = x).     (A.3)

For “Fantasy 5,”

E(X) = 0 × (320,131/324,632) + 5 × (4350/324,632) + 500 × (150/324,632)
     + 50,000 × (1/324,632) = 146,750/324,632 ≈ 0.45.
Think of a box containing 324,632 balls, in which 1 ball contains the number 50,000,
150 balls contain the number 500, 4350 balls contain the number 5, and the remaining
320,131 balls contain the number 0. The expected value is simply the average of the
numbers written inside all the balls in the box. One way to think about expected
value is to imagine repeating the experiment over and over again and calculating the
long-run average of the results. If you play “Fantasy 5” many, many times, you would
expect to win about 45 cents per game, even though 45 cents is not one of the possible
realizations of X.
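The expectation in (A.3) is a one-line sum over the distribution; a sketch using exact fractions (note that the numerator works out to 146,750, consistent with the value of about 0.45; the dictionary name is ours):

```python
from fractions import Fraction

# Probability distribution of winnings X in "Fantasy 5".
dist = {
    0: Fraction(320131, 324632),
    5: Fraction(4350, 324632),
    500: Fraction(150, 324632),
    50000: Fraction(1, 324632),
}
assert sum(dist.values()) == 1  # probabilities sum to one

# E(X) = sum over x of x * P(X = x), as in (A.3).
EX = sum(x * p for x, p in dist.items())
assert EX == Fraction(146750, 324632)
assert abs(float(EX) - 0.45) < 0.01  # about 45 cents per game
```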
Variance, covariance, and the coefficient of variation are defined directly in terms
of the expected value:
V(X) = E[(X − EX)²] = Cov(X, X)     (A.4)

Cov(X, Y) = E[(X − EX)(Y − EY)]     (A.5)

Corr(X, Y) = Cov(X, Y)/√[V(X) V(Y)]     (A.6)

CV(X) = √V(X)/E(X), for E(X) ≠ 0.     (A.7)

Expected value and variance have a number of properties that follow directly from
the definitions above.

Properties of Expected Value

1 If g is a function, then E[g(X)] = Σ_x g(x) P(X = x).
2 If a and b are constants, then E(aX + b) = aE(X) + b.
3 If X and Y are independent, then E(XY) = (EX)(EY).
4 Cov(X, Y) = E(XY) − (EX)(EY).
5 Cov(Σ_{i=1}^n (a_i X_i + b_i), Σ_{j=1}^m (c_j Y_j + d_j)) = Σ_{i=1}^n Σ_{j=1}^m a_i c_j Cov(X_i, Y_j).
6 V(X) = E(X²) − (EX)².
7 V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y).
8 −1 ≤ Corr(X, Y) ≤ 1.

EXERCISE A.3 Prove properties 1 through 8 using the definitions in (A.3) through (A.7). ■

In sampling, we often use estimators that are ratios of two random variables. But
E[Y /X] usually does not equal EY /EX. To illustrate this, consider the following
probability distribution for X and Y :

x    y    y/x    P(X = x, Y = y)
1    2    2      1/4
2    8    4      1/4
3    6    2      1/4
4    8    2      1/4

Then EY/EX = 6/2.5 = 2.4, but E[Y/X] = 2.5. In this example, the values are close
but not equal.
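The gap between E[Y/X] and EY/EX can be confirmed with exact arithmetic; a short sketch (the pairs list transcribes the table above, and the names are ours):

```python
from fractions import Fraction

# Joint distribution from the table: (x, y) pairs, each with probability 1/4.
pairs = [(1, 2), (2, 8), (3, 6), (4, 8)]
p = Fraction(1, 4)

EX = sum(Fraction(x) * p for x, y in pairs)       # 2.5
EY = sum(Fraction(y) * p for x, y in pairs)       # 6
E_ratio = sum(Fraction(y, x) * p for x, y in pairs)

assert EY / EX == Fraction(24, 10)  # 2.4
assert E_ratio == Fraction(5, 2)    # 2.5
assert E_ratio != EY / EX           # E[Y/X] is not EY/EX
```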
The random variable we use most frequently in this book is

Z_i = 1 if unit i is in the sample, and Z_i = 0 if unit i is not in the sample.     (A.8)
This indicator variable tells us whether the ith unit is in the sample or not. In an SRS,
n of the random variables Z1 , Z2 , . . . , ZN will take on the value 1, and the remaining
N − n will be 0. For Zi to equal 1, one of the units in the sample must be unit i, and

the other n − 1 units must come from the remaining N − 1 units in the population, so

P(Z_i = 1) = P(ith unit is in the sample)
           = [(1 choose 1) × (N − 1 choose n − 1)]/(N choose n)
           = n/N.     (A.9)

Thus,

E[Z_i] = 0 × P(Z_i = 0) + 1 × P(Z_i = 1) = P(Z_i = 1) = n/N.

Similarly, for i ≠ j,

P(Z_i Z_j = 1) = P(Z_i = 1 and Z_j = 1)
              = P(ith unit is in the sample and jth unit is in the sample)
              = [(2 choose 2) × (N − 2 choose n − 2)]/(N choose n)
              = n(n − 1)/[N(N − 1)].

Thus for i ≠ j,

E[Z_i Z_j] = 0 × P(Z_i Z_j = 0) + 1 × P(Z_i Z_j = 1) = P(Z_i Z_j = 1) = n(n − 1)/[N(N − 1)].

EXERCISE A.4 Show that

V(Z_i) = Cov(Z_i, Z_i) = n(N − n)/N²

and that, for i ≠ j,

Cov(Z_i, Z_j) = −n(N − n)/[N²(N − 1)]. ■

The properties of expectation and covariance may be used to prove many results
in finite population sampling. In Chapter 4, we use the covariance of x̄ and ȳ from an
SRS. Let

x̄_U = (1/N) Σ_{i=1}^N x_i,    ȳ_U = (1/N) Σ_{j=1}^N y_j,

x̄ = (1/n) Σ_{i=1}^N Z_i x_i,    ȳ = (1/n) Σ_{j=1}^N Z_j y_j,

and

R = [Σ_{i=1}^N (x_i − x̄_U)(y_i − ȳ_U)]/[(N − 1) S_x S_y].

Then,

Cov(x̄, ȳ) = (1 − n/N) R S_x S_y / n.     (A.10)
We use properties 5 and 6 of expected value, along with the results of
Exercise A.4, to show (A.10):
Cov(x̄, ȳ) = (1/n²) Cov(Σ_{i=1}^N Z_i x_i, Σ_{j=1}^N Z_j y_j)

= (1/n²) Σ_{i=1}^N Σ_{j=1}^N x_i y_j Cov(Z_i, Z_j)

= (1/n²) Σ_{i=1}^N x_i y_i V(Z_i) + (1/n²) Σ_{i=1}^N Σ_{j≠i} x_i y_j Cov(Z_i, Z_j)

= (1/n) [(N − n)/N²] Σ_{i=1}^N x_i y_i − (1/n) [(N − n)/(N²(N − 1))] Σ_{i=1}^N Σ_{j≠i} x_i y_j

= (1/n) [(N − n)/N² + (N − n)/(N²(N − 1))] Σ_{i=1}^N x_i y_i − (1/n) [(N − n)/(N²(N − 1))] Σ_{i=1}^N Σ_{j=1}^N x_i y_j

= (1/n) [(N − n)/(N(N − 1))] Σ_{i=1}^N x_i y_i − (1/n) [(N − n)/(N − 1)] x̄_U ȳ_U

= (1/n) [(N − n)/(N(N − 1))] Σ_{i=1}^N (x_i − x̄_U)(y_i − ȳ_U)

= (1/n) (1 − n/N) R S_x S_y.
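Identity (A.10) can be spot-checked by brute force: enumerate every SRS of size n from a tiny population, compute the exact sampling covariance of (x̄, ȳ), and compare it with the formula. The population values below are made-up illustration data, not from the text:

```python
from fractions import Fraction
from itertools import combinations

# Made-up population values, purely for illustration.
xs = [1, 2, 4, 4, 7]
ys = [3, 5, 4, 9, 8]
N, n = 5, 3

xbarU = Fraction(sum(xs), N)
ybarU = Fraction(sum(ys), N)
# R * Sx * Sy equals the corrected cross-product sum divided by N - 1.
RSxSy = sum((x - xbarU) * (y - ybarU) for x, y in zip(xs, ys)) / (N - 1)

# Exact sampling distribution of (xbar, ybar) over all SRSs of size n.
samples = list(combinations(range(N), n))
p = Fraction(1, len(samples))
stats = [(Fraction(sum(xs[i] for i in s), n),
          Fraction(sum(ys[i] for i in s), n)) for s in samples]
Ex = sum(xb * p for xb, yb in stats)
Ey = sum(yb * p for xb, yb in stats)
cov = sum((xb - Ex) * (yb - Ey) * p for xb, yb in stats)

assert cov == (1 - Fraction(n, N)) * RSxSy / n  # formula (A.10)
```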

EXERCISE A.5 Show that

Corr(x̄, ȳ) = R.     (A.11) ■

A.3 Conditional Probability
In sampling without replacement, successive draws from the population are dependent:
The unit we choose on the first draw changes the probabilities of selecting the
other units on subsequent draws. When taking an SRS from our box of five balls in

Section A.1, each ball has probability 1/5 of being chosen on the first draw. If we
choose ball 2 on the first draw and sample without replacement, then

P(select ball 3 on second draw | select ball 2 on first draw) = 1/4.

(Read as “the conditional probability that ball 3 is selected on the second draw given
that ball 2 is selected on the first draw equals 1/4.”) Conditional probability allows us
to adjust the probability of an event if we know that a related event occurred.
The conditional probability of A given B is defined to be

P(A | B) = P(A ∩ B)/P(B).     (A.12)
In sampling we usually use this definition the other way around:

P(A ∩ B) = P(A | B)P(B). (A.13)

If events A and B are independent—that is, knowing whether A occurred gives us
absolutely no information about whether B occurred—then P(A | B) = P(A) and
P(B | A) = P(B).
Suppose we have a population with 8 households (HHs) and 15 persons living in
the households, as follows:

Household Persons

1 1, 2, 3
2 4
3 5
4 6, 7
5 8
6 9, 10
7 11, 12, 13, 14
8 15

In a one-stage cluster sample, as discussed in Chapter 5, we might take an SRS
of two households, then interview each person in the selected households. Then,

P(select person 10) = P(select HH 6) P(select person 10 | select HH 6) = (2/8)(2/2) = 2/8.

In fact, for this example the probability that any individual in the population is inter-
viewed is the same value, 2/8, because each household is equally likely to be chosen
and the probability a person is selected is the same as the probability that the household
is selected.
Suppose now that we take a two-stage cluster sample instead of a one-stage cluster
sample, and we interview only one randomly selected person in each selected household.
Then, in this example, we are more likely to interview persons living alone than

those living with others:

P(select person 4) = P(select HH 2) P(select person 4 | select HH 2) = (2/8)(1/1) = 2/8,

but

P(select person 12) = P(select HH 7) P(select person 12 | select HH 7) = (2/8)(1/4) = 2/32.
These calculations extend to multistage cluster sampling because of the general
result
P(A_1 ∩ A_2 ∩ · · · ∩ A_k) = P(A_1 | A_2, . . . , A_k) P(A_2 | A_3, . . . , A_k) · · · P(A_k).     (A.14)
Suppose we take a three-stage cluster sample of grade school students. First, we take
an SRS of schools, then an SRS of classes within schools, then an SRS of students
within classes. Then the event {Joe is selected in the sample} is the same as {Joe’s
school is selected ∩ Joe’s class is selected ∩ Joe is selected} and we can find Joe’s
probability of inclusion by
P(Joe in sample) = P(Joe’s school is selected)
× P(Joe’s class is selected | Joe’s school is selected)
× P(Joe is selected | Joe’s school and class are selected).
If we sample 10% of the schools, 20% of classes within selected schools, and 50%
of students within selected classes, then
P(Joe in sample) = (0.10)(0.20)(0.50) = 0.01.
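The household inclusion probabilities above follow the same multiplication rule; a small sketch (the `households` table transcribes the text, and the function name is ours):

```python
from fractions import Fraction

# The 8-household, 15-person population from the text.
households = {1: [1, 2, 3], 2: [4], 3: [5], 4: [6, 7],
              5: [8], 6: [9, 10], 7: [11, 12, 13, 14], 8: [15]}

# Two-stage design: SRS of 2 of the 8 households, then 1 person per
# selected household, so P(person) = (2/8) * (1/household size).
def p_person(k):
    hh_size = next(len(v) for v in households.values() if k in v)
    return Fraction(2, 8) * Fraction(1, hh_size)

assert p_person(4) == Fraction(2, 8)    # person living alone
assert p_person(12) == Fraction(2, 32)  # person in a 4-person household
```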

A.4 Conditional Expectation
Conditional expectation is used extensively in the theory of cluster sampling. Let X
and Y be random variables. Then, using the definition of conditional probability,

P(Y = y | X = x) = P(Y = y ∩ X = x)/P(X = x).     (A.15)

This gives the conditional distribution of Y given that X = x. The conditional
expectation of Y given that X = x simply follows the definition of expectation using
the conditional distribution:

E(Y | X = x) = Σ_y y P(Y = y | X = x).     (A.16)

The conditional variance of Y given that X = x is defined similarly:

V(Y | X = x) = Σ_y [y − E(Y | X = x)]² P(Y = y | X = x).     (A.17)

EXAMPLE A.3 Consider a box with two balls, A and B:

[Figure: ball A contains the numbers 1, 3, 4, and 4; ball B contains the numbers 2 and 6.]

Choose one of the balls at random, then randomly select one of the numbers inside
that ball. Let Y = the number that we choose and let

Z = 1 if we choose ball A, and Z = 0 if we choose ball B.
Then,

P(Y = 1 | Z = 1) = 1/4,
P(Y = 3 | Z = 1) = 1/4,
P(Y = 4 | Z = 1) = 1/2,

and

E(Y | Z = 1) = 1 × (1/4) + 3 × (1/4) + 4 × (1/2) = 3.
Similarly,

P(Y = 2 | Z = 0) = 1/2 and P(Y = 6 | Z = 0) = 1/2,

so

E(Y | Z = 0) = 2 × (1/2) + 6 × (1/2) = 4.
In short, if we know that ball A is picked, then the conditional expectation of Y is
the average of numbers in ball A since an SRS of size 1 is taken from the ball; the
conditional expectation of Y given that ball B is picked is the average of the numbers
in ball B. ■
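Since the second stage is an SRS of size 1, E(Y | Z = z) is just the average of the numbers in the chosen ball, which a few lines of Python can confirm (the variable and function names are ours):

```python
from fractions import Fraction

# Ball A holds {1, 3, 4, 4} and ball B holds {2, 6}; the dictionary key
# is Z (1 = ball A, 0 = ball B).
balls = {1: [1, 3, 4, 4], 0: [2, 6]}

def cond_expectation(z):
    # An SRS of size 1 from the ball makes each number equally likely,
    # so E(Y | Z = z) is the ball's average.
    nums = balls[z]
    return Fraction(sum(nums), len(nums))

assert cond_expectation(1) == 3  # ball A
assert cond_expectation(0) == 4  # ball B
```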

Note that E(Y | X = x) is a function of x; call it g(x). Define the conditional
expectation of Y given X, E(Y | X), to be g(X), the same function but of the random
variable instead. E(Y | X) is a random variable and gives us the conditional expected
value of Y for the general random variable X: for each possible value of x, the value
E(Y | X = x) occurs with probability P(X = x).

EXAMPLE A.4 In Example A.3, we know the probability distribution of Z and can thus use the
conditional expectations calculated to write the probability distribution of E(Y | Z):

z    E(Y | Z = z)    Probability
0    4               1/2
1    3               1/2 ■

In sampling, we need this general concept of conditional expectation largely so
we can use the following properties of conditional expectation to find expected values
and variances in cluster samples.

Properties of Conditional Expectation

1 E(X | X) = X.
2 E[f(X) Y | X] = f(X) E(Y | X).
3 If X and Y are independent, then E(Y | X) = E(Y).
4 E(Y) = E[E(Y | X)].
5 V[Y] = V[E(Y | X)] + E[V(Y | X)].

Conditional expectation can be confusing, so let’s talk about what these properties
mean. The interested reader should see Ross (2006) or Durrett (1994) for proofs of
these properties.

1 E(X | X) = X. If we know what X is already, then we expect X to be X. The
probability distribution of E(X | X) is the same as the probability distribution of X.
2 E[f(X) Y | X] = f(X) E(Y | X). If we know what X is, then we know X², or log X,
or any function f(X) of X.
3 If X and Y are independent, then E(Y | X) = E(Y). If X and Y are independent,
then knowing X gives us no information about Y. Thus the expected value of Y,
the average of all the possible outcomes of Y in the experiment, is the same no
matter what X is.
4 E(Y) = E[E(Y | X)]. This property, called successive conditioning, and property 5
are the ones we use the most in sampling; we use them to find the bias and
variance of estimators in cluster sampling. Successive conditioning simply says
that if we take the weighted average of the conditional expected value of Y given
that X = x, with weights P(X = x), the result is the expected value of Y. You
use successive conditioning every time you take a weighted average of a quantity
over subpopulations: If a population has 60 women and 40 men, and if the average
height of the women is 64 inches and the average height of the men is 69 inches,
then the average height for the population is

(64 × 0.6) + (69 × 0.4) = 66 inches.

In this example, 64 is the conditional expected value of height given that the person
is a woman, 69 is the conditional expected value of height given that the person
is a man, and 66 is the expected value of height for all persons in the population.
5 V[Y] = V[E(Y | X)] + E[V(Y | X)]. This property gives an easy way of calculating
variances in two-stage cluster samples. It says that the total variability has two
parts: (a) the variability that arises because E(Y | X = x) varies with different
values of x, and (b) the variability that arises because there can be different values
of y associated with the same value of x. Note that, using property 6 of Expected
Value in Section A.2,

V(Y | X) = E{[Y − E(Y | X)]² | X} = E[Y² | X] − [E(Y | X)]²     (A.18)

and

V[E(Y | X)] = E({E(Y | X) − E[E(Y | X)]}²)
            = E({E(Y | X) − E(Y)}²)
            = E{[E(Y | X)]²} − [E(Y)]².     (A.19)

EXAMPLE A.5 Here’s how conditional expectation properties work in Example A.3. Successive
conditioning implies that

E(Y) = E(Y | Z = 0) P(Z = 0) + E(Y | Z = 1) P(Z = 1) = 4 × (1/2) + 3 × (1/2) = 3.5.

We can find the distribution of V (Y | Z) using (A.18):

V(Y | Z = 0) = E(Y² | Z = 0) − [E(Y | Z = 0)]² = 2² × (1/2) + 6² × (1/2) − 4² = 4,

V(Y | Z = 1) = E(Y² | Z = 1) − [E(Y | Z = 1)]² = 1² × (1/4) + 3² × (1/4) + 4² × (1/2) − 3² = 1.5.

These calculations give the following probability distribution for V (Y | Z):

z    V(Y | Z = z)    Probability
0    4               1/2
1    1.5             1/2

Thus, using (A.19),

V[E(Y | Z)] = E([E(Y | Z) − E(Y)]²)
            = [E(Y | Z = 0) − E(Y)]² P(Z = 0) + [E(Y | Z = 1) − E(Y)]² P(Z = 1)
            = (4 − 3.5)² × (1/2) + (3 − 3.5)² × (1/2)
            = 0.25.

Using the probability distribution of V(Y | Z),

E[V(Y | Z)] = 4 × (1/2) + 1.5 × (1/2) = 2.75.

Consequently,

V(Y) = V[E(Y | Z)] + E[V(Y | Z)] = 0.25 + 2.75 = 3.00. ■
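Properties 4 and 5 for Example A.5 can be verified numerically from the two-ball setup; a sketch with exact arithmetic (the helper names are ours):

```python
from fractions import Fraction

# Z = 1 means ball A = {1, 3, 4, 4}; Z = 0 means ball B = {2, 6}.
balls = {1: [1, 3, 4, 4], 0: [2, 6]}
pZ = {1: Fraction(1, 2), 0: Fraction(1, 2)}

def mean(v):
    return Fraction(sum(v), len(v))

def var(v):
    # V = E(Y^2) - (EY)^2, property 6 of expected value.
    return mean([x * x for x in v]) - mean(v) ** 2

# Successive conditioning (property 4): E(Y) = E[E(Y | Z)].
EY = sum(mean(balls[z]) * pZ[z] for z in balls)
assert EY == Fraction(7, 2)  # 3.5

# Variance decomposition (property 5): V(Y) = V[E(Y|Z)] + E[V(Y|Z)].
V_of_E = sum((mean(balls[z]) - EY) ** 2 * pZ[z] for z in balls)
E_of_V = sum(var(balls[z]) * pZ[z] for z in balls)
assert V_of_E == Fraction(1, 4)    # 0.25
assert E_of_V == Fraction(11, 4)   # 2.75
assert V_of_E + E_of_V == 3        # V(Y) = 3.00
```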

If we did not have the properties of conditional expectation, we would need to
find the unconditional probability distribution of Y to calculate its expectation and
variance—a relatively easy task for the small number of options in Example A.3 but
cumbersome to do for general multistage cluster sampling.

EXERCISE A.6 Consider the box below, with 3 balls labelled 1, 2, and 3:

[Figure: three balls, each containing several of the numbers 1, 3, 4, 5, 6, 6, 7, 8, 9.]

Suppose we take an SRS of one ball, then subsample an SRS of one number from
the selected ball. Let Z represent the number of the ball chosen, and let Y represent
the number we choose from the ball. Use the properties of conditional expectation to
find E(Y) and V(Y). ■
