Sample

This document discusses simple random sampling (SRS) from a finite population. It defines SRS as selecting a fixed number (n) of units from a population of size (N) where every possible sample has an equal probability of being selected. The document provides formulas for calculating probabilities, means, variances, and standard errors of SRS. It demonstrates that the sample mean and total are unbiased estimators of the population mean and total. It also shows that the sample variance is an unbiased estimator of the population variance, whereas the sample standard deviation is a biased estimator of the population standard deviation.

Uploaded by

Shreyas Satardekar


2.3 Simple Random Sampling

• Simple random sampling without replacement (srswor) of size n is the probability
sampling design for which a fixed number of n units are selected from a population of N
units without replacement such that every possible sample of n units has equal probability
of being selected. A resulting sample is called a simple random sample or srs.

• Note: I will use SRS to denote a simple random sample and SR as an abbreviation of
‘simple random’.

• Some necessary combinatorial notation:

– (n factorial) $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$. This is the number of
unique arrangements or orderings (or permutations) of n distinct items. For example:
$6! = 6 \times 5 \times 4 \times 3 \times 2 \times 1 = 720$.

– (N choose n) $\binom{N}{n} = \dfrac{N(N-1)\cdots(N-n+1)}{n!} = \dfrac{N!}{n!\,(N-n)!}$. This is the
number of combinations of n items selected from N distinct items (and the order of
selection doesn't matter). For example, $\binom{6}{2} = \dfrac{6!}{2!\,4!} = \dfrac{(6)(5)(4!)}{2!\,4!} = \dfrac{(6)(5)}{(2)(1)} = 15$.

• There are $\binom{N}{n}$ possible SRSs of size n selected from a population of size N.

• For any SRS of size n from a population of size N, we have $P(S) = 1/\binom{N}{n}$.

• Unless otherwise specified, we will assume sampling is without replacement.
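
The counting rules above can be checked with a short Python sketch. The notes themselves use R and SAS; Python is used here only for illustration, and the sizes N = 5 and n = 2 are an assumed small case.

```python
from itertools import combinations
from math import comb, factorial

N, n = 5, 2  # illustrative sizes (assumption); any 0 <= n <= N works

# Every unordered sample of n unit labels drawn from units 1..N.
samples = list(combinations(range(1, N + 1), n))

num_samples = comb(N, n)      # N! / (n! (N - n)!)
prob_each = 1 / num_samples   # P(S) for any one SRS without replacement

print(len(samples), num_samples, prob_each)
print(factorial(6), comb(6, 2))   # the two worked examples from the notes
```

Enumerating the samples explicitly and counting them gives the same number as the closed-form binomial coefficient, which is why each sample receives probability 1 divided by that count.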

2.3.1 Estimation of $\bar{y}_U$ and $t$

• A natural estimator for the population mean $\bar{y}_U$ is the sample mean $\bar{y}$. Because $\bar{y}$ is an
estimate of an individual unit's y-value, multiplication by the population size N will give
us an estimate $\hat{t}$ of the population total t. That is:

$$\hat{\bar{y}}_U = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad \hat{t} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i \qquad (10)$$

• $\hat{\bar{y}}_U$ and $\hat{t}$ are design unbiased. That is, the average values of $\bar{y}$ and $N\bar{y}$ taken over all
possible SRSs equal $\bar{y}_U$ and $t$, respectively.

Demonstration of Unbiasedness: Suppose we have a population consisting of five y-values:

Unit i   1   2   3   4   5
y_i      0   2   3   4   7

which has the following parameters:

N = 5    t = 16    $\bar{y}_U$ = 3.2    $S^2$ = 6.7    S ≈ 2.588

Suppose a SRS of size n = 2 is selected. Then $P(S) = 1/\binom{5}{2} = 1/10$ for each of the 10 possible
SRSs.

All Possible Samples and Statistics from Example Population

Sample   Units   y-values   $\sum y_i$   $\hat{\bar{y}}_U = \bar{y}$   $\hat{t} = N\bar{y}$   $\hat{S}^2 = s^2$   $\hat{S} = s$
S1       1,2     0,2         2           1                             5                      2                   1.4142
S2       1,3     0,3         3           1.5                           7.5                    4.5                 2.1213
S3       1,4     0,4         4           2                             10                     8                   2.8284
S4       1,5     0,7         7           3.5                           17.5                   24.5                4.9497
S5       2,3     2,3         5           2.5                           12.5                   0.5                 0.7071
S6       2,4     2,4         6           3                             15                     2                   1.4142
S7       2,5     2,7         9           4.5                           22.5                   12.5                3.5355
S8       3,4     3,4         7           3.5                           17.5                   0.5                 0.7071
S9       3,5     3,7         10          5                             25                     8                   2.8284
S10      4,5     4,7         11          5.5                           27.5                   4.5                 2.1213
Column Sum                               32                            160                    67                  22.6274
Expected value = E(estimator)            32/10 = 3.2 $= \bar{y}_U$     160/10 = 16 $= t$      67/10 = 6.7 $= S^2$  22.6274/10 = 2.26274 $\neq S$

The averages for estimators $\hat{\bar{y}}_U = \bar{y}$, $\hat{t} = N\bar{y}$, and $\hat{S}^2 = s^2$ equal the parameters that they
are estimating. This implies that $\bar{y}$, $N\bar{y}$, and $s^2$ are unbiased estimators of $\bar{y}_U$, $t$, and $S^2$.
Notation: $E(\hat{\bar{y}}_U) = \bar{y}_U$, $E(\hat{t}) = t$, $E(\hat{S}^2) = S^2$
or $E(\bar{y}) = \bar{y}_U$, $E(N\bar{y}) = t$, $E(s^2) = S^2$.

The average for estimator $\hat{S} = s$ does not equal the parameter S. This implies that s is a
biased estimator of S. Notation: $E(\hat{S}) \neq S$ or $E(s) \neq S$.
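
The demonstration can be reproduced by brute force. The following Python sketch (an illustration only, not part of the course code) enumerates all 10 samples of size 2 from the five-unit population and averages each estimator over them.

```python
from itertools import combinations
from statistics import mean, stdev, variance

y = [0.0, 2.0, 3.0, 4.0, 7.0]   # population y-values from the example
N, n = len(y), 2

t = sum(y)            # population total
ybar_U = t / N        # population mean
S2 = variance(y)      # population variance (divisor N - 1)
S = stdev(y)          # population standard deviation

samples = list(combinations(y, n))   # all equally likely SRSs

# Each SRS has probability 1/10, so an expectation is a plain average.
E_ybar = mean(mean(s) for s in samples)
E_that = mean(N * mean(s) for s in samples)
E_s2 = mean(variance(s) for s in samples)
E_s = mean(stdev(s) for s in samples)

print(E_ybar, ybar_U)   # equal: the sample mean is unbiased
print(E_that, t)        # equal: N times the sample mean is unbiased
print(E_s2, S2)         # equal: s^2 is unbiased for S^2
print(E_s, S)           # E(s) is smaller than S: s is biased
```

The last line shows the bias directly: the average of s over all samples falls below S even though the average of s² matches S².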

• The next problem is to study the variances of $\hat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$.

• Warning: In an introductory statistics course, you were told that the variance of the sample
mean is $V(\bar{Y}) = S^2/n$ $(= \sigma^2/n)$ and its standard deviation is $S/\sqrt{n}$ $(= \sigma/\sqrt{n})$. This is
appropriate if a sample was to be taken from an infinite or extremely large population.
• However, we are dealing with finite populations that often are not considered extremely
large. In such cases, we have to adjust our variance formulas by $\frac{N-n}{N}$, which is known
as the finite population correction (f.p.c.).
• Texts may rewrite the f.p.c. as either $1 - \frac{n}{N}$ or $1 - f$ where $f = n/N$ is the
fraction of the population that was sampled. By definition:

$$V(\hat{\bar{y}}_U) = V(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{S^2}{n} \qquad V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\frac{S^2}{n} \qquad (11)$$

• Because $S^2$ is unknown, we use $s^2$ to get unbiased estimators of the variances in (11):

$$\hat{V}(\hat{\bar{y}}_U) = \hat{V}(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n} \qquad \hat{V}(\hat{t}) = N^2 \hat{V}(\bar{y}) = N(N-n)\frac{s^2}{n} \qquad (12)$$

• Taking a square root of a variance in (11) yields the standard deviation of the estimator.
• Taking a square root of an estimated variance in (12) yields the standard error of the
estimate.

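• Thus, for the five-unit example population with n = 2,
$V(\bar{y}) = \left(\frac{N-n}{N}\right)\frac{S^2}{n} = \left(\frac{3}{5}\right)\frac{6.7}{2} = 2.01$ and
$V(\hat{t}) = N^2 V(\bar{y}) = N(N-n)\frac{S^2}{n} = (5)(3)\frac{6.7}{2} = 50.25$.
• Like $\hat{\bar{y}}_U$ and $\hat{t}$, the variance estimators $\hat{V}(\hat{\bar{y}}_U)$ and $\hat{V}(\hat{t})$ are design unbiased. That is, the averages
of $\hat{V}(\hat{\bar{y}}_U)$ and $\hat{V}(\hat{t})$ taken over all possible SRSs equal $V(\hat{\bar{y}}_U) = 2.01$ and $V(\hat{t}) = 50.25$,
respectively.
• For the estimated variances we have $\hat{V}(\hat{\bar{y}}_U) = \left(\frac{N-n}{N}\right)\frac{s^2}{n} = \left(\frac{3}{5}\right)\frac{s^2}{2} = 0.3\,s^2$ and
$\hat{V}(\hat{t}) = N(N-n)\frac{s^2}{n} = (5)(3)\frac{s^2}{2} = 7.5\,s^2$ where $s^2$ is a particular sample variance.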
Example: We will use our population from the previous example:
Unit i   1   2   3   4   5
y_i      0   2   3   4   7
which has the following parameters:
N = 5    t = 16    $\bar{y}_U$ = 3.2    $S^2$ = 6.7    S ≈ 2.588

Estimated Variances of $\hat{\bar{y}}_U$ and $\hat{t}$ for All Samples

Sample   Units   y-values   $s^2$    $\hat{V}(\hat{\bar{y}}_U) = 0.3\,s^2$   $\hat{V}(\hat{t}) = 7.5\,s^2$
S1       1,2     0,2        2        0.6                                    15
S2       1,3     0,3        4.5      1.35                                   33.75
S3       1,4     0,4        8        2.4                                    60
S4       1,5     0,7        24.5     7.35                                   183.75
S5       2,3     2,3        0.5      0.15                                   3.75
S6       2,4     2,4        2        0.6                                    15
S7       2,5     2,7        12.5     3.75                                   93.75
S8       3,4     3,4        0.5      0.15                                   3.75
S9       3,5     3,7        8        2.4                                    60
S10      4,5     4,7        4.5      1.35                                   33.75
Column Sum                  67       20.1                                   502.5

• From the table we have $E(\hat{V}(\hat{\bar{y}}_U)) = 20.1/10 = 2.01 = V(\hat{\bar{y}}_U)$ and $E(\hat{V}(\hat{t})) = 502.5/10 =
50.25 = V(\hat{t})$. Thus, we see that both variance estimators are unbiased.
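
As a check on the table, the sketch below (Python, for illustration only) applies the variance formulas to every possible sample and compares the averages with the true variances. The helper names `var_mean` and `var_total` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

y = [0.0, 2.0, 3.0, 4.0, 7.0]
N, n = len(y), 2

def var_mean(s2, N, n):
    """Variance of the sample mean: (N - n)/N * s2/n."""
    return (N - n) / N * s2 / n

def var_total(s2, N, n):
    """Variance of the estimated total: N (N - n) * s2/n."""
    return N * (N - n) * s2 / n

# True variances use the population variance S^2.
S2 = variance(y)
V_ybar = var_mean(S2, N, n)
V_that = var_total(S2, N, n)

# Estimated variances use each sample's own s^2.
samples = list(combinations(y, n))
E_Vhat_ybar = mean(var_mean(variance(s), N, n) for s in samples)
E_Vhat_that = mean(var_total(variance(s), N, n) for s in samples)

# Independent check: spread of the 10 equally likely sample means.
direct_V_ybar = pvariance([mean(s) for s in samples])

print(V_ybar, E_Vhat_ybar, direct_V_ybar)
print(V_that, E_Vhat_that)
```

The third value on the first line is computed without any formula, straight from the sampling distribution of the mean, so agreement there also confirms that the f.p.c. belongs in the variance.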
• If N is large relative to n, then the finite population correction (f.p.c.) will be close to
(but less than) 1. Omitting the finite population correction from the variance formulas
(i.e., replacing (N − n)/N with 1) will slightly overestimate the true variance. That is,
there is a small positive bias. I personally would not recommend omitting the f.p.c.
• If N is not large relative to n, then omitting the f.p.c. from the variance formulas can
seriously overestimate the true variance. That is, there can be a large positive bias.
• As $n \to N$, $\frac{N-n}{N} \to 0$. That is, as the sample size approaches the population size, the
f.p.c. approaches 0. Thus, in (11) and (12) the variances → 0 as n → N.
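
A small Python sketch (illustrative only) makes the size of the f.p.c. effect concrete. It uses the example population values N = 5 and S² = 6.7; the function names and the large population size of 10,000 are assumptions made for the comparison.

```python
def fpc(N, n):
    """Finite population correction (N - n)/N."""
    return (N - n) / N

def var_mean(S2, N, n, use_fpc=True):
    """Variance of the sample mean, with or without the f.p.c."""
    base = S2 / n
    return fpc(N, n) * base if use_fpc else base

N, S2 = 5, 6.7   # example population from the notes
for n in range(1, N + 1):
    with_fpc = var_mean(S2, N, n)
    without = var_mean(S2, N, n, use_fpc=False)
    print(n, round(fpc(N, n), 2), round(with_fpc, 4), round(without, 4))

big_N = 10_000   # hypothetical large population
print(round(fpc(big_N, 20), 4))   # close to 1, so omission matters little
```

With n = N the corrected variance is exactly 0 (a census has no sampling error), while the uncorrected formula still reports S²/N.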

2.3.2 SRS With Replacement
• Consider a sampling procedure in which a sampling unit is randomly selected from the
population, its y-value recorded, and is then returned to the population. This process of
randomly selecting units with replacement after each stage is repeated n times. Thus, a
sampling unit may be sampled multiple times. A sample of n units selected by such a
procedure is called a simple random sample with replacement.
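• The estimators for SRS with replacement are: $\hat{\bar{y}}_U = \bar{y}$ and $\hat{V}(\hat{\bar{y}}_U) = \hat{V}(\bar{y}) = \dfrac{s^2}{n}$.
• Suppose we have two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of some parameter θ.
$\hat{\theta}_1$ is less efficient than $\hat{\theta}_2$ for estimating θ if $V(\hat{\theta}_1) > V(\hat{\theta}_2)$.
$\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ for estimating θ if $V(\hat{\theta}_1) < V(\hat{\theta}_2)$.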
• For most situations, the estimator for a SRS with replacement is less efficient than the
estimator for a SRS without replacement.
• There will be circumstances (such as sampling proportional to size) where we will consider
sampling with replacement. Unless otherwise stated, we assume that sampling is done
without replacement.
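
The efficiency comparison can be illustrated on the five-unit example population. This Python sketch (an illustration, not from the notes) enumerates every without-replacement sample and every ordered with-replacement sample of size 2 and computes the variance of the sample mean under each design.

```python
from itertools import combinations, product
from statistics import mean, pvariance

y = [0.0, 2.0, 3.0, 4.0, 7.0]
n = 2

# Without replacement: 10 equally likely unordered samples.
wor_means = [mean(s) for s in combinations(y, n)]

# With replacement: 5**2 = 25 equally likely ordered draws,
# including draws that select the same unit twice.
wr_means = [mean(s) for s in product(y, repeat=n)]

V_wor = pvariance(wor_means)   # variance of the mean, without replacement
V_wr = pvariance(wr_means)     # variance of the mean, with replacement

print(mean(wor_means), mean(wr_means))   # both designs are unbiased
print(V_wor, V_wr)
```

Both designs give an unbiased sample mean, but the with-replacement variance is larger here, so the without-replacement estimator is the more efficient of the two for this population.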

2.4 Two-Sided Confidence Intervals for $\bar{y}_U$ and $t$

• In an introductory statistics course, you were given confidence interval formulas

$$\bar{y} \pm z^* \frac{s}{\sqrt{n}} \qquad\text{and}\qquad \bar{y} \pm t^* \frac{s}{\sqrt{n}} \qquad (13)$$

These formulas are applicable if a sample was to be taken from an infinite or extremely
large population. But when we are dealing with finite populations, we adjust our variance
formulas by the finite population correction.
• In the finite population version of the Central Limit Theorem, we assume the estimators
$\hat{\bar{y}}_U = \bar{y}$ and $\hat{t} = N\bar{y}$ have sampling distributions that are approximately normal. That is,

$$\hat{\bar{y}}_U \mathrel{\dot\sim} N\left(\bar{y}_U,\ \left(\frac{N-n}{N}\right)\frac{S^2}{n}\right) \qquad\text{and}\qquad \hat{t} \mathrel{\dot\sim} N\left(t,\ N(N-n)\frac{S^2}{n}\right)$$

• For large samples, approximate 100(1 − α)% confidence intervals for $\bar{y}_U$ (µ) and $t$ (τ) are

For $\bar{y}_U$: $\bar{y} \pm z^* \sqrt{\left(\frac{N-n}{N}\right)\frac{s^2}{n}}$        For $t$: $N\bar{y} \pm z^* \sqrt{N(N-n)\frac{s^2}{n}}$        (14)

$\bar{y} \pm z^* s \sqrt{\left(\frac{N-n}{N}\right)\Big/ n}$        $N\bar{y} \pm z^* s \sqrt{N(N-n)/n}$        (15)

where $z^*$ is the upper α/2 critical value from the standard normal distribution. Or, in
standard error (s.e.) notation,

$\hat{\bar{y}}_U \pm z^*\,\text{s.e.}(\hat{\bar{y}}_U)$        $\hat{t} \pm z^*\,\text{s.e.}(\hat{t})$

For 90%, 95%, and 99%, $z^*$ = 1.645, 1.96, and 2.576, respectively.

• For smaller samples, approximate 100(1 − α)% confidence intervals for $\bar{y}_U$ and $t$ are

For $\bar{y}_U$: $\bar{y} \pm t^* \sqrt{\left(\frac{N-n}{N}\right)\frac{s^2}{n}}$        For $t$: $N\bar{y} \pm t^* \sqrt{N(N-n)\frac{s^2}{n}}$        (16)

$\bar{y} \pm t^* s \sqrt{\left(\frac{N-n}{N}\right)\Big/ n}$        $N\bar{y} \pm t^* s \sqrt{N(N-n)/n}$        (17)

where $t^*$ is the upper α/2 critical value from the t(n − 1) distribution.

• The quantity being added and subtracted from $\hat{\bar{y}}_U = \bar{y}$ or $\hat{t} = N\bar{y}$ in the confidence
interval is known as the margin of error.

Example: Use the small population data again. For n = 2, t∗ ≈ 6.314 for a nominal 90%
confidence level.
All Possible Samples and Confidence Intervals from Example Population

Sample   y-values   $\sum y_i$   $\hat{\bar{y}}_U = \bar{y}$   $\hat{t} = N\bar{y}$   $\hat{S}^2 = s^2$   $\hat{S} = s$   $\hat{V}(\hat{\bar{y}}_U)$   $\hat{V}(\hat{t})$   90% CI for t
1        0,2        2            1                             5                      2                   1.4142          0.6                          15                   (-19.45, 29.45)
2        0,3        3            1.5                           7.5                    4.5                 2.1213          1.35                         33.75                (-29.18, 44.18)
3        0,4        4            2                             10                     8                   2.8284          2.4                          60                   (-38.91, 58.91)
4        0,7        7            3.5                           17.5                   24.5                4.9497          7.35                         183.75               (-68.09, 103.09)
5        2,3        5            2.5                           12.5                   0.5                 0.7071          0.15                         3.75                 (0.27, 24.73)
6        2,4        6            3                             15                     2                   1.4142          0.6                          15                   (-9.45, 39.45)
7        2,7        9            4.5                           22.5                   12.5                3.5355          3.75                         93.75                (-38.63, 83.63)
8        3,4        7            3.5                           17.5                   0.5                 0.7071          0.15                         3.75                 (5.27, 29.73)
9        3,7        10           5                             25                     8                   2.8284          2.4                          60                   (-23.91, 73.91)
10       4,7        11           5.5                           27.5                   4.5                 2.1213          1.35                         33.75                (-9.18, 64.18)
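
The intervals in the table follow from formula (16). The Python sketch below (illustrative only; `ci_total` is a hypothetical helper) recomputes them with the critical value 6.314 quoted above and counts how many intervals contain the true total t = 16.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

y = [0.0, 2.0, 3.0, 4.0, 7.0]
N, n = len(y), 2
t_true = sum(y)
T_STAR = 6.314   # upper 0.05 critical value of t(1), as quoted in the notes

def ci_total(sample, N, t_star):
    """Two-sided t-based interval for the population total."""
    n = len(sample)
    t_hat = N * mean(sample)
    se = sqrt(N * (N - n) * variance(sample) / n)
    return t_hat - t_star * se, t_hat + t_star * se

samples = list(combinations(y, n))
intervals = [ci_total(s, N, T_STAR) for s in samples]

for s, (lo, hi) in zip(samples, intervals):
    print(s, f"({lo:.2f}, {hi:.2f})")

covered = sum(lo <= t_true <= hi for lo, hi in intervals)
print(covered, "of", len(intervals), "intervals contain the true total")
```

With only one degree of freedom the critical value is very large, so the intervals are wide and several extend below zero even though no y-value is negative.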

2.4.1 One-Sided Confidence Intervals for $\bar{y}_U$ and $t$

• Occasionally, a researcher may want a one-sided confidence interval. There are two types
of one-sided confidence intervals: upper and lower.

• Approximate upper and lower 100(1 − α)% confidence intervals for $\bar{y}_U$ and $t$ are:

          For $\bar{y}_U$:                                                              For $t$:
upper     $\left(\bar{y} - t^* s \sqrt{\left(\frac{N-n}{N}\right)\Big/ n}\ ,\ \infty\right)$        $\left(N\bar{y} - t^* s \sqrt{N(N-n)/n}\ ,\ \infty\right)$
lower     $\left(-\infty\ ,\ \bar{y} + t^* s \sqrt{\left(\frac{N-n}{N}\right)\Big/ n}\right)$       $\left(-\infty\ ,\ N\bar{y} + t^* s \sqrt{N(N-n)/n}\right)$

where $t^*$ is the upper α critical value from the t(n − 1) distribution.

• If the y-values cannot be negative, replace −∞ with 0 in the lower confidence interval
formulas. If the y-values cannot be positive, replace ∞ with 0 in the upper confidence
interval formulas.
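
A minimal Python sketch of the one-sided formulas for the mean, for illustration only. It uses the ten counts of the Figure 1 sample that appears later in these notes (N = 400); the critical value 1.833113 is the upper 0.05 point of t(9) taken from a t table, and the function name is hypothetical.

```python
from math import inf, sqrt
from statistics import mean, variance

def one_sided_cis_mean(sample, N, t_star):
    """Return the interval bounded below and the interval bounded above."""
    n = len(sample)
    ybar = mean(sample)
    se = sqrt((N - n) / N * variance(sample) / n)
    bounded_below = (ybar - t_star * se, inf)
    bounded_above = (-inf, ybar + t_star * se)
    return bounded_below, bounded_above

counts = [33, 33, 30, 34, 39, 27, 32, 36, 35, 42]   # Figure 1 sample
T_STAR = 1.833113   # upper 0.05 critical value of t(9), from a t table

bounded_below, bounded_above = one_sided_cis_mean(counts, 400, T_STAR)
print(bounded_below)
print(bounded_above)
```

Because counts cannot be negative, the interval bounded above could start at 0 instead of negative infinity, following the replacement rule stated above.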

• Later, we will discuss another method of generating a confidence interval called
bootstrapping. This will be useful when the sample size may be small and the central limit
theorem cannot be applied.

SRS Example with Strong Spatial Correlation

• To illustrate the application of simple random sampling to estimation of the population
mean per unit µ, consider the abundance data in Figure 1. The abundance counts are artificial
but show a strong diagonal spatial correlation.

• The region has been gridded into a 20 × 20 grid of 10 × 10 m quadrats. The total abundance
t = 13354 and the mean per unit is $\bar{y}_U$ = 33.385. The population variance $S^2$ = 75.601.

• This data will be used to compare estimation properties of various sampling designs when
data are spatially correlated.

Figure 1

Data Exhibiting Strong Spatial Correlation

18 20 15 20 20 15 19 18 24 23 20 26 29 28 28 31 31 34 28 32
13 20 16 20 15 23 19 26 21 21 24 30 23 26 25 33 31 28 32 38
16 18 20 24 25 26 22 23 26 26 22 27 25 25 34 28 37 36 38 31
17 17 16 22 21 23 22 27 27 24 28 32 29 33 27 37 37 38 35 33
15 19 23 17 21 23 21 23 24 25 31 26 32 34 32 33 31 31 36 37
21 24 20 21 28 26 30 22 31 25 29 29 27 30 29 37 35 32 38 43
23 17 24 25 24 27 31 29 31 34 27 36 29 29 34 39 37 37 40 36
18 24 21 25 27 22 32 32 31 26 28 34 34 37 35 34 38 38 37 40
22 26 28 26 24 29 33 26 27 27 34 31 39 32 36 38 37 40 44 43
23 27 28 29 26 32 25 31 35 34 32 33 37 32 42 40 40 37 42 44
23 21 31 23 30 27 31 30 32 35 30 40 32 37 37 36 40 44 44 40
26 29 31 26 30 31 34 36 30 38 36 32 38 38 37 42 42 41 40 49
28 24 28 27 26 31 32 29 32 33 38 34 39 38 40 37 41 43 42 43
32 25 31 32 29 29 35 38 38 32 36 35 39 42 39 40 44 42 41 45
27 29 35 28 35 35 31 40 35 37 38 44 40 40 47 39 49 48 51 49
30 29 32 32 33 30 36 38 42 36 35 38 44 47 45 49 41 43 44 51
28 35 35 34 34 33 41 33 34 35 39 44 44 48 44 50 49 48 53 54
29 33 32 36 39 33 33 34 35 42 46 47 48 47 46 45 44 52 54 55
28 37 38 37 33 33 34 37 45 40 39 42 42 46 47 48 52 47 46 53
38 39 39 37 34 38 39 45 39 42 45 41 44 51 46 50 52 51 51 53

SRS taken from Figure 1 (n = 10, t = 13354, $\bar{y}_U$ = 33.385, $\bar{y}$ = 34.1, $s^2$ = 18.32)

18 20 15 20 20 15 19 18 24 23 20 26 29 28 28 31 31 34 28 32
13 20 16 20 15 23 19 26 21 21 24 30 23 26 25 (33) 31 28 32 38
16 18 20 24 25 26 22 23 26 26 22 27 25 25 34 28 37 36 38 31
17 17 16 22 21 23 22 27 27 24 28 32 29 (33) 27 37 37 38 35 33
15 19 23 17 21 23 21 23 24 25 31 26 32 34 32 33 31 31 36 37
21 24 20 21 28 26 (30) 22 31 25 29 29 27 30 29 37 35 32 38 43
23 17 24 25 24 27 31 29 31 34 27 36 29 29 34 39 37 37 40 36
18 24 21 25 27 22 32 32 31 26 28 34 34 37 35 (34) 38 38 37 40
22 26 28 26 24 29 33 26 27 27 34 31 (39) 32 36 38 37 40 44 43
23 27 28 29 26 32 25 31 35 34 32 33 37 32 42 40 40 37 42 44
23 21 31 23 30 27 31 30 32 35 30 40 32 37 37 36 40 44 44 40
26 29 31 26 30 31 34 36 30 38 36 32 38 38 37 42 42 41 40 49
28 24 28 (27) 26 31 32 29 32 33 38 34 39 38 40 37 41 43 42 43
32 25 31 (32) 29 29 35 38 38 32 (36) 35 39 42 39 40 44 42 41 45
27 29 35 28 35 35 31 40 35 37 38 44 40 40 47 39 49 48 51 49
30 29 32 32 33 30 36 38 42 36 35 38 44 47 45 49 41 43 44 51
28 (35) 35 34 34 33 41 33 34 35 39 44 44 48 44 50 49 48 53 54
29 33 32 36 39 33 33 34 35 42 46 47 48 47 46 45 44 52 54 55
28 37 38 37 33 33 34 37 45 40 39 42 (42) 46 47 48 52 47 46 53
38 39 39 37 34 38 39 45 39 42 45 41 44 51 46 50 52 51 51 53

SRS Example using Rathbun and Cressie (1994) Data
• To illustrate the application of simple random sampling to population total t estimation, consider
the abundance data in Figure 2. The abundance counts correspond to the census data studied
by Rathbun and Cressie (1994).
• This 200 × 200 m study region is located in an old-growth forest in Thomas County, Georgia.
This data represents the number of longleaf pine trees located in each quadrat. The coordinates
of the 584 tree locations are given in Cressie (1991).
• I have gridded the region into a 20 × 20 grid of 10 × 10 m quadrats. The total abundance
t = 584 and the mean abundance per quadrat $\bar{y}_U$ = 584/400 = 1.46. The population variance
$S^2$ = 3.853.
• There is only a weak spatial correlation of tree counts within the study region.
• The longleaf pine census data will be used to compare estimation properties of various sampling
designs.
• Note the two relatively large boldfaced values (14 and 16).

Figure 2

Longleaf Pine Data (Rathbun and Cressie 1994)

1 1 1 1 1 2 1 0 0 0 4 5 0 1 0 1 2 1 0 1
3 2 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1
7 4 1 1 1 1 0 0 0 2 2 0 4 3 2 4 2 1 2 2
0 1 2 0 0 0 0 0 4 6 5 1 5 0 0 0 2 1 2 0
1 1 0 2 3 2 0 0 2 1 3 1 4 1 1 1 2 2 1 1
2 0 0 0 4 3 3 0 1 16 5 0 1 3 8 0 0 1 3 3
0 0 1 14 3 3 1 2 0 8 0 2 0 3 9 0 4 2 1 0
0 0 5 1 8 7 6 6 6 1 0 4 0 0 1 2 2 0 1 2
0 0 2 2 3 2 2 3 1 1 1 3 0 0 2 2 0 3 4 0
0 0 0 0 1 0 3 1 1 1 2 0 2 0 2 0 2 1 1 0
1 8 7 7 8 0 5 0 1 0 1 2 0 0 2 4 2 2 2 4
0 9 1 0 0 1 1 1 0 0 0 1 2 4 0 2 1 3 3 1
0 0 0 1 0 2 4 3 1 2 2 0 0 1 1 2 2 0 2 4
0 1 0 0 1 2 0 2 3 5 2 0 0 2 1 1 2 0 1 3
1 0 0 1 1 0 0 0 2 2 2 1 1 1 0 0 2 0 0 0
0 2 0 2 2 0 1 1 0 2 0 0 1 0 0 1 1 1 5 3
0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2
1 0 0 1 0 3 0 1 0 0 2 1 2 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0
2 0 0 0 0 0 0 0 1 2 0 1 3 0 0 1 0 1 2 4

REFERENCES (for Figure 2 data)

Cressie, Noel (1991) Statistics for Spatial Data. Wiley, New York.
Rathbun, S.L. and Cressie, N. (1994) A space-time survival point process for a longleaf pine forest in
southern Georgia. Journal of the American Statistical Association, 89, 1164-1174.

SRS taken from Figure 2 (n = 20, t = 584, $\bar{y}_U$ = 1.46, $\bar{y}$ = 1.55, $s^2$ = 10.9974)

1 1 1 1 1 2 1 0 0 0 4 5 0 1 0 1 2 1 0 1
3 2 1 0 1 0 0 0 1 2 2 2 0 2 2 2 0 2 0 1
7 4 1 1 1 1 0 0 0 2 2 0 4 3 2 4 2 1 2 2
0 1 2 0 0 (0) 0 0 4 6 5 1 5 0 0 0 2 1 2 0
1 (1) 0 2 3 2 (0) 0 2 1 3 1 4 1 1 1 2 2 1 1
2 0 0 0 4 3 3 0 1 16 5 0 1 (3) 8 0 0 1 3 3
0 (0) 1 (14) 3 (3) 1 2 0 8 (0) 2 0 3 9 0 4 2 1 0
0 0 5 (1) 8 7 (6) 6 6 1 0 4 0 0 1 2 2 0 1 2
0 0 2 2 3 2 2 3 1 1 1 3 0 0 2 2 0 3 4 (0)
0 0 0 0 1 0 3 1 1 1 2 0 2 0 2 (0) 2 1 1 0
1 8 7 7 8 0 5 0 1 (0) 1 2 0 (0) 2 4 2 2 2 4
0 9 1 0 (0) 1 1 1 0 0 0 1 2 4 0 2 1 3 3 1
0 0 0 1 0 2 4 3 1 2 2 0 0 1 1 2 2 0 2 4
0 1 0 0 1 2 0 2 3 5 2 0 0 2 1 1 2 0 1 3
1 0 0 1 1 0 0 0 2 2 2 (1) 1 1 0 0 (2) 0 0 0
0 2 0 2 2 0 1 1 0 2 0 0 1 0 0 1 1 1 5 3
0 0 0 3 2 1 0 0 0 0 0 2 1 0 1 1 1 3 1 2
1 (0) 0 1 0 3 (0) 1 0 0 2 1 2 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1 1 1 0 1 0 3 0 2 0 1 1 0
2 0 0 0 0 0 0 0 1 2 0 1 3 (0) 0 1 0 1 2 4

2.4.2 Using the R Survey Package for a SRS
R Code and Output for Figure 1 SRS Analysis

"count" "fpc" <- This is the contents of the data file fig1.txt
33 400 <- The first column are the recorded responses
33 400 <- The second column is the population size N
30 400
34 400
39 400
27 400
32 400
36 400
35 400
42 400

R Code
# Course helper file; it is assumed to define confint.t(), which is not part of the survey package
source("c:/courses/st446/rcode/confintt.r")

# t-based confidence intervals for SRS in Figure 1

library(survey)
srsdat <- read.table("c:/courses/st446/rcode/fig1.txt", header=TRUE)
srsdat

# SRS design: no clustering (id=~1); the fpc column holds the population size N
srs_design <- svydesign(id=~1, fpc=~fpc, data=srsdat)

srs_design

# Estimated population total with two-sided and one-sided intervals
esttotal <- svytotal(~count,srs_design)

print(esttotal,digits=15)
confint.t(esttotal,degf(srs_design),level=.95)
confint.t(esttotal,degf(srs_design),level=.95,tails='lower')
confint.t(esttotal,degf(srs_design),level=.95,tails='upper')

# Estimated population mean with two-sided and one-sided intervals
estmean <- svymean(~count,srs_design)

print(estmean,digits=15)
confint.t(estmean,degf(srs_design),level=.95)
confint.t(estmean,degf(srs_design),level=.95,tails='lower')
confint.t(estmean,degf(srs_design),level=.95,tails='upper')
R output for t-based confidence interval for SRS

> srsdat
count fpc
1 33 400
2 33 400
3 30 400
4 34 400
5 39 400
6 27 400
7 32 400
8 36 400
9 35 400
10 42 400

Independent Sampling design

total SE
count 13640 534.63

-------------------------------------------------------------------
mean( count ) = 13640.00000
SE( count ) = 534.62760
Two-Tailed CI for count where alpha = 0.05 with 9 df
2.5 % 97.5 %
12430.58835 14849.41165
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 13640.00000
SE( count ) = 534.62760
One-Tailed (Lower) CI for count where alpha = 0.05 with 9 df
5 % upper
12659.96724 infinity
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 13640.00000
SE( count ) = 534.62760
One-Tailed (upper) CI for count where alpha = 0.05 with 9 df
lower 95 %
-infinity 14620.03276
-------------------------------------------------------------------

mean SE
count 34.1 1.3366

-------------------------------------------------------------------
mean( count ) = 34.10000
SE( count ) = 1.33657
Two-Tailed CI for count where alpha = 0.05 with 9 df
2.5 % 97.5 %
31.07647 37.12353
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 34.10000
SE( count ) = 1.33657
One-Tailed (Lower) CI for count where alpha = 0.05 with 9 df
5 % upper
31.64992 infinity
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 34.10000
SE( count ) = 1.33657
One-Tailed (upper) CI for count where alpha = 0.05 with 9 df
lower 95 %
-infinity 36.55008
-------------------------------------------------------------------

R Code and Output for Figure 2 SRS Analysis

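# Course helper file; it is assumed to define confint.t(), which is not part of the survey package
source("c:/courses/st446/rcode/confintt.r")

# t-based confidence intervals for SRS in Figure 2

library(survey)
srsdat <- read.table("c:/courses/st446/rcode/fig2.txt", header=TRUE)
srsdat

# SRS design: no clustering (id=~1); the fpc column holds the population size N
srs_design <- svydesign(id=~1, fpc=~fpc, data=srsdat)

srs_design

# Estimated population total; these calls produce the total output shown below
esttotal <- svytotal(~count,srs_design)

print(esttotal,digits=15)
confint.t(esttotal,degf(srs_design),level=.95)
confint.t(esttotal,degf(srs_design),level=.95,tails='lower')
confint.t(esttotal,degf(srs_design),level=.95,tails='upper')

# Estimated population mean
estmean <- svymean(~count,srs_design)

print(estmean,digits=15)
confint.t(estmean,degf(srs_design),level=.95)
confint.t(estmean,degf(srs_design),level=.95,tails='lower')
confint.t(estmean,degf(srs_design),level=.95,tails='upper')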

R output for t-based confidence interval for SRS

The data file:

count fpc "count" "fpc"
1 1 400 1 400
2 0 400 0 400
3 0 400 0 400
4 14 400 14 400
5 1 400 1 400
6 0 400 0 400
7 0 400 0 400
8 3 400 3 400
9 0 400 0 400
10 6 400 6 400
11 0 400 0 400
12 0 400 0 400
13 0 400 0 400
14 1 400 1 400
15 3 400 3 400
16 0 400 0 400
17 0 400 0 400
18 0 400 0 400
19 2 400 2 400
20 0 400 0 400

total SE
count 620 289.1

-------------------------------------------------------------------
mean( count ) = 620.00000
SE( count ) = 289.10206
Two-Tailed CI for count where alpha = 0.05 with 19 df
2.5 % 97.5 %
14.90244 1225.09756
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 620.00000
SE( count ) = 289.10206
One-Tailed (Lower) CI for count where alpha = 0.05 with 19 df
5 % upper
120.10415 infinity
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 620.00000
SE( count ) = 289.10206
One-Tailed (upper) CI for count where alpha = 0.05 with 19 df
lower 95 %
-infinity 1119.89585
-------------------------------------------------------------------

mean SE
count 1.55 0.7228

-------------------------------------------------------------------
mean( count ) = 1.55000
SE( count ) = 0.72276

Two-Tailed CI for count where alpha = 0.05 with 19 df

2.5 % 97.5 %
0.03726 3.06274
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 1.55000
SE( count ) = 0.72276
One-Tailed (Lower) CI for count where alpha = 0.05 with 19 df
5 % upper
0.30026 infinity
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( count ) = 1.55000
SE( count ) = 0.72276
One-Tailed (upper) CI for count where alpha = 0.05 with 19 df
lower 95 %
-infinity 2.79974
-------------------------------------------------------------------

2.4.3 Using SAS PROC Surveymeans for a SRS
DM 'LOG;CLEAR;OUT;CLEAR'; *** I recommend putting these two lines of code;
OPTIONS NODATE NONUMBER; *** at the beginning of every SAS program ;

data SRS_Fig1;
wgt= 400/10; * wgt = N/n ;
input count @@;
datalines;
33 33 30 34 39 27 32 36 35 42
;
proc surveymeans data=SRS_Fig1 total=400 mean clm sum clsum;
var count;
weight wgt;
title1 'Simple Random Sample -- Example 1';
title2 'Estimating the population mean and total from the data in Figure 1';
run;
===========================================================================

Simple Random Sample -- Example 1

Estimating the population mean and total from the data in Figure 1

The SURVEYMEANS Procedure

Data Summary

Number of Observations 10
Sum of Weights 400

Statistics

Std Error
Variable Mean of Mean 95% CL for Mean
--------------------------------------------------------------
count 34.100000 1.336569 31.0764709 37.1235291
--------------------------------------------------------------

Variable Sum Std Dev 95% CL for Sum

---------------------------------------------------------------
count 13640 534.627596 12430.5884 14849.4116
---------------------------------------------------------------

DM 'LOG;CLEAR;OUT;CLEAR';
OPTIONS NODATE NONUMBER LS=80 PS=400;

data SRS_Fig2;
wgt= 400/20; * wgt = N/n ;
input trees @@;
datalines;
1 0 0 14 1 0 0 3 0 6 0 0 0 1 3 0 0 0 2 0
;
proc surveymeans data=SRS_Fig2 total=400 mean clm sum clsum;
var trees;
weight wgt;
title1 'Simple Random Sample -- Example 2';
title2 'Estimating the population mean and total from the data in Figure 2';
run;
============================================================================

Simple Random Sample -- Example 2

Estimating the population mean and total from the data in Figure 2

The SURVEYMEANS Procedure

Data Summary

Number of Observations 20
Sum of Weights 400

Statistics

Std Error
Variable Mean of Mean 95% CL for Mean
--------------------------------------------------------------
trees 1.550000 0.722755 0.03725610 3.06274390
--------------------------------------------------------------

Variable Sum Std Dev 95% CL for Sum

---------------------------------------------------------------
trees 620.000000 289.102058 14.9024382 1225.09756
---------------------------------------------------------------

2.5 Attribute Proportion Estimation
• Suppose we are interested in an attribute (characteristic) associated with the sampling
units. The population proportion p is the proportion of population units having that
attribute.
• Statistically, the goal is to estimate proportion p.
• Examples: the proportion of females (or males) in an animal population, the proportion of
consumers who own motorcycles, the proportion of married couples with at least 1 child. . .
• Statistically, we use an indicator function that assigns a $y_i$ value to unit i as follows:

$$y_i = \begin{cases} 1 & \text{if unit } i \text{ possesses the attribute} \\ 0 & \text{otherwise} \end{cases}$$

Then $t = \sum_{i=1}^{N} y_i$ and $\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i = p$. The population proportion p can be
expressed as a population mean $\bar{y}_U$. Therefore, we will, under certain conditions, be able
to apply the SRS methods for estimating $\bar{y}_U$.
• By taking a SRS of size n, we can estimate p with the sample proportion $\hat{p}$ of units that
possess that attribute: $\hat{p} = \dfrac{\sum_{i=1}^{n} y_i}{n} = \bar{y}$. The sample proportion $\hat{p}$ is unbiased for p.
• For a finite population of 0 and 1 values, the population variance is

$$S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - p)^2 = \frac{N}{N-1}\,p(1-p)$$

• Therefore, the variance of $\hat{p}$ is

$$V(\hat{p}) = \left(\frac{N-n}{N}\right)\frac{S^2}{n} = \left(\frac{N-n}{N}\right)\left(\frac{N}{N-1}\right)\frac{p(1-p)}{n} = \left(\frac{N-n}{N-1}\right)\frac{p(1-p)}{n} \qquad (18)$$

• Because $S^2$ is unknown, we estimate it with $s^2 = \dfrac{n}{n-1}\,\hat{p}(1-\hat{p})$. Substitution provides
the unbiased estimator of $V(\hat{p})$:

$$\hat{V}(\hat{p}) = \left(\frac{N-n}{N}\right)\frac{s^2}{n} = \left(\frac{N-n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1} \qquad (19)$$

• The square root of $V(\hat{p})$ in (18) is the standard deviation of the estimator $\hat{p}$.

• The square root of $\hat{V}(\hat{p})$ in (19) is the standard error of $\hat{p}$.

• The effects of omitting the finite population correction (f.p.c.) from the formulas for large
and small samples apply here as they did earlier.
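
Formulas (18) and (19) can be sketched in Python as follows (illustration only; the function names are hypothetical). The example values are those of the Figure 3 sample that follows: 18 of n = 25 sampled quadrats contain a tree, N = 400, and the true proportion is p = 0.6225.

```python
from math import sqrt

def true_var_phat(p, n, N):
    """Formula (18): variance of the sample proportion under SRS."""
    return (N - n) / (N - 1) * p * (1 - p) / n

def estimate_proportion(successes, n, N):
    """Return the sample proportion, formula (19), and the standard error."""
    p_hat = successes / n
    s2 = n / (n - 1) * p_hat * (1 - p_hat)   # sample variance of 0/1 data
    v_hat = (N - n) / N * s2 / n             # same as (N-n)/N * p(1-p)/(n-1)
    return p_hat, v_hat, sqrt(v_hat)

p_hat, v_hat, se = estimate_proportion(18, 25, 400)
print(p_hat, v_hat, se)

# Standard deviation of the estimator if p were known (Figure 3 value).
print(sqrt(true_var_phat(0.6225, 25, 400)))
```

Note the divisor n − 1 in the estimator: it comes from substituting s² for S², not from using n in the binomial formula.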

Figure 3: The Presence/Absence of Longleaf Pine
Rathbun/Cressie data (t = 249 N = 400 p = .6225)
1 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 1 0 1
1 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 0 1
1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1
0 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 0
1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1
0 0 1 1 1 1 1 1 0 1 0 1 0 1 1 0 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1 1
0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0
0 0 0 0 1 0 1 1 1 1 1 0 1 0 1 0 1 1 1 0
1 1 1 1 1 0 1 0 1 0 1 1 0 0 1 1 1 1 1 1
0 1 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 1
0 0 0 1 0 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1
0 1 0 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1
1 0 0 1 1 0 0 0 1 1 1 1 1 1 0 0 1 0 0 0
0 1 0 1 1 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1
0 0 0 1 1 1 0 0 0 0 0 1 1 0 1 1 1 1 1 1
1 0 0 1 0 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 1 0 1 1 0
1 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 1

A simple random sample of size n = 25

1 1 1 1 1 1 1 0 0 0 1 1 0 1 0 1 1 (1) 0 1
(1) 1 1 0 1 0 0 0 1 (1) 1 (1) 0 1 1 1 0 1 0 1
1 1 1 1 1 1 0 0 0 1 1 0 1 (1) 1 1 1 (1) 1 1
0 (1) 1 0 0 0 0 (0) 1 1 1 1 1 0 0 (0) 1 1 1 0
1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 (1)
1 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 1
0 0 1 1 (1) 1 1 1 0 1 0 1 0 1 1 (0) 1 1 1 0
0 0 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 0 1 1
0 0 1 1 1 1 1 1 1 1 1 1 0 0 (1) 1 0 1 1 0
0 0 0 0 1 0 1 1 (1) 1 1 0 1 0 1 0 1 1 1 0
1 1 1 1 1 0 1 0 1 0 1 1 0 0 (1) 1 1 1 1 1
0 1 1 0 0 1 1 1 0 0 0 (1) 1 1 0 1 1 1 1 1
0 0 0 1 0 1 1 1 1 1 1 0 (0) 1 1 (1) (1) 0 1 1
0 1 0 0 1 1 0 (1) 1 1 1 0 0 1 1 1 1 0 1 1
1 0 0 1 1 0 0 0 1 1 1 1 1 1 0 0 1 0 0 0
(0) 1 0 (1) 1 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1
0 0 0 1 1 1 0 0 0 0 0 1 1 0 (1) 1 1 1 1 1
1 0 0 1 0 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0
0 0 (0) 0 0 0 (0) 1 1 1 0 1 0 1 0 1 0 1 1 0
1 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 0 1 1 1

2.5.1 Confidence Intervals for p
• Let the random variable Y = the number of units in a SRS of size n that possess the
attribute of interest. We know (in theory) that the sampling distribution of Y follows a
hypergeometric distribution.

• Hypergeometric distribution for a SRS: The probability that a SRS of size n will have
exactly j sampling units possessing the attribute is

$$\Pr(Y = j) = \frac{\binom{t}{j}\binom{N-t}{n-j}}{\binom{N}{n}}$$

= the probability that a SRS will consist of j ones and n − j zeroes selected from the
population containing t ones (1's) and N − t zeroes (0's).
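
The hypergeometric probabilities are simple to compute directly. The Python sketch below is an illustration only; `hypergeom_pmf` is a hypothetical helper, and the values t = 249, N = 400, and n = 25 are those of the Figure 3 example.

```python
from math import comb

def hypergeom_pmf(j, N, t, n):
    """P(Y = j): j ones and n - j zeroes in a SRS of size n."""
    if not 0 <= j <= n:
        return 0.0
    # comb returns 0 when more items are requested than are available.
    return comb(t, j) * comb(N - t, n - j) / comb(N, n)

N, t, n = 400, 249, 25   # Figure 3 population and sample size

pmf = [hypergeom_pmf(j, N, t, n) for j in range(n + 1)]
total_prob = sum(pmf)
expected_Y = sum(j * pj for j, pj in enumerate(pmf))

print(total_prob)            # probabilities sum to 1
print(expected_Y, n * t / N) # E(Y) = n p
print(hypergeom_pmf(18, N, t, n))
```

Dividing the expected count by n gives p, which is another way of seeing that the sample proportion is unbiased.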

• Although confidence interval calculations can be based on probability tables of
hypergeometric distributions, we will use a more common approach that will apply to many
sampling situations.

• Remember there are t ones and N − t zeros in the population. However, t is unknown.
If we can assume that n is small relative to both t and N − t, we can use the binomial
approximation to the hypergeometric distribution. That is, $Y \mathrel{\dot\sim} \text{BIN}(n, p)$.

• Although the problem no longer depends on t, it still depends on the unknown proportion
parameter p.

• What is commonly done is to apply the normal approximation to the binomial distribution:

$$\hat{p} \mathrel{\dot\sim} N\left(p,\ V(\hat{p})\right).$$

• Thus, if the sample size n is large enough, we use $\hat{V}(\hat{p})$ to estimate $V(\hat{p})$. An approximate
100(1 − α)% confidence interval for p is:

$$\hat{p} \pm z^* \sqrt{\hat{V}(\hat{p})} \qquad\text{OR}\qquad \hat{p} \pm z^* \sqrt{\left(\frac{N-n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}} \qquad (20)$$

where $z^*$ is the upper α/2 critical value from the standard normal distribution. Sample
sizes are typically large enough to use $z^*$ instead of $t^*$.

• The normal approximation will be reasonable given

1. n is not too large relative to t or N − t. This will be a problem if p is close to 0 or 1.
2. The smaller of $n\hat{p}$ and $n(1-\hat{p})$ is not too small. In most texts, it is suggested that
both $n\hat{p}$ and $n(1-\hat{p})$ should be ≥ 5, while some texts use ≥ 10.
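
Formula (20) and the second rule of thumb can be combined in a short Python sketch (illustration only; `proportion_ci` is a hypothetical helper). The critical values are the z* values quoted earlier in these notes, and the example uses the Figure 3 sample of 18 presences in n = 25 quadrats with N = 400.

```python
from math import sqrt

# z* values quoted in the notes for 90%, 95%, and 99% confidence.
Z_STAR = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def proportion_ci(successes, n, N, level=0.95, min_count=5):
    """Normal-approximation interval (20) plus a sample-size check."""
    p_hat = successes / n
    se = sqrt((N - n) / N * p_hat * (1 - p_hat) / (n - 1))
    z = Z_STAR[level]
    # Rule of thumb: both n*p_hat and n*(1 - p_hat) at least min_count.
    large_enough = min(n * p_hat, n * (1 - p_hat)) >= min_count
    return p_hat - z * se, p_hat + z * se, large_enough

lo, hi, ok = proportion_ci(18, 25, 400, level=0.90)
print(round(lo, 5), round(hi, 5), ok)

# With the stricter threshold of 10 the same sample fails the check.
print(proportion_ci(18, 25, 400, level=0.90, min_count=10)[2])
```

This z-based interval is slightly narrower than the t-based interval with 24 degrees of freedom reported in the R and SAS output for the same sample.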

R Code and Output for Figure 3 Example

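# Course helper file; it is assumed to define confint.t(), which is not part of the survey package
source("c:/courses/st446/rcode/confintt.r")

# t-based confidence intervals for SRS in Figure 3

library(survey)
srsdat <- read.table("c:/courses/st446/rcode/fig3.txt", header=TRUE)
srsdat

# SRS design: no clustering (id=~1); the fpc column holds the population size N
srs_design <- svydesign(id=~1, fpc=~fpc, data=srsdat)

# The mean of the 0/1 presence indicator is the sample proportion
estmean <- svymean(~presence,srs_design)

print(estmean,digits=15)
confint.t(estmean,degf(srs_design),level=.90)
confint.t(estmean,degf(srs_design),level=.90,tails='lower')
confint.t(estmean,degf(srs_design),level=.90,tails='upper')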
R output for t-based confidence interval for SRS
> srsdat
presence fpc
1 1 400
2 1 400
3 1 400
: : :
23 0 400
24 0 400
25 0 400

mean SE
presence 0.72 0.0887

-------------------------------------------------------------------
mean( presence ) = 0.72000
SE( presence ) = 0.08874
Two-Tailed CI for presence where alpha = 0.1 with 24 df
5 % 95 %
0.56817 0.87183
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( presence ) = 0.72000
SE( presence ) = 0.08874
One-Tailed (Lower) CI for presence where alpha = 0.1 with 24 df
10 % upper
0.60305 infinity
-------------------------------------------------------------------

-------------------------------------------------------------------
mean( presence ) = 0.72000
SE( presence ) = 0.08874
One-Tailed (upper) CI for presence where alpha = 0.1 with 24 df
lower 90 %
-infinity 0.83695
-------------------------------------------------------------------

SAS Code and Output for Figure 3 Example

DM 'LOG;CLEAR;OUT;CLEAR';
OPTIONS NODATE NONUMBER LS=72 PS=54;

* Presence (1) / absence (0) indicator for the 25 sampled units;
DATA SRS_Fig3;
INPUT ind @@;
DATALINES;
1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 1 0 1 1 1 0 1 1 0 0
;
* Recode the indicator as a character class variable;
DATA SRS_Fig3; set SRS_Fig3;
IF ind = 0 then pa = 'absent ';
IF ind = 1 then pa = 'present';

* TOTAL = population size N (for the f.p.c.) and ALPHA = .10 gives 90% limits;
PROC SURVEYMEANS DATA=SRS_Fig3 TOTAL = 400 ALPHA = .10;
VAR pa;
TITLE 'Simple Random Sample -- Figure 3';
TITLE2 'Estimating population proportion p';
RUN;
==================================================================

Simple Random Sample -- Figure 3

Estimating population proportion p

The SURVEYMEANS Procedure

Data Summary

Number of Observations 25

Class Level Information

Class
Variable Levels Values

pa 2 absent present

Statistics

Std Error
Variable Level N Mean of Mean 90% CL for Mean
-------------------------------------------------------------------------
pa absent 7 0.280000 0.088741 0.12817428 0.43182572
present 18 0.720000 0.088741 0.56817428 0.87182572
-------------------------------------------------------------------------

2.6 Sample Size Determination with Simple Random Sampling
• It is well known that an increase in the sample size n will lead to a more precise estimator
of $\bar{y}_U$ or t. It is also obvious that an increase in the sample size n will make the sample
more expensive to collect. There will, however, be a limited amount of resources available
(allocated, budgeted) for data collection.

• When designing a sampling plan, the researcher wants to achieve a desired degree of
reliability at the lowest possible cost while satisfying the resource limitations for data
collection. That is, the goal is to get the most information given resources and constraints.

• To do this, the researcher tries to achieve a balance to avoid the following mistakes:

– Oversampling: The sampling plan may provide more precision than is needed.
Oversampling will lead to increased sampling effort, time, and cost.
– Undersampling: The sampling plan may yield insufficient precision, producing
overly wide confidence intervals. Undersampling will lead to wasted time and
money.

• To determine a sample size n when estimating a parameter θ, we do the following:

– Find the sample size n required so that the probability that the difference
between the estimator $\hat{\theta}$ and the parameter θ exceeds some maximum
allowable difference d is at most α. Equivalently, find n such that
$\Pr(|\hat{\theta} - \theta| > d) \le \alpha$.

• This is equivalent to finding n large enough so that the margin of error does not exceed d.

2.6.1 When Estimating y U

• Situation: Find the SRS size required so that the probability that the difference between
the estimator $\hat{\bar{y}}_U = \bar{y}$ and the population mean $\bar{y}_U$ exceeds a maximum
allowable difference d is at most α.

• Mathematically, find n such that $\Pr(|\hat{\bar{y}}_U - \bar{y}_U| > d) \le \alpha$ for a
specified maximum allowable difference d.

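• Assuming $\bar{y}$ is approximately normally distributed, this is equivalent to finding n so that
the margin of error satisfies

$$z_{\alpha/2}\sqrt{\left(\frac{N-n}{N}\right)\frac{S^2}{n}} \le d.$$

Solving this inequality for n yields

$$n = \frac{1}{\dfrac{d^2}{z^2 S^2} + \dfrac{1}{N}} = \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}} \qquad (21)$$

where $n_0 = z^2 S^2 / d^2$ and z is the critical α/2 value from a N(0, 1) distribution.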

• Rounding up the value of n in (21) yields the desired sample size. If this value is < 30, I
recommend adding 2 or 3 to it to account for the use of the large-sample z* in the
previous formulas instead of a small-sample t*.

– For example, consider the spatially correlated population in Figure 1. How large a
sample would be required so that $\hat{\bar{y}}_U = \bar{y}$ is within 1 of $\bar{y}_U$ with
probability at least .95 (α = .05)? (Assume S² ≈ 18.3.)
• If the population size N is very large, then 1/N ≈ 0. In this case, n ≈ n0. This is the
formula given in introductory statistics books.
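
The calculation in (21) can be sketched in base R as follows. The function name srs_n_mean is hypothetical, and the example call assumes N = 400, because the population size for Figure 1 is not restated in this section. Substitute the actual N. The quantity n0 is taken to be z²S²/d², the value of n in (21) when 1/N = 0.

```r
# Sample size for estimating a population mean under SRS, following (21).
# S2 is a prior guess at the population variance; d is the allowable difference.
# (Hypothetical helper, written only for illustration.)
srs_n_mean <- function(S2, d, N = Inf, alpha = 0.05) {
  z  <- qnorm(1 - alpha / 2)      # upper alpha/2 critical value of N(0, 1)
  n0 <- z^2 * S2 / d^2            # sample size when 1/N is treated as 0
  n  <- 1 / (1 / n0 + 1 / N)      # finite-population version, equation (21)
  c(n0 = ceiling(n0), n = ceiling(n))   # round up
}

# Figure 1 example: d = 1, S^2 ~ 18.3, alpha = .05
srs_n_mean(S2 = 18.3, d = 1, N = 400)   # N = 400 is an ASSUMED value
srs_n_mean(S2 = 18.3, d = 1)            # very large N: n equals n0
```

Under the assumed N = 400, this gives n0 = 71 and n = 60. Because 60 is not below 30, the "add 2 or 3" adjustment would not apply.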
• There remains one major problem. This sample size formula assumes that you know the
population variance S². Therefore, to estimate the sample size n, we need a prior estimate
of S². Barnett (1997, pages 33-34) describes four ways to do this:

1. A Pilot Study: A small pilot study can be conducted prior to the primary
study to provide an estimate of S².
2. Previous Studies: Other similar studies may have been conducted elsewhere and
appear in the professional journals. Measures of variability from earlier studies may
provide an estimate of S².
3. Double Sampling: A preliminary SRS of size n1 is taken and the sample variance $s_1^2$
is used to estimate S². Using $s_1^2$ in (21) will approximate an adequate sample size n.
Then, a further SRS of size n − n1 is taken from the remaining unsampled N − n1
sampling units. This is an example of double sampling.
4. Exploiting the structure of the population: Sometimes we may have some knowledge
of the structure of the population which can provide information about S².
– A common case is when you have count data and it is reasonable to assume the
distribution of counts follows a Poisson distribution. Because the mean and the
variance of a Poisson distribution are the same, all we need is a prior estimate of
the population mean.
– A second case occurs with estimation of a proportion p for a binomial distribution.
If we have a prior estimate of p, we also have a prior estimate of the variance,
which is a function of p.

2.6.2 When Estimating t

• Situation: Find the SRS size required so that the probability that the difference between
the estimator $\hat{t} = N\bar{y}$ and the population total t exceeds a maximum allowable
difference d is at most α.
• Mathematically, find n such that $\Pr(|\hat{t} - t| > d) \le \alpha$ for a specified maximum
allowable difference d.
• Assuming $N\bar{y}$ is approximately normally distributed, this is equivalent to finding n so that
the margin of error satisfies

$$z_{\alpha/2}\sqrt{N(N-n)\frac{S^2}{n}} \le d.$$

Solving this inequality for n yields

$$n = \frac{1}{\dfrac{d^2}{N^2 z^2 S^2} + \dfrac{1}{N}} = \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}} \qquad (22)$$

where $n_0 = N^2 z^2 S^2 / d^2$ and z is the critical α/2 value from a N(0, 1) distribution.

• Rounding up the value of n in (22) yields the desired sample size. If this value is < 30, I
recommend adding 2 or 3 to this value.
– For example, consider the longleaf pine population in Figure 2. How large a sample
would be required so that $\hat{t}$ is within 15 of t with probability at least .95 (α = .05)?
(Assume S² ≈ 4.)
• If the population size N is very large, then 1/N ≈ 0. In this case, n ≈ n0.
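
It may help to see that the sample size formula for the total is the formula for the mean with d rescaled. The short derivation below is a sketch of that connection in the notation used above, not an additional result from the notes.

```latex
% The margin of error for \hat{t} = N\bar{y} is N times the margin of error for \bar{y}:
\[
  z_{\alpha/2}\sqrt{N(N-n)\frac{S^2}{n}}
  \;=\; N \, z_{\alpha/2}\sqrt{\left(\frac{N-n}{N}\right)\frac{S^2}{n}} .
\]
% Hence requiring it to be at most d is the same as requiring the
% margin of error for \bar{y} to be at most d/N:
\[
  z_{\alpha/2}\sqrt{\left(\frac{N-n}{N}\right)\frac{S^2}{n}} \;\le\; \frac{d}{N} .
\]
% Replacing d by d/N in the sample size formula for the mean gives
\[
  n \;=\; \frac{1}{\dfrac{(d/N)^2}{z^2 S^2} + \dfrac{1}{N}}
    \;=\; \frac{1}{\dfrac{d^2}{N^2 z^2 S^2} + \dfrac{1}{N}} ,
\]
% which is the sample size formula for the total.
```

In practice this means any routine written for the mean can be reused for the total by passing d/N as the allowable difference.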

2.6.3 When Estimating p

• Situation: Find the SRS size required so that the probability that the difference between
the sample proportion $\hat{p}$ and the population proportion p exceeds a maximum
allowable difference d is at most α.
– For example, consider the longleaf pine presence/absence population in Figure 3.
How large a sample would be required so that $\hat{p}$ is within .05 of p with probability at
least .95?
• Mathematically, find n such that $\Pr(|\hat{p} - p| > d) \le \alpha$ for a specified maximum
allowable difference d.
• Assuming $\hat{p}$ is approximately normally distributed, this is equivalent to finding n so that
the margin of error satisfies

$$z_{\alpha/2}\sqrt{\left(\frac{N-n}{N-1}\right)\frac{p(1-p)}{n}} \le d.$$

• Solving this inequality for n yields

$$n = \frac{N p(1-p)}{(N-1)\dfrac{d^2}{z^2} + p(1-p)} \approx \frac{1}{\dfrac{1}{n_0} + \dfrac{1}{N}} \qquad (23)$$

where $n_0 = z^2 p(1-p) / d^2$ and z is the critical α/2 value from a N(0, 1) distribution.

• Rounding up the value of n in (23) yields the desired sample size.

• Because N is typically large when estimating p, it is common to ignore the f.p.c. If you do,
the estimated sample size is n ≈ n0.
• Unfortunately, the sample size formulas assume you know the population proportion p,
the quantity you are trying to estimate. Thus, to estimate an adequate sample size, we
need a prior estimate of p. In addition to the four methods of Barnett (pp 33-34), there
is also the following conservative approach.

• Note that the standard deviation of $\hat{p}$,
$$\text{s.d.}(\hat{p}) = \sqrt{\left(\frac{N-n}{N-1}\right)\frac{p(1-p)}{n}},$$
is largest when p = 1/2.
Thus, it is conservative to use p = 1/2 in (23) if there is no reasonable prior estimate.
• Example: Consider the longleaf pine presence/absence population in Figure 3. How large
a sample would be required so that $\hat{p}$ is within .05 of p with probability at least .95?
(i) Assume we use p ≈ .72 based on the earlier SRS with n = 25.
(ii) Assume we have no prior estimate of p and use the conservative estimate of p = .5.
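
One way to work this example is sketched below in base R. The function name srs_n_prop is hypothetical. N = 400 is the Figure 3 population size used in the earlier R and SAS listings, and n0 is taken to be z²p(1 − p)/d². The function returns the exact solution of the inequality, the approximation in (23), and n0, each rounded up.

```r
# Sample size for estimating a proportion under SRS, following (23).
# (Hypothetical helper, written only for illustration.)
srs_n_prop <- function(p, d, N, alpha = 0.05) {
  z  <- qnorm(1 - alpha / 2)     # upper alpha/2 critical value of N(0, 1)
  pq <- p * (1 - p)
  n_exact  <- N * pq / ((N - 1) * d^2 / z^2 + pq)   # exact solution of the inequality
  n0       <- z^2 * pq / d^2                        # ignores the f.p.c.
  n_approx <- 1 / (1 / n0 + 1 / N)                  # approximation in (23)
  ceiling(c(exact = n_exact, approx = n_approx, n0 = n0))   # round up
}

srs_n_prop(p = 0.72, d = 0.05, N = 400)   # case (i): prior estimate p = .72
srs_n_prop(p = 0.50, d = 0.05, N = 400)   # case (ii): conservative p = .5
```

For case (i) both the exact and approximate forms round up to 175, against n0 = 310 when the f.p.c. is ignored. For case (ii) the exact form gives 197 and the approximation 196, against n0 = 385. Ignoring the f.p.c. therefore roughly doubles the planned sample size here, because N = 400 is not large relative to n0.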
