EIE2001 Lecture 6b Week 7
Introduction
Simple random sampling (SRS) is a type of probability sampling in which the units composing
a population are assigned numbers. A set of random numbers is then generated, and the units
having those numbers are included in the sample.
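The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `draw_srs`, the account IDs, and the seed are all hypothetical, and sampling is assumed to be without replacement.

```python
import random

def draw_srs(frame, n, seed=None):
    """Return a simple random sample of n units, drawn without replacement."""
    rng = random.Random(seed)
    # Step 1: assign the numbers 1..N to the units in the sampling frame.
    numbered = dict(enumerate(frame, start=1))
    # Step 2: generate n distinct random numbers between 1 and N.
    chosen = rng.sample(range(1, len(frame) + 1), n)
    # Step 3: the units carrying those numbers form the sample.
    return [numbered[k] for k in sorted(chosen)]

# Hypothetical sampling frame of 20 account IDs (illustrative only).
frame = [f"ACC-{i:03d}" for i in range(1, 21)]
sample = draw_srs(frame, n=5, seed=1)
print(sample)
```

Fixing the seed only makes the draw reproducible; every unit in the frame still has the same chance of selection.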
SRS is the basis of sampling theory. It is also used as an approximation to more complex
sampling designs.
SRS is suitable for selecting a small sample from a small population in a small geographical
area where a sampling frame is available. Each member of the population has an equal
chance of being selected as a subject. It is not suitable for a very large population whose
elements are spread out over a wide geographical area.
- A sampling frame is required to apply this probability sampling design.
SRS is widely used with other complex sampling designs such as stratified random
sampling and cluster sampling. In stratified random sampling, SRS is used to select
elements from each stratum. SRS is also used to select clusters and elements within
selected clusters in multistage sampling.
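Estimator of the population mean $\mu$
$$\hat{\mu} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad (4.1)$$
(A proportion is the special case $p = \mu$, $\hat{p} = \bar{y}$.)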
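Estimated variance of $\bar{y}$
$$\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n} \qquad (4.2)$$
where
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
Bound on the error of estimation
$$2\sqrt{\hat{V}(\bar{y})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}} \qquad (4.3)$$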
$\left(1 - \frac{n}{N}\right)$ is known as the finite population correction factor (fpc). It is a
correction factor which, when multiplied by the with-replacement variance, gives the
without-replacement variance.
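To see how the fpc behaves, the sketch below applies it to a with-replacement variance for several population sizes. The values of `s2`, `n`, and `N` are hypothetical and chosen only for illustration.

```python
def fpc(n, N):
    """Finite population correction factor, 1 - n/N."""
    return 1 - n / N

def var_mean(s2, n, N=None):
    """Estimated variance of the sample mean.

    With N=None the with-replacement variance s2/n is returned;
    otherwise it is multiplied by the fpc (without replacement).
    """
    with_replacement = s2 / n
    return with_replacement if N is None else fpc(n, N) * with_replacement

s2, n = 100.0, 50  # hypothetical sample variance and sample size
print("with replacement:", var_mean(s2, n))
for N in (100, 1_000, 100_000):
    print(N, round(fpc(n, N), 4), round(var_mean(s2, n, N), 4))
```

As N grows relative to n, the fpc approaches 1 and the two variances become almost identical, so the correction matters mainly when the sample is a sizeable fraction of the population.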
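Example:
An SRS of 200 accounts is drawn from a total of 1,000 accounts. The sample mean is
$\bar{y}$ = RM 94.22 and the sample variance is $s^2$ = 445.21. Estimate $\mu$ (the population
mean), the margin (bound) on the error of estimation, and the 95% confidence interval.
$\bar{y}$ = 94.22, so $\hat{\mu} = \bar{y}$ = RM 94.22
Bound on the error of estimation: $2\sqrt{\left(1 - \frac{200}{1000}\right)\frac{445.21}{200}}$ = RM 2.67
95% confidence interval for the population mean: RM 94.22 ± 2.67 = (91.55, 96.89)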
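The arithmetic of this example can be checked with a short script. It is a sketch that applies (4.2) and (4.3) to the figures given above; the function name `estimate_mean` is an assumption made for illustration.

```python
import math

def estimate_mean(ybar, s2, n, N):
    """Point estimate, bound on the error, and approximate 95% CI for the mean."""
    v_hat = (1 - n / N) * s2 / n      # estimated variance of the mean, (4.2)
    bound = 2 * math.sqrt(v_hat)      # bound on the error of estimation, (4.3)
    return ybar, bound, (ybar - bound, ybar + bound)

mu_hat, bound, ci = estimate_mean(ybar=94.22, s2=445.21, n=200, N=1000)
print(f"estimate = RM {mu_hat:.2f}")
print(f"bound    = RM {bound:.2f}")
print(f"95% CI   = ({ci[0]:.2f}, {ci[1]:.2f})")
```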
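Estimator of the population total $\tau$
$$\hat{\tau} = N\bar{y} = \frac{N \sum_{i=1}^{n} y_i}{n} \qquad (4.4)$$
Estimated variance of $\hat{\tau}$
$$\hat{V}(\hat{\tau}) = \hat{V}(N\bar{y}) = N^2 \left(1 - \frac{n}{N}\right)\frac{s^2}{n} \qquad (4.5)$$
Bound on the error of estimation
$$2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2 \left(1 - \frac{n}{N}\right)\frac{s^2}{n}} \qquad (4.6)$$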
Example:
An industry wishes to estimate the total time its scientists spend on trivial jobs. A
sample survey of 50 scientists selected by SRS shows that they spent an average of 10.31
hours on such jobs, with a variance of 2.25 hours². If the industry has 750 scientists, estimate
the total amount of time spent by scientists on trivial jobs and the 95% confidence interval.
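The notes do not give a worked answer here, so the following is one way to carry out the calculation, assuming the estimator of the total is N times the sample mean and the bound is twice the square root of its estimated variance, as in (4.4) to (4.6). The function name `estimate_total` is hypothetical.

```python
import math

def estimate_total(ybar, s2, n, N):
    """Point estimate, bound on the error, and approximate 95% CI for the total."""
    tau_hat = N * ybar                        # estimator of the total, (4.4)
    v_hat = N**2 * (1 - n / N) * s2 / n       # estimated variance, (4.5)
    bound = 2 * math.sqrt(v_hat)              # bound on the error, (4.6)
    return tau_hat, bound, (tau_hat - bound, tau_hat + bound)

tau_hat, bound, ci = estimate_total(ybar=10.31, s2=2.25, n=50, N=750)
print(f"estimated total = {tau_hat:.1f} hours")
print(f"bound           = {bound:.2f} hours")
print(f"95% CI          = ({ci[0]:.2f}, {ci[1]:.2f}) hours")
```

Under these assumptions the estimated total is 7,732.5 hours with a bound of about 307.41 hours, giving an interval of roughly 7,425 to 8,040 hours.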
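Estimator of the population proportion $p$
$$\hat{p} = \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad (4.14)$$
Estimated variance of $\hat{p}$
$$\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n - 1} \qquad (4.15)$$
where
$$\hat{q} = 1 - \hat{p}$$
Bound on the error of estimation
$$2\sqrt{\hat{V}(\hat{p})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n - 1}} \qquad (4.16)$$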
Note: A proportion is a special case of the mean, with a dichotomous outcome (0 and 1). The
element variance is estimated by $s^2$ for the mean and by $\hat{p}\hat{q}$ for a proportion.
Ignoring the fpc, the estimated variance of the sampling distribution of the mean is $s^2/n$,
while that of the sampling distribution of the proportion is $\hat{p}\hat{q}/(n-1)$.
Example:
N = 300 students, n = 100
Estimate the proportion of students who are required to take a supplementary examination,
coded “1” if “yes” and “0” if “no”.

Student    y_i
1          1
2          0
3          0
4          0
…          …
98         1
99         0
100        1
Total      Σ y_i = 15

$\hat{p} = \frac{\sum y_i}{n} = \frac{15}{100}$ = 0.15 = 15%
$\hat{V}(\hat{p}) = \left(1 - \frac{100}{300}\right)\frac{(0.15)(0.85)}{99}$ = 0.000859
Margin of error or bound of error = $2\sqrt{\hat{V}(\hat{p})}$ = 0.059
95% confidence interval: 0.15 ± 0.059 = (0.091, 0.209), or between 9.1% and 20.9%
at the 95% confidence level. This is a rather wide margin, so the estimate is not precise,
because the sample size is relatively small. With a bigger sample size, the margin of error
is reduced and the estimate becomes more precise.
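The proportion example can be reproduced with the sketch below, which applies (4.14) to (4.16) to the counts given above. The function name `estimate_proportion` is an assumption made for illustration.

```python
import math

def estimate_proportion(successes, n, N):
    """Estimated proportion, its estimated variance, and the bound on the error."""
    p_hat = successes / n                              # (4.14)
    q_hat = 1 - p_hat
    v_hat = (1 - n / N) * p_hat * q_hat / (n - 1)      # (4.15)
    bound = 2 * math.sqrt(v_hat)                       # (4.16)
    return p_hat, v_hat, bound

p_hat, v_hat, bound = estimate_proportion(successes=15, n=100, N=300)
print(f"p_hat  = {p_hat:.2f}")
print(f"V_hat  = {v_hat:.6f}")
print(f"bound  = {bound:.3f}")
print(f"95% CI = ({p_hat - bound:.3f}, {p_hat + bound:.3f})")
```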
Determining Sample Size with a Bound on the Error of Estimation, B
A sample that is too big involves high cost and more time to collect the data. On the
other hand, if the sample is too small, the estimates are not precise. It is therefore
necessary to determine the sample size for a given cost and margin of error.
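Sample size required to estimate the population mean $\mu$
The bound on the error of estimation is
$$B = 2\sqrt{V(\bar{y})}$$
The estimated variance of $\bar{y}$ is
$$\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$$
and the true variance is
$$V(\bar{y}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$$
Solving $B = 2\sqrt{V(\bar{y})}$ for $n$ gives
$$n = \frac{N\sigma^2}{(N - 1)D + \sigma^2} \qquad (4.11)$$
where
$$D = \frac{B^2}{4}$$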
Note: $\sigma^2$ is usually estimated with the sample variance $s^2$ from past studies. $\sigma$ can
also be estimated as follows:
$\sigma \approx$ range/4, where range is the maximum value minus the minimum value.
Example:
The officer of a hospital wants to estimate the average outstanding bill. His records show
that most of the outstanding accounts lie within a range of RM 100. If there are 1,000
accounts, how large should the sample be to estimate the mean ($\mu$) with a margin of
error B = RM 3?
$\sigma \approx$ 100/4 = 25
$\sigma^2 \approx 25^2$ = 625
D = $3^2$/4 = 2.25
$n = \frac{1000(625)}{999(2.25) + 625}$ = 217.56, so a sample of 218 accounts is required.
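As a check on this calculation, the sketch below evaluates (4.11) with the range/4 approximation for the standard deviation. The function name `sample_size_mean` is hypothetical, and rounding up to the next whole unit is an assumption that matches how the later examples in these notes report n.

```python
import math

def sample_size_mean(N, sigma2, B):
    """Sample size needed to estimate the mean with bound B, formula (4.11)."""
    D = B**2 / 4
    return N * sigma2 / ((N - 1) * D + sigma2)

sigma = 100 / 4                      # sigma is approximated by range / 4
n_raw = sample_size_mean(N=1000, sigma2=sigma**2, B=3)
n = math.ceil(n_raw)                 # round up so the bound is not exceeded
print(f"n = {n_raw:.2f}, rounded up to {n}")
```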
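Sample size required to estimate the population total $\tau$
$$B = 2\sqrt{V(N\bar{y})} = 2N\sqrt{V(\bar{y})}$$
$$n = \frac{N\sigma^2}{(N - 1)D + \sigma^2} \qquad (4.13)$$
where
$$D = \frac{B^2}{4N^2}$$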
Example:
A researcher wants to estimate the total weight gain of 1,000 chickens that were given
special feed from 0 to 4 weeks. Past studies show that $\sigma$ is 6 grams. Determine
the sample size required to achieve a bound on the error B = 1,000 grams.
$D = \frac{1000^2}{4(1000)^2}$ = 0.25
$n = \frac{1000(6^2)}{(1000 - 1)(0.25) + 6^2}$ = 125.98
The researcher needs to weigh n = 126 chickens to estimate $\tau$, the total weight gain for
N = 1,000 chickens in 0 to 4 weeks.
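The same calculation can be sketched in code. The only change from the formula for the mean is that D is divided by N squared; the function name `sample_size_total` is an assumption made for illustration.

```python
import math

def sample_size_total(N, sigma2, B):
    """Sample size needed to estimate the total with bound B, formula (4.13)."""
    D = B**2 / (4 * N**2)            # D is scaled by N^2 for a total
    return N * sigma2 / ((N - 1) * D + sigma2)

n_raw = sample_size_total(N=1000, sigma2=6**2, B=1000)
n = math.ceil(n_raw)
print(f"n = {n_raw:.2f}, rounded up to {n}")
```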
Sample size required to estimate population proportion p
$$n = \frac{Npq}{(N - 1)D + pq} \qquad (4.18)$$
where
$$q = 1 - p, \qquad D = \frac{B^2}{4}$$
Example:
The management of a company with 2,000 workers wants to conduct a survey to study the
proportion of workers who are in favor of the new salary scheme. Determine the sample
size to estimate the population proportion p, that supports the new scheme, with a bound
of error B = 0.05. Let’s say the proportion supporting the new scheme is unknown (we then
use p = 0.5 which produces the maximum variance).
$D = \frac{0.05^2}{4}$ = 0.000625
$n = \frac{2000(0.5)(0.5)}{(2000 - 1)(0.000625) + (0.5)(0.5)}$ = 333.47
334 workers must be interviewed to estimate the proportion of workers who favor the
proposed new salary scheme.
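The sketch below evaluates (4.18) for this example and also illustrates why p = 0.5 is the conservative choice when p is unknown. The function name `sample_size_proportion` and the comparison value p = 0.3 are hypothetical.

```python
import math

def sample_size_proportion(N, B, p=0.5):
    """Sample size needed to estimate a proportion with bound B, formula (4.18)."""
    q = 1 - p
    D = B**2 / 4
    return N * p * q / ((N - 1) * D + p * q)

n_raw = sample_size_proportion(N=2000, B=0.05)          # p unknown, so use 0.5
n = math.ceil(n_raw)
print(f"p = 0.5: n = {n_raw:.2f}, rounded up to {n}")

# Any other value of p gives a smaller pq and therefore a smaller n.
n_alt = sample_size_proportion(N=2000, B=0.05, p=0.3)   # hypothetical p
print(f"p = 0.3: n = {n_alt:.2f}")
```

Because pq is largest at p = 0.5, a sample sized for that value is large enough for any true proportion.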
Theory
A larger sample size produces a more precise estimate of the parameter. The precision of
the estimate also depends on the variability, s. The smaller the s, the smaller the standard
error.
The sampling distribution is the probability distribution of a sample statistic, such as the
sample mean, over all possible random samples of size n from the population. The standard
error is the standard deviation of the sampling distribution.
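For a very small population, the sampling distribution of the mean can be built by listing every possible sample. The sketch below does this for a hypothetical population of six values with n = 2, and compares the standard deviation of the resulting sample means with the usual without-replacement formula, in which the population variance uses divisor N.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

population = [2, 4, 6, 8, 10, 12]    # hypothetical population values
N, n = len(population), 2

# Sampling distribution: the mean of every possible sample of size n.
sample_means = [mean(s) for s in combinations(population, n)]

# Standard error = standard deviation of the sampling distribution.
se_enumerated = math.sqrt(pvariance(sample_means))

# Formula: V(ybar) = (sigma^2 / n) * (N - n) / (N - 1).
sigma2 = pvariance(population)
se_formula = math.sqrt(sigma2 / n * (N - n) / (N - 1))

print(len(sample_means), "possible samples")
print(f"SE by enumeration = {se_enumerated:.4f}")
print(f"SE by formula     = {se_formula:.4f}")
```

The two values agree, and the average of the sample means equals the population mean, which is what makes the sample mean an unbiased estimator under SRS.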
Exercise:
Select 5% of the respondents from “Employee Satisfaction Survey” data and calculate:
1. Mean respondent income and 95% confidence interval.
2. Total income of all respondents (based on the sample) and 95% confidence interval.
3. Percent who are males and 95% confidence interval.
Data-select cases