Lecture 4
Introduction
Researchers often use sample survey methodology to obtain information
about a larger population by selecting and measuring a sample from that
population.
Since the population is usually too large to measure in full, we rely on the
information collected from the sample, mainly to minimize cost.
Inferences about the population are based on the information from the
sample drawn from that population.
A sample is a collection of individuals selected from a larger population.
Chance of bias
Errors in sampling
1) Sampling error: Errors introduced by selecting a sample rather than studying the
whole population.
They cannot be avoided or totally eliminated, but they can be reduced.
2) Non-sampling error:
Observational error
Respondent error
Sampling Methods
Two broad divisions: probability sampling and non-probability sampling.
A. Probability sampling
Every sampling unit has a known and non-zero probability of selection into
the sample.
Involves the selection of a sample from a population, based on chance.
Probability sampling is:
More complex,
More time-consuming and
Usually more costly than non-probability sampling.
However, because study samples are randomly selected and their
probability of inclusion can be calculated:
reliable estimates can be produced and
inferences can be made about the population.
1. Simple random sampling (SRS)
The required number of individuals is selected at random from the sampling frame, a
list or a database of all individuals in the population.
Each member of a population has an equal chance of being included in the sample.
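A minimal sketch of simple random sampling, assuming the sampling frame is just a list of unit numbers from 1 to N. The frame size, sample size, and the function name `simple_random_sample` are illustrative choices, not part of the lecture.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same chance (n / N) of being selected.
    """
    rng = random.Random(seed)  # the seed is only there for reproducibility
    return rng.sample(frame, n)

# Hypothetical frame: units numbered 1 to N
N = 400
frame = list(range(1, N + 1))
srs = simple_random_sample(frame, n=100, seed=1)
print(sorted(srs)[:10])  # first few selected unit numbers
```

In practice the frame would be a list or database of real units; the numbering step is what makes the random draw possible.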
To use the SRS method:
Make a numbered list of all the units in the population.
Each unit should be numbered from 1 to N (where N is the size of the population).
Then draw the required number of units at random from this list.
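2. Systematic sampling
To use systematic sampling:
1. Number the units on the sampling frame from 1 to N.
2. Determine the sampling interval (K) by dividing the number of units in the
population by the desired sample size.
3. Select a number between 1 and K at random. This number is called the random
start and is the first unit included in your sample.
4. Select every Kth unit after the random start.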
Note: Systematic sampling should not be used when a cyclic repetition is inherent in the
sampling frame.
Example
To select a sample of 100 from a population of 400, you would need a sampling
interval of 400 ÷ 100 = 4. Therefore, K = 4.
You will need to select one unit out of every four units to end up with a total of 100
units in your sample.
Select a number between 1 and 4 from a table of random numbers.
If you choose 3, the third unit on your frame would be the first unit included in your
sample.
The sample might consist of the following units to make up a sample of 100: 3 (the
random start), 7, 11, 15, 19, ..., 395, 399 (up to N, which is 400 in this case).
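The example above (N = 400, sample of 100, random start 3) can be sketched as follows. This is an illustration only: the function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of the sample size.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select every K-th unit from a frame numbered 1..N after a random start."""
    K = N // n  # sampling interval (assumes N is a multiple of n)
    if start is None:
        start = random.Random(seed).randint(1, K)  # random start between 1 and K
    return list(range(start, N + 1, K))

# Reproduce the worked example: K = 400 / 100 = 4, random start = 3
sys_sample = systematic_sample(N=400, n=100, start=3)
print(sys_sample[:5], sys_sample[-2:], len(sys_sample))
# [3, 7, 11, 15, 19] [395, 399] 100
```

Leaving `start` unset draws the random start between 1 and K, as in step 3 of the procedure.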
3. Stratified random sampling
It is used when the population is known to be heterogeneous with regard to some
factors, and those factors are used for stratification.
Using stratified sampling, the population is divided into homogeneous, mutually
exclusive groups called strata.
A population can be stratified by any variable that is available for all units prior to
sampling (e.g., age, sex, province of residence, income, etc.)
A separate sample is taken independently from each stratum.
Any of the sampling methods mentioned in this section can be used to sample within
each stratum.
If you create strata within which units share similar characteristics (e.g.,
income) and are considerably different from units in other strata
(e.g., occupation, type of dwelling) then you would only need a small sample
from each stratum to get a precise estimate of total income for that stratum.
Then you could combine these estimates to get a precise estimate of total
income for the whole population.
If you use a SRS approach in the whole population without stratification, the
sample would need to be larger than the total of all stratum samples to get an
estimate of total income with the same level of precision.
Stratified sampling ensures an adequate sample size for sub-groups in the
population of interest.
When a population is stratified, each stratum becomes an independent
population and you will need to decide the sample size for each stratum.
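A minimal sketch of stratified sampling, assuming simple random sampling is used within each stratum. The strata, their sizes, and the per-stratum sample sizes below are hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample from each stratum.

    strata: dict mapping stratum name -> list of unit IDs
    sizes:  dict mapping stratum name -> sample size chosen for that stratum
    """
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population stratified by sex
strata = {
    "male": [f"M{i}" for i in range(1, 301)],
    "female": [f"F{i}" for i in range(1, 201)],
}
sizes = {"male": 30, "female": 20}
strat = stratified_sample(strata, sizes, seed=7)
print({name: len(units) for name, units in strat.items()})
# {'male': 30, 'female': 20}
```

Each stratum is treated as its own population, so the sample size is set per stratum rather than for the population as a whole.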
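Allocation of sample size to strata
Proportionate allocation:
$$n_j = \frac{n}{N} \times N_j$$
where n is the total sample size, N is the population size, and N_j is the size of
stratum j.
Example: allocate a total sample of 60 households (HHs) across four villages.
Village    A     B     C     D     Total
HHs        100   150   120   130   500
S. size    ?     ?     ?     ?     60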
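Proportionate allocation gives each stratum the share n_j = (n / N) × N_j of the total sample. A minimal sketch for the village example (60 households out of 500) follows. The function name is hypothetical, and rounding to whole households is an added assumption, since the formula itself can give fractions.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate a total sample size n to strata in proportion to their size N_j."""
    N = sum(stratum_sizes.values())
    return {name: n * N_j / N for name, N_j in stratum_sizes.items()}

households = {"A": 100, "B": 150, "C": 120, "D": 130}
exact = proportionate_allocation(households, n=60)
rounded = {name: round(value) for name, value in exact.items()}
print(exact)                            # A: 12.0, B: 18.0, C: 14.4, D: 15.6
print(rounded, sum(rounded.values()))   # A: 12, B: 18, C: 14, D: 16, total 60
```

Rounding does not always preserve the total, so in other examples the rounded sizes may need a small manual adjustment.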
4. Cluster sampling
Sometimes it is too expensive to carry out simple random sampling.
The population may be large and scattered.
A complete list of the study population may be unavailable.
Travel costs can become expensive if interviewers have to survey people
from one end of the country to the other.
Cluster sampling is the method most widely used to reduce cost.
In a school-based study, we assume students of the same school are
homogeneous.
We can randomly select sections and include all students of the selected
sections only.
The main advantage is cost reduction.
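The school example can be sketched as one-stage cluster sampling: sections are the clusters, and every student in a selected section is included. The section names, class sizes, and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names at random
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical school: 6 sections of 40 students each
sections = {f"Section {s}": [f"{s}-{i}" for i in range(1, 41)] for s in "ABCDEF"}
chosen, students = cluster_sample(sections, n_clusters=2, seed=3)
print(chosen, len(students))  # two section names and 80 students
```

Only a list of sections is needed to draw the sample, which is why the method works when no complete list of individuals exists.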
5. Multi-stage sampling
Similar to the cluster sampling, except that it involves picking a sample from
within each chosen cluster, rather than including all units in the cluster.
This type of sampling requires at least two stages.
The primary sampling unit (PSU) is the sampling unit in the first sampling
stage.
The secondary sampling unit (SSU) is the sampling unit in the second
sampling stage, etc.
Example hierarchy of sampling units: Woreda (PSU), Kebele (SSU), Sub-Kebele (TSU,
tertiary sampling unit), household (HH).
In the first stage, large groups or clusters are identified and selected.
These clusters contain more population units than are needed for the final
sample.
In the second stage, population units are picked from within the selected
clusters (using any of the possible probability sampling methods) for a final
sample.
If more than two stages are used, the process of choosing population units
within clusters continues until there is a final sample.
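A minimal two-stage sketch of the process described above, assuming simple random sampling at both stages. The kebele and household labels, the sample sizes, and the function name are hypothetical.

```python
import random

def two_stage_sample(psus, n_psu, n_ssu, seed=None):
    """Stage 1: select PSUs at random. Stage 2: sample units within each selected PSU."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(psus), n_psu)                   # first-stage selection
    return {psu: rng.sample(psus[psu], n_ssu) for psu in selected}  # second stage

# Hypothetical frame: 10 kebeles with 200 households each
kebeles = {f"Kebele {k}": [f"K{k}-HH{h}" for h in range(1, 201)] for k in range(1, 11)}
two_stage = two_stage_sample(kebeles, n_psu=3, n_ssu=20, seed=11)
print({name: len(hhs) for name, hhs in two_stage.items()})  # 3 kebeles, 20 households each
```

Unlike cluster sampling, only a subset of each selected cluster is measured, so a household list is needed only for the selected kebeles.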
B. Non-probability sampling
In non-probability sampling, every item has an unknown chance of being
selected.
In non-probability sampling, there is an assumption that there is an even
distribution of a characteristic of interest within the population.
This is what makes the researcher believe that any sample would be
representative and because of that, results will be accurate.
For probability sampling, randomness is a feature of the selection process, rather
than an assumption about the structure of the population.
In non-probability sampling, since elements are chosen arbitrarily, there is no
way to estimate the probability of any one element being included in the
sample.
Also, there is no assurance that each item has a chance of being included.
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
Introduction
Parameter: A characteristic or descriptive measure of the population,
e.g. μ, σ, P, etc.
Sample statistic: Any quantity computed from values in a sample, e.g.
the sample mean $\bar{x}$, the sample proportion, etc.
The values of population parameters are fixed, whereas the value of a statistic
varies from sample to sample.
A sampling distribution is a distribution of all possible values of a
statistic computed from samples of the same size randomly selected
from the same population.
Serves to answer probability questions about sample statistics
A. Sampling distribution of sample mean
Properties of sampling distribution of mean
The standard error (SE) of the mean, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, is
determined by both the sample size and the degree of variability among the
individual observations.
SD quantifies the amount of variability among individuals in a
population, while
SE quantifies the variability among means of repeated samples
drawn from that population
The SE is always smaller than the SD (except when n = 1)
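The difference between SD and SE can be checked with a small simulation. This is a sketch under assumptions: the population (10,000 normally distributed values), the sample size of 25, and the number of repeated samples are all made up for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 individual values
population = [rng.gauss(50, 10) for _ in range(10_000)]
sd = statistics.pstdev(population)  # variability among individuals

n = 25
# Means of 2,000 repeated samples of size n drawn from the same population
means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

se_empirical = statistics.stdev(means)  # variability among sample means
se_theory = sd / n ** 0.5               # SE = SD / sqrt(n)
print(round(sd, 2), round(se_theory, 2), round(se_empirical, 2))
```

The spread of the sample means should come out close to SD / √n and well below the SD itself.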
Central Limit Theorem
The central limit theorem states that if you have a population with mean μ
and standard deviation σ then the distribution of the sample means will be
approximately normally distributed provided the sample size is sufficiently
large (usually n > 30).
If the population is normal, then the theorem holds true even for samples
smaller than 30.
For sample proportions, the normal approximation applies provided that both np and
n(1-p) are greater than 5, where n is the sample size and p is the probability of
success in the population.
So we can use the normal probability model to quantify uncertainty
when making inferences about a population mean based on the
sample mean.
When the sampling is done from a non-normally distributed
population, the central limit theorem is used.
The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.
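One way to see the approximation improve with sample size is to sample from a clearly non-normal population and compare a tail probability of the standardized sample mean with the value the normal model predicts (0.05 above z = 1.645). The exponential population, the sample sizes, and the number of repetitions below are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(0)
MU = SIGMA = 1.0  # an exponential population with rate 1 has mean 1 and SD 1

def upper_tail_share(n, reps=5_000):
    """Share of standardized sample means above 1.645 (the normal model says 0.05)."""
    se = SIGMA / n ** 0.5
    count = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        if (xbar - MU) / se > 1.645:
            count += 1
    return count / reps

tail_share = {n: upper_tail_share(n) for n in (5, 30, 100)}
print(tail_share)
```

The shares should move toward 0.05 as n grows, although the exact figures depend on the random seed and the number of repetitions.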
Applications of the sampling distribution of the sample mean
Helps in computing the probability of obtaining a sample with a
mean of some specified magnitude.
z-value for the sampling distribution of $\bar{x}$:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
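Solution:
The sampling distribution of $\bar{x}$ has mean $\mu_{\bar{x}} = \mu = 8$
and standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 3/\sqrt{36} = 0.5$.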
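$$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < z < 0.4) = 0.3108$$
[Figure: sampling distribution of $\bar{x}$ centred at $\mu_{\bar{x}} = 8$, with 7.8 and 8.2 marked on the $\bar{x}$ axis and the corresponding values -0.4, 0 and 0.4 on the z axis.]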
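The calculation above can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library in place of a z-table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / n ** 0.5        # standard error of the mean: 3 / 6 = 0.5

z_low = (7.8 - mu) / se      # lower limit converted to a z-value
z_high = (8.2 - mu) / se     # upper limit converted to a z-value

std_normal = NormalDist()    # standard normal: mean 0, SD 1
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(z_low, 2), round(z_high, 2), round(prob, 4))
# -0.4 0.4 0.3108
```

The result agrees with the table value of 0.3108 for P(-0.4 < z < 0.4).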
B. Distribution of the sample proportion
The sample proportion is derived from counts or frequency data.
Sample proportion = $\hat{p}$ (the number of sampled units with the characteristic
divided by the sample size n)
Population proportion = p or π
Population proportion (p) = the proportion of population having some
characteristic
Properties of the sampling distribution of the sample proportion
Construction of the sampling distribution of the sample proportion is done
in a manner similar to that of the mean.
Applying the central limit theorem, the shape of the sampling distribution is
approximately normal provided that n is large enough.
The mean of the distribution, $\mu_{\hat{p}}$, will be equal to the true population
proportion, p, and the variance of the distribution, $\sigma_{\hat{p}}^2$, will be
equal to pq/n, where q = 1 - p.
How large does n need to be?
Central limit theorem for proportions:
$$np > 5 \quad \text{and} \quad n(1 - p) > 5$$
z-Value for Proportions
Standardize the sample proportion $\hat{p}$ to a z value with the formula:
$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$$
Example
According to a recent estimate, 19.4% of the adult male population was obese. What is
the probability that in a random sample of size 150 from this population fewer than 15%
will be obese?
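Check the conditions: np = 150 × 0.194 = 29.1 > 5 and nq = 150 × 0.806 = 120.9 > 5,
so the normal approximation can be used.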
Find the z score
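The z score and the requested probability for this example can be worked out as below. This is a sketch that applies the z formula for proportions with p = 0.194, n = 150 and a sample proportion of 0.15, using `statistics.NormalDist` in place of a z-table; the variable names are illustrative.

```python
from statistics import NormalDist

p, n, p_hat = 0.194, 150, 0.15

se = (p * (1 - p) / n) ** 0.5   # standard error of the sample proportion
z = (p_hat - p) / se            # z score for a sample proportion of 0.15
prob = NormalDist().cdf(z)      # P(sample proportion < 0.15) = P(Z < z)

print(round(se, 4), round(z, 2), round(prob, 4))
```

The z score comes out at about -1.36, so the probability that fewer than 15% of the sample are obese is a little under 0.09.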