9 Sample Design
Sampling
Sample Design
Population: the totality of elements
under study.
Elements: the persons, households,
etc. we intend to study.
Contact Sample: the set of elements we intend
to contact to participate in the study.
Sample: the set of elements actually
included in the study.
Why Sample?
It is seldom possible to survey the whole
population.
1. We can't identify the full sampling frame.
2. Elements in the sampling frame may be
inaccessible.
3. It is very costly to survey everyone.
4. Surveying everyone takes immense time
and labor.
Systematic Sampling
Selecting every kth element in your
sampling frame after a random
starting point (k is usually the frame
size divided by the desired sample size).
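A minimal Python sketch of the procedure, assuming the frame is an ordered list and that the interval k has already been chosen. The function name `systematic_sample` and the 100-case frame are hypothetical, for illustration only.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Return every kth element of `frame`, starting at a random position.

    The start is drawn uniformly from the first k positions (0..k-1);
    every kth element after it is then taken.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    start = rng.randrange(k)  # random starting point in [0, k)
    return frame[start::k]

# Hypothetical frame of 100 numbered cases; select every 10th case
frame = list(range(1, 101))
sample = systematic_sample(frame, 10)
print(sample)
```

Only one random number is drawn; after that, the whole sample is fixed by the interval.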
Systematic Sampling
The probability of selection is the same for
every individual case in the frame.
But the probability of sets of cases being
selected is not: only a few sets can ever occur.
E.g., if you pick every tenth case, the set
{1st, 11th, 21st, ...} has a 1/10 chance of being the
sample, as do {2nd, 12th, ...}, {3rd, 13th, ...}, and so on.
But any set that includes both the 1st and the 2nd
cases has a 0/10 chance.
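To make the point about sets concrete, the sketch below enumerates every sample a systematic draw can produce from a hypothetical frame of 100 numbered cases with k = 10. The helper name `possible_systematic_samples` is illustrative. There are exactly k possible samples, each with probability 1/k, and every case appears in exactly one of them.

```python
import math
from fractions import Fraction

def possible_systematic_samples(n_cases, k):
    """List every sample systematic selection can produce from cases 1..n_cases.

    Each random start s in 1..k yields one sample, so there are exactly k
    possible samples, each chosen with probability 1/k.
    """
    return [tuple(range(start, n_cases + 1, k)) for start in range(1, k + 1)]

samples = possible_systematic_samples(100, 10)
p_each = Fraction(1, len(samples))

print(len(samples))             # 10
print(samples[0][:3], p_each)   # (1, 11, 21) 1/10

# A set containing both case 1 and case 2 never occurs:
print(any({1, 2} <= set(s) for s in samples))  # False

# For contrast: the number of distinct samples of 10 under SRS
print(math.comb(100, 10))
```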
Stratified Sampling
The classification of the population into
subgroups (strata) based on supplementary
information, and the selection of random
samples from those strata.
Supplementary information usually means
census data (the US Census, or any complete
population statistics, such as HR data at a workplace).
E.g., if we know 60% of our population are
women, and we want to contact 1,000 people:
Use simple random sampling (SRS) to select 600 women.
Use SRS to select 400 men.
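A minimal sketch of the 60/40 example, assuming a hypothetical frame of 10,000 people tagged by gender. The function `stratified_sample` and the frame layout are illustrative; `random.sample` (sampling without replacement) provides the SRS within each stratum.

```python
import random

def stratified_sample(frame, stratum_of, sizes, rng=random):
    """Draw a simple random sample (SRS) of a fixed size from each stratum.

    frame:      list of elements (the sampling frame)
    stratum_of: function mapping an element to its stratum label
    sizes:      dict of stratum label -> number of elements to select
    """
    strata = {}
    for element in frame:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for label, n_h in sizes.items():
        # SRS within the stratum: draw n_h elements without replacement
        sample.extend(rng.sample(strata[label], n_h))
    return sample

# Hypothetical frame: 6,000 women and 4,000 men (60% / 40%)
frame = [("F", i) for i in range(6000)] + [("M", i) for i in range(4000)]
sample = stratified_sample(frame, lambda person: person[0], {"F": 600, "M": 400})
```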
Stratified Sampling
Selecting strata: in a small sample, you cannot
guarantee that the sample is representative on
every possible stratum.
Select 1-3 key variables on which it is most
important that the sample be representative.
Identify appropriate sample sizes for strata:
Do you have population breakdowns for all
combinations of strata?
If you use more than one variable (say, ethnicity and gender),
you'll need to check that your sampling frame matches not
only the population figures for Asians and for males,
but also the population figure for Asian males.
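Why the joint figure matters: the two hypothetical populations below have identical marginal totals (200 Asians, 500 males out of 1,000) but different numbers of Asian males, so the proportionate quota for that cell differs. The counts and the helpers `marginal` and `cell_quota` are invented for illustration.

```python
# Two hypothetical populations of 1,000 with the same marginals
pop_a = {("Asian", "M"): 100, ("Asian", "F"): 100,
         ("Other", "M"): 400, ("Other", "F"): 400}
pop_b = {("Asian", "M"): 40,  ("Asian", "F"): 160,
         ("Other", "M"): 460, ("Other", "F"): 340}

def marginal(pop, index, value):
    """Total count of cells whose key has `value` at position `index`."""
    return sum(count for key, count in pop.items() if key[index] == value)

def cell_quota(pop, cell, n):
    """Proportionate sample size for one joint cell, for a sample of n."""
    return n * pop[cell] / sum(pop.values())

for pop in (pop_a, pop_b):
    print(marginal(pop, 0, "Asian"), marginal(pop, 1, "M"),
          cell_quota(pop, ("Asian", "M"), 100))
# 200 500 10.0
# 200 500 4.0
```

Knowing only the Asian and male totals, you could not tell which of these quotas is right.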
Stratified Sampling
Proportionate Stratified Sampling: researchers
set the size of each stratum in the sample to
match that stratum's proportion of the population.
E.g., if 12.5% of the US population is African
American, then 12.5% of your sample is African
American as well.
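Exact proportions rarely give whole numbers of respondents. One common way to round, shown here as an assumption rather than the method prescribed above, is largest-remainder rounding: take the floor for each stratum, then give leftover slots to the strata with the largest fractional parts so the sizes still sum to n. The function name and counts are hypothetical.

```python
import math

def proportionate_allocation(pop_sizes, n):
    """Allocate n sample slots across strata in proportion to population size.

    Uses largest-remainder rounding so the stratum sizes sum exactly to n.
    """
    total = sum(pop_sizes.values())
    exact = {h: n * N_h / total for h, N_h in pop_sizes.items()}
    alloc = {h: math.floor(x) for h, x in exact.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover slots to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:shortfall]:
        alloc[h] += 1
    return alloc

# Hypothetical population of 1,000 in three strata; sample of 100
print(proportionate_allocation({"A": 333, "B": 333, "C": 334}, 100))
# {'A': 33, 'B': 33, 'C': 34}
```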
Stratified Sampling
Uses of Disproportionate Sampling:
Neyman allocation: some strata have
higher variance than others; allocating more
of the sample to high-variance strata (in
proportion to stratum size times stratum
standard deviation) minimizes the variance of
the overall estimate for a given total sample size.
When response rates in certain strata are
expected to be low, we might contact more
members of those strata so that the completed
sample is proportionate.
When we need enough members of a small
group to study that group on its own.
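A sketch of the first two uses, under stated assumptions: Neyman allocation in its basic form (equal cost per interview in every stratum), with n_h proportional to N_h × S_h, and contact numbers inflated by an expected response rate that you would have to estimate. The strata, standard deviations, and function names are hypothetical.

```python
import math

def neyman_allocation(strata, n):
    """Neyman allocation: n_h proportional to N_h * S_h.

    strata: dict of label -> (N_h population size, S_h standard deviation)
    Returns unrounded stratum sample sizes that sum to n.
    """
    weights = {h: N_h * S_h for h, (N_h, S_h) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

def contacts_needed(target, response_rate):
    """Contacts to make so the expected number of completes reaches target."""
    return math.ceil(target / response_rate)

# Two hypothetical strata of equal size; the second is three times as variable
alloc = neyman_allocation({"low": (5000, 10), "high": (5000, 30)}, 400)
print(alloc)                        # {'low': 100.0, 'high': 300.0}
print(contacts_needed(300, 0.5))    # 600
```

Proportionate allocation would split these 400 interviews 200/200; Neyman allocation moves them toward the more variable stratum.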
Nonprobability Sampling Methods
Sampling methods where the sample is
selected from the population without a random
selection procedure, so the chance that any given
element is included is unknown and
representativeness cannot be guaranteed.
Nonprobability samples are usually used when
there is limited time and/or other resources.
Nonprobability samples result in a poor
representation of the population. They are
best used for exploratory research, focus
group selection, or observational research.
Nonprobability: Convenience Sampling
The sample is drawn at the convenience of
the interviewer; for example, friends, family,
or people who happen to arrive at a busy
intersection or mall.
This is usually the easiest and cheapest
method of sample selection.
Selection bias results, favoring inclusion of
people who happen to be the researcher's
acquaintances, or who happen to frequent the
location where interviews are taking place.
Nonprobability: Referral Sampling
Also known as Snowball Sampling.
The initial sample is drawn using some other sampling
method. The researcher then asks participants
whether friends or acquaintances would be
interested in participating.
This is a common sampling method when
studying hard-to-find populations, such as gang
members or wealthy elites.
This method intentionally overselects from a
target population, and is therefore neither random
nor representative.
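A minimal sketch of the referral process, assuming referrals can be represented as a hypothetical network (who refers whom) and that recruitment stops at a fixed size. The network, names, and `snowball_sample` are illustrative. Anyone with no referral path from the seeds can never enter the sample.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Recruit by following referrals outward from the initial seeds.

    seeds:     initial participants (drawn by some other method)
    referrals: dict of participant -> acquaintances they refer
    max_size:  stop once this many participants are recruited
    """
    recruited, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        recruited.append(person)
        for friend in referrals.get(person, []):
            if friend not in seen:  # never recruit the same person twice
                seen.add(friend)
                queue.append(friend)
    return recruited

# Hypothetical referral network; "G" knows no one in it
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"], "G": []}
print(snowball_sample(["A"], referrals, 4))   # ['A', 'B', 'C', 'D']
```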
Nonprobability: Quota Sampling
The sample is drawn by:
Identifying key demographic subgroups;
Soliciting participation from the first available
respondents, as in convenience sampling;
Once the population proportion for a subgroup
is reached, including no new respondents
from that subgroup.
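The steps above can be sketched as follows, assuming respondents arrive in some non-random order (the list order stands in for "first available") and that quotas were already set from population proportions. The function name and data are hypothetical.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept respondents in arrival order until each subgroup's quota is met.

    arrivals: respondents in the order they become available (not random)
    group_of: function mapping a respondent to a subgroup label
    quotas:   dict of subgroup label -> number of respondents wanted
    """
    counts = {g: 0 for g in quotas}
    accepted = []
    for person in arrivals:
        g = group_of(person)
        if g in counts and counts[g] < quotas[g]:
            accepted.append(person)
            counts[g] += 1
        if counts == quotas:  # every quota filled: stop recruiting
            break
    return accepted

# Hypothetical arrivals; quotas match a 60% / 40% population split
arrivals = ["F1", "F2", "F3", "F4", "M1", "F5", "M2", "M3"]
print(quota_sample(arrivals, lambda p: p[0], {"F": 3, "M": 2}))
# ['F1', 'F2', 'F3', 'M1', 'M2']
```

F4 and F5 are turned away only because the women's quota is already full; who fills each quota depends entirely on arrival order.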
Nonprobability: Quota Sampling
Compare to Stratified Sampling:
Both attempt to limit the number of
participants from each category in order
to be representative of the population
breakdown.
Quota sampling is not random in its
selection of participants.
Quota sampling might be used when
resources are too limited for a random
sample, but a representative sample is
desired.
Sample size for estimating a proportion:
$$n = \frac{Z^2\,p\,q}{e^2}$$
Where:
Z is the z-score for the chosen confidence level
(typically 1.96, for 95% confidence)
e is the margin of error you are willing to
accept (typically 5%, i.e., 0.05)
p is the known proportion of the population with
some key characteristic, and q = 1 - p.
Ex: we know that 65% of our population is
female, so p = 0.65 and q = 0.35.
(1.96^2)*(0.65)*(0.35) = 0.873964
(0.05^2) = 0.0025
0.873964 / 0.0025 = 349.59 (that is, 350).
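The same calculation as a small Python function (the name `sample_size` is hypothetical). It rounds up, since a fractional respondent cannot be interviewed and rounding down would give a slightly larger margin of error than intended.

```python
import math

def sample_size(p, z=1.96, e=0.05):
    """n = Z^2 * p * q / e^2, rounded up to a whole respondent."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)

print(sample_size(0.65))  # 350
```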
Selection Bias
Error resulting from how we select elements or groups
for inclusion in the sample. Examples:
Sampling frame error: inclusion in the sampling frame is
nonrandom.
Data collection error: the procedure for collecting data
favors one important group over another.
Self-selection: those who want to participate may be
different in some key way from those who do not.
Indication bias: when inclusion depends on indication, e.g., a
treatment is given to people at high risk of acquiring a
disease, potentially causing a preponderance of treated
people among those who acquire the disease.
Selecting the end points of a series: for example, to exaggerate
a claimed trend, you could start the time series in an
unusually low year and end it in an unusually high one.
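To see how much end-point choice can matter, the sketch below uses an invented yearly series with essentially no long-run change, then searches every start/end pair for the largest apparent increase. The data and names are hypothetical.

```python
# Hypothetical yearly values with no real long-run trend
series = {2010: 52, 2011: 47, 2012: 55, 2013: 41, 2014: 50,
          2015: 53, 2016: 48, 2017: 58, 2018: 49, 2019: 51}

def pct_change(start, end):
    """Percent change in the series between two years."""
    return 100 * (series[end] - series[start]) / series[start]

years = sorted(series)
full = pct_change(years[0], years[-1])

# Cherry-pick the (start, end) pair that makes the trend look strongest
best = max(((a, b) for a in years for b in years if a < b),
           key=lambda pair: pct_change(*pair))

print(f"{years[0]}-{years[-1]}: {full:+.1f}%")          # 2010-2019: -1.9%
print(f"{best[0]}-{best[1]}: {pct_change(*best):+.1f}%")  # 2013-2017: +41.5%
```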