
9 Sample Design

This document provides an overview of key concepts in survey sampling, including different sampling methods and their appropriate uses. It discusses population, elements, and sampling frames. Common sampling methods described are simple random sampling, systematic sampling, stratified sampling, cluster sampling, and nonprobability sampling techniques like convenience and quota sampling. The document emphasizes that the goal is to select a sample that minimizes bias and error while being feasible given resources. It also provides guidance on developing a sample plan that determines the population, sampling frame, method, and size.

Uploaded by

cesar suarez
Copyright
© © All Rights Reserved

Introduction to Survey Sampling

Sample Design
Population: the totality of elements
under study.
Elements: the persons, households,
etc. we intend to study.
Contact Sample: the number of
elements we intend to contact to
participate in the study
Sample: the number of elements
actually included in the study

Why Sample?
It is seldom possible to survey the whole
population.
1. We can't identify the full sampling frame.
2. Elements in sampling frame may be
inaccessible.
3. Very costly to survey everyone.
4. Immense time and labor needed to survey
everyone.

Sampling frame: list of all available
population elements, from which the
sampled elements may be selected.

A Good Sample Method Reduces Sampling Error
Sampling Error: Inaccuracy in results of
the survey due to the sample. Types:
1. Selection Bias: sampling error due to the
method of sample selection.
2. Sample Size Error: sampling error due to
insufficient sample size.
3. Nonresponse Error
Your approach to sampling should help to
minimize each.

The Sample Plan


Key Steps:
1. Determine your population.
a) Unit of Analysis: the type of element you are sampling (people,
places, things, states, countries, counties, colleges,
businesses).
2. Create a sampling frame, which is a master list of the
entire population.
3. Select an appropriate sampling method.
4. Determine desired sample size.
5. Administer survey.
After the Survey:
1. Review sample for representativeness, completeness.
2. Corrective actions: oversampling, resampling,
weighting.

Simple Random Sampling


Simple random sampling (SRS): a
sampling scheme in which every
element (and every subset of a given
size) of the population has an exactly
equal probability of selection.
Assignment methods: Lottery, table of
random numbers, SPSS random
number generator.
Every case in the sampling frame is
assigned a number by this randomized

Simple Random Sampling: Some Terminology
Estimate: since we are working with a sample,
we do not have the true population parameter, only an estimate.
Estimator: the rule/procedure for obtaining the
estimate; i.e., the sample statistic.
Sampling distribution: the distribution of means
obtained in repeated sampling.
Standard error: the standard deviation of the
sampling distribution (the higher the number,
the more the estimate varies from sample to sample).
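The link between the sampling distribution and the standard error can be illustrated with a small simulation. This is a minimal sketch and not part of the original slides: the population values, the `simulate_standard_error` helper, the seed, and the number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_standard_error(population, sample_size, repetitions=2000, seed=0):
    """Estimate the standard error of the mean by repeated simple random sampling."""
    rng = random.Random(seed)
    # Each repetition draws one sample without replacement and records its mean.
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    # The standard deviation of those means approximates the standard error.
    return statistics.stdev(means)

# Hypothetical population: 10,000 values spread evenly between 0 and 99.
population = [i % 100 for i in range(10_000)]

se_small = simulate_standard_error(population, sample_size=25)
se_large = simulate_standard_error(population, sample_size=400)
print(f"standard error with n=25:  {se_small:.2f}")
print(f"standard error with n=400: {se_large:.2f}")
```

The larger sample should show a clearly smaller standard error, since the sample means cluster more tightly around the population mean.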

Simple Random Sample


A true SRS guarantees that all elements in
population have an equal chance of being selected.
It does not guarantee that the sample is
representative. Any random draw could contain an
over- or underrepresentation of a key group.
As the sample size gets larger, and when population
variance is smaller, randomly selected samples will
tend to be more representative.
I.e., in practice, an estimator will be more accurate
when sample size is high and population variance is low.
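A short sketch of an SRS draw shows why equal selection chances do not guarantee representativeness. The frame below (6,000 women and 4,000 men), the helper name `simple_random_sample`, and the seeds are hypothetical; Python's `random.Random.sample` stands in for the lottery or random number table.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: 6,000 women and 4,000 men (60% women).
frame = [("W", i) for i in range(6_000)] + [("M", i) for i in range(4_000)]

for n in (20, 200, 2_000):
    sample = simple_random_sample(frame, n, seed=n)
    share_women = sum(1 for group, _ in sample if group == "W") / n
    print(f"n={n:>5}: share of women in sample = {share_women:.3f}")
```

Any single draw may land above or below 60% women; larger draws will tend to land closer.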

Systematic Sampling
Selecting every kth case in your
sampling frame after a random
starting point.
The interval k is the frame size divided by the desired sample size.
If there is a fraction, round down.


Where SRS involves creation of a
new selection order, Systematic
keeps the sampling frame order
intact.
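A minimal sketch of systematic selection, assuming the interval k is the frame size divided by the desired sample size, rounded down. The function name `systematic_sample` and the 1,050-case frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th element of the frame after a random start."""
    k = len(frame) // n          # sampling interval, rounded down
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    start = rng.randrange(k)     # random starting point within the first interval
    # Step through the frame in its existing order; keep at most n cases.
    return frame[start::k][:n]

frame = list(range(1, 1_051))    # hypothetical frame of 1,050 numbered cases
sample = systematic_sample(frame, n=100, seed=42)
print(len(sample), sample[:3])
```

Note that the frame order is never shuffled: only the starting point is random, which is why any pattern in the frame order carries over into the sample.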

Systematic Sampling
The probability of an individual case being
selected remains the same across the sampling frame.
But the probability of sets of cases being
selected is variable.
E.g., if you pick every tenth case, each case has a 1/10 chance
of selection. But only ten distinct samples are possible: the 1st, 11th,
etc. are always selected together, while any set that includes both the
1st and 2nd cases has a 0/10 chance.
As a result, if there is any non-random basis
for sample frame order, it can bias the
sample.

Stratified Sampling
The classification of the population into
subgroups (strata) based on supplementary
information, and the selection of random
samples from those strata.
Supplementary information usually means
census data (US Census, or any complete
population statistics, like HR data at workplace).
E.g., if we know 60% of our population are
women, and we want to contact 1,000 people:
Use SRS to select 600 women.
Use SRS to select 400 men.
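The 600/400 split above can be sketched as follows. The frame contents and the helper name `proportionate_stratified_sample` are hypothetical; the only rule assumed is that each stratum's sample size equals the total sample size times its population share.

```python
import random

def proportionate_stratified_sample(strata, total_n, seed=None):
    """Draw an SRS from each stratum, sized by the stratum's population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum sample size = total sample size * population proportion.
        n_stratum = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample

# Hypothetical frame: 60% women, 40% men.
strata = {
    "women": [f"W{i}" for i in range(6_000)],
    "men": [f"M{i}" for i in range(4_000)],
}
sample = proportionate_stratified_sample(strata, total_n=1_000, seed=1)
print({name: len(cases) for name, cases in sample.items()})
# {'women': 600, 'men': 400}
```

With less convenient proportions, rounding each stratum separately can make the total differ from the target by a case or two.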

Stratified Sampling
Selecting strata: in a small sample, you cannot
guarantee that all strata are selected in such a
way that sample is representative.
Select 1-3 key variables for which it is most
important that sample be representative.
Identify appropriate sample sizes for strata:
Do you have population breakdowns for all
combinations of strata?
If you use more than one variable (say, ethnicity and gender),
you'll need to see if your sampling frame matches not
only population figure for Asians and population figure for
males, but population figure for Asian males.

Do you have information for the correct population?

Stratified Sampling
Proportionate Stratified Sampling: researchers
have set the size of each stratum to match the
same proportion as in the population.
E.g., 12.5% of the US population is African
American, & 12.5% of your sample is African
American as well.

Disproportionate Stratified Sampling:
researchers have set the size of each stratum to
differ from proportions in the population.
E.g., 1% of the US population is Native American,
but you've decided to make 5% of your sample
Native American.

Stratified Sampling
Uses of Disproportionate Sampling:
Neyman allocation: some strata have
higher variance than others; sampling
high-variance strata more heavily results in an
optimal sample.
When response rates of certain strata
are anticipated to be low. We might
include more of that stratum to attain a
proportionate result.
When we need a sufficient number of
members of a small group to study them
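As a hedged illustration of Neyman allocation in its usual textbook form, each stratum's share of the sample is proportional to its population size times its standard deviation. The stratum sizes, standard deviations, and the function name `neyman_allocation` below are hypothetical.

```python
def neyman_allocation(total_n, stratum_sizes, stratum_sds):
    """Allocate total_n across strata in proportion to N_h * S_h."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    return {h: round(total_n * w / total_weight) for h, w in weights.items()}

# Hypothetical strata: equal sizes, but stratum "B" is three times as variable.
sizes = {"A": 5_000, "B": 5_000}
sds = {"A": 10.0, "B": 30.0}
allocation = neyman_allocation(1_000, sizes, sds)
print(allocation)  # {'A': 250, 'B': 750}
```

A proportionate design would have split these two equal-sized strata 500/500; the allocation shifts cases toward the stratum where responses vary more.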

Cluster and Multistage Sampling
Populations are often composed of
groupings of elements: i.e., cities or
districts; corporations; colleges; clubs.
Differences between these groupings
(e.g., attending a large, public
university vs. a small, private college)
may be important to research.
Cluster sampling: we treat these
groupings as strata, and use SRS to
select a sample from these diverse

Cluster and Multistage Sampling
If a sample is taken from each of a group of
equivalent clusters, this is called two-stage
cluster sampling.
If there are multiple levels, we call it three-stage,
four-stage, etc.
For example: in a study of NE US cities, we select Boston,
NYC, Philadelphia, Baltimore, and Washington.
Within each city, we select five neighborhoods,
reflecting different socioeconomic statuses.
Then, we use SRS to select 100 people from each
neighborhood, so that we have 5*5*100=2500 cases.
This would be a three-stage sample.
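The three-stage example can be sketched in code. Two simplifications are assumptions of this sketch, not of the slides: the residents are synthetic, and neighborhoods are drawn at random rather than chosen to reflect socioeconomic status. The function name `three_stage_sample` is hypothetical.

```python
import random

def three_stage_sample(cities, n_neighborhoods, n_people, seed=None):
    """cities maps city -> {neighborhood -> list of residents}."""
    rng = random.Random(seed)
    sample = []
    for city, neighborhoods in cities.items():            # stage 1: chosen cities
        chosen = rng.sample(sorted(neighborhoods), n_neighborhoods)  # stage 2
        for hood in chosen:
            residents = neighborhoods[hood]
            for person in rng.sample(residents, n_people):           # stage 3: SRS
                sample.append((city, hood, person))
    return sample

city_names = ["Boston", "NYC", "Philadelphia", "Baltimore", "Washington"]
# Hypothetical data: 8 neighborhoods per city, 500 residents per neighborhood.
cities = {
    city: {
        f"{city}-N{j}": [f"{city}-N{j}-P{i}" for i in range(500)]
        for j in range(8)
    }
    for city in city_names
}
sample = three_stage_sample(cities, n_neighborhoods=5, n_people=100, seed=7)
print(len(sample))  # 5 cities * 5 neighborhoods * 100 people = 2500
```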

Cluster and Multistage Sampling
Primary Sampling Unit (PSU) refers to
the cluster level at which the SRS is
drawn.
PSUs may be of different sizes, so
elements in small PSUs technically
have a better chance of being
included than those in large clusters.
We may adjust for this by taking
proportionate samples from each PSU.
As in stratified sampling, we may

Nonprobability Sampling Methods
Sampling methods where the sample is
selected from the population without any
systematic plan to maximize randomness or
representativeness.
Nonprobability samples are usually used when
there is limited time and/or other resources.
Nonprobability samples result in a poor
representation of the population. They are
best used for exploratory research, focus
group selection, or observational research.

Nonprobability: Convenience Sampling
The sample is drawn at the convenience of
the interviewer; for example, friends, family,
or people who happen to arrive at a busy
intersection or mall.
This is usually the easiest and cheapest
method of sample selection.
Sampling error (selection bias) occurs, favoring inclusion of
people who happen to be the researcher's
acquaintances, or who happen to frequent the
location where interviews are taking place.

Nonprobability: Judgment Sampling


The sample is drawn following the same general
approach as a convenience sample, but researchers
try to use their judgment to find particularly
knowledgeable individuals to include in the sample.
Compared with convenience sampling, judgment
sampling can increase the number of informed
opinions included in the sample.
This is often used when constructing a focus group,
where the intent is to gather information, and not to
perform analysis.
Likelihood of bias remains high because non-experts
are excluded.

Nonprobability: Referral Sampling
Also known as Snowball Sampling.
Initial sample is drawn using some other sampling
method. Participants are then contacted by the
researcher to see if friends or acquaintances
would be interested in participating.
This is a common sampling method when
studying hard-to-find populations, such as gang
members or wealthy elites.
This method intentionally overselects from a
target population, and therefore is neither random nor
representative.

Nonprobability: Quota Sampling
The sample is drawn by:
Soliciting participation of the first available, as
in convenience sampling;
Identifying key demographic subgroups;
Once the population proportion of a subgroup
is achieved, including no new respondents from that
subgroup.
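The quota procedure can be sketched as a simple filter over respondents in the order they become available. The arrival stream, the quota sizes, and the function name `quota_sample` are hypothetical.

```python
def quota_sample(arrivals, quotas):
    """Accept respondents in arrival order until each subgroup's quota is full."""
    counts = {group: 0 for group in quotas}
    accepted = []
    for person, group in arrivals:
        if group in counts and counts[group] < quotas[group]:
            accepted.append((person, group))
            counts[group] += 1
        if counts == quotas:
            break  # every quota is filled; stop soliciting
    return accepted

# Hypothetical quotas matching a 60/40 population split in a sample of 10.
quotas = {"women": 6, "men": 4}
# Hypothetical arrival stream in which men happen to arrive twice as often.
arrivals = [(f"p{i}", "men" if i % 3 else "women") for i in range(40)]
sample = quota_sample(arrivals, quotas)
print(len(sample))  # 10
```

Nothing in this procedure is random: who ends up in the sample depends entirely on who arrives first.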

Nonprobability: Quota Sampling
Compare to Stratified Sampling:
Both attempt to limit the number of
participants from each category in order
to be representative of the population
breakdown.
Quota sampling is not random in its
selection of participants.
Quota sampling might be used when
resources are too limited for a random
sample, but a representative sample is
desired.

The Sample Plan


Step 1: Define the Population. Who are the subjects of your study?
Review all of the prior literature: what is the standard population
chosen in past research into your subject?
What are the operational definitions of the key variables in your
analysis?
If you are not including all people in your population, why?
Step 2: Select the Data Collection Method
What impact does your data collection method have on sampling
frame, response rate, response count? What funds exist to execute
each method?
Online data collection methods have poor response rates but are free;
try to contact the full population.
Phone, mail, and face-to-face survey methods have higher response
rates, but are more expensive: a more carefully designed probability
sample can more easily be executed.

The Sample Plan


Step 3: Obtain a listing of the population; this will
be the sampling frame.
Phone books, e-mail lists, mailing lists,
subscription lists.
Be aware of exactly which people are likely to be
left off of such a list, and ask whether this could
bias the sample.
Step 4: Choose a sampling method.
What resources are available? How much time is
there to administer the survey?
Probability samples are preferred.

Step 5: Determining Sample Size
Incidence Rate: % of possible respondents
with the characteristics under study.
The lower the incidence rate, the more difficult
it will be to generate a large, unbiased sample.

Response Rate: The percentage of people
contacted who submitted a completed
survey (including partial completions).
The higher the response rate, the greater the
public confidence in the results; and the lower
the likelihood of selection bias.
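Incidence and response rates together determine how many people must be contacted. The sketch below assumes, as a simplification not stated in the slides, that the expected number of usable surveys equals contacts times response rate times incidence rate. The function name `required_contacts` and the example figures are hypothetical.

```python
import math

def required_contacts(target_completes, response_rate, incidence_rate=1.0):
    """Number of people to contact to expect target_completes usable surveys."""
    if not (0 < response_rate <= 1 and 0 < incidence_rate <= 1):
        raise ValueError("rates must be proportions in (0, 1]")
    return math.ceil(target_completes / (response_rate * incidence_rate))

# 350 completed surveys wanted, 25% expected response rate.
print(required_contacts(350, response_rate=0.25))                      # 1400
# Same target, but only half of those contacted have the trait under study.
print(required_contacts(350, response_rate=0.25, incidence_rate=0.5))  # 2800
```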

Step 5: Determining Sample Size
Rule-of-Thumb Approach: The belief that
the correct sample size should be set at
some fixed percent of the population,
usually 5%.
For example, if you were conducting a
survey of Brooklyn College students
(there are approximately 16,000), your
target sample size would be 800.
There is no statistical basis for this 5%
figure; it is just a common practice.

Step 5: Determining Sample Size
Conventional Approach: Evidence indicates
that the reduction in sample error levels off
after the sample size exceeds 1,000.
This has been an industry-wide finding.
Selecting a 5% sample size for some
populations would be an expensive
undertaking. If you were to take a 5%
sample of the US population, you would
need 15 million respondents.
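The leveling-off can be illustrated with the standard margin-of-error expression for a proportion under simple random sampling, e = z * sqrt(p(1-p)/n). This is a sketch under assumptions: p = 0.5 (the most conservative case), a 95% confidence level, and a population large enough to ignore any finite population correction. The function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion p estimated from an SRS of size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 250, 500, 1_000, 2_000, 4_000):
    print(f"n={n:>5}: margin of error = {margin_of_error(n):.3f}")
```

Going from 100 to 1,000 respondents cuts the margin of error from about 9.8 to about 3.1 percentage points, while quadrupling the sample again to 4,000 only brings it down to about 1.5.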

Step 5: Determining Sample Size
Sample Size Formula Approach: We may
be able to find a useful sample size that is
less than 1,000 (saving our employer or
sponsor money).
Appropriate sample size is determined
largely by three factors:
1. the estimated prevalence of the variable of
interest
2. the desired level of confidence, and
3. the acceptable margin of error.

Formula for Sample Size


$$ n = \frac{z^{2}\,(pq)}{e^{2}} $$

Where:
z is the z-score for the confidence level
(typically 95%, so z = 1.96)
e is the margin of error you are
willing to accept (typically 5%, so e = 0.05)
p and q are known probabilities in the
population of some key characteristic (p is the
probability that it is true; q is 1-p)
Ex: we know that 65% of our population is
female.
(1.96^2)*(0.65)*(0.35) = 0.873964
(0.05^2) = .0025
0.873964 / .0025 = 349.59 (that is, 350).
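The same calculation as a short function, offered as a sketch. The name `sample_size` and the default arguments are assumptions; rounding to the nearest whole number mirrors the worked example, though in practice one would often round up.

```python
def sample_size(p, e=0.05, z=1.96):
    """Required sample size n = z^2 * p * q / e^2, with q = 1 - p."""
    q = 1 - p
    return (z ** 2) * p * q / (e ** 2)

for p in (0.65, 0.90, 0.50):
    n = sample_size(p)
    print(f"p={p:.2f}: n = {n:.2f} (about {round(n)})")
```

For p = 0.65 this gives roughly 349.59, i.e. 350 respondents; the result is largest at p = 0.50 and shrinks as p moves toward 0 or 1.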

Formula for Sample Size


Purpose of the formula: where
possible, we'd like a smaller sample
(save money, time)
The farther p is from 0.50, the smaller the sample
size.
If p = 0.90, formula result will be 138.
If p = 0.50, formula result will be 384.

Logic: we can afford to use a smaller
sample when the population is
homogeneous; but we must use a
larger sample when the population is
heterogeneous.
Draw The Sample


Administer the sample
Review the results for
representativeness, response rate,
and response count.
Sample Substitution:
adding units to a sample to increase sample
size.
This is typically done when sample size is
unusually small, or if sample units are flawed
(For example, damaged, incomplete, or
fraudulent surveys).

Review and Correct


The researcher validates the sample
by reviewing the characteristics of
the sample:
Is the sample sufficiently large?
Are key subsets of the population over- or underrepresented?
How many surveys remain once those
deemed unacceptable are removed?

Ex Post Facto Solutions


Oversampling: sampling more than the number
of cases that are expected.
Researcher has an ideal response count in mind, and
estimates a usual response count.
The oversample may include population subsets
known to have low response rates.
If sample is not representative or response rate too
low, respondents from the oversample will be
selected randomly for inclusion.

The extra responses are set aside and included
only if the regular sample is found to be lacking.

Ex Post Facto Solutions


Weighting: assigning greater importance to
some cases relative to others.
Post-Administration Stratification:
Researcher weights cases after data has
been collected. Typically, even stratified
sampling cannot guarantee sample will be
truly representative.
We must know how key groups break out
in the population in order to use weighting
effectively.

Ex Post Facto Solutions


Weighted Mean: A mean adjusted to increase or
decrease impact of elements/groupings in the
sample. Common methods of weighting:
1. Dropping Outliers: drop the highest and lowest
scores, and calculate the mean from the remaining
scores.
Ex: Olympics; judged competitions: where a mean of all
ratings is calculated excluding the highest and lowest
scores, to avoid the impact of national bias.

2. SPSS Weighting: An SPSS weight is built into the data
file to automatically weight cases.
A weight variable is created in SPSS where values reflect
population proportions.
These same weights are then applied to all statistical
procedures, including means.
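Outside SPSS, the same idea can be sketched in a few lines. The sketch assumes the common post-stratification rule that a group's weight is its population share divided by its sample share; the data and the function names `poststratification_weights` and `weighted_mean` are hypothetical.

```python
def poststratification_weights(population_shares, sample_counts):
    """Weight for each group = population share / sample share."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / n) for g in sample_counts}

def weighted_mean(values, groups, weights):
    """Mean of values where each case carries the weight of its group."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight

# Hypothetical data: women are 60% of the population but only 40% of the sample.
population_shares = {"women": 0.6, "men": 0.4}
groups = ["women"] * 4 + ["men"] * 6
values = [10, 10, 10, 10, 5, 5, 5, 5, 5, 5]

weights = poststratification_weights(population_shares, {"women": 4, "men": 6})
print(sum(values) / len(values))               # unweighted mean: 7.0
print(weighted_mean(values, groups, weights))  # weighted mean, close to 8.0
```

Underrepresented women receive a weight above 1 and overrepresented men a weight below 1, which pulls the mean toward what a 60/40 sample would have produced.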

Missing Elements and Nonresponse


Sample Frame Error: when sample survey results
fail to reflect true population parameter because
of errors in the sampling frame:
Inadequate Sampling Frame: sampling frame is not
intended to cover the entire population. E.g., a phone book.
Missing Elements: elements in population who are not
included in sampling frame. E.g., an incomplete list.
Clusters: listings in sampling frame refer to groups of
elements, not individuals. E.g. households.
Foreign Elements: elements that should not be
included. E.g., faculty appearing in a list of students.
Duplicate Listing: Same element comes up more than
once.
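Two of these problems, foreign elements and duplicate listings, can be screened out before drawing the sample. The sketch below is illustrative only: the frame of (id, role) pairs, the eligibility rule, and the function name `clean_frame` are hypothetical.

```python
def clean_frame(frame, is_eligible):
    """Drop foreign elements and duplicate listings, preserving frame order."""
    seen = set()
    cleaned = []
    for element in frame:
        if not is_eligible(element):
            continue            # foreign element: not part of the population
        if element in seen:
            continue            # duplicate listing: already kept once
        seen.add(element)
        cleaned.append(element)
    return cleaned

# Hypothetical student list containing one faculty member and one duplicate.
frame = [
    ("s1", "student"),
    ("s2", "student"),
    ("f1", "faculty"),
    ("s1", "student"),
    ("s3", "student"),
]
cleaned = clean_frame(frame, is_eligible=lambda e: e[1] == "student")
print(cleaned)
```

Missing elements cannot be fixed this way, since they never appear in the frame; they require a second source to compare against.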

Missing Elements and Nonresponse


Nonresponse Error: when sample
survey results fail to reflect true
population parameter because
elements in sampling frame did not
respond:
Respondent disillusionment with surveys.
Inadequate or incomplete appeal to
participate.
Includes rudeness, brevity.

Insufficient time allowed to complete the
survey.

Solutions to Missing Elements and Nonresponse
Oversampling: contacting
additional people beyond the core
sample to be filled in if sample is
too small.
Resampling: performing another
data collection (with a restricted
sample) to get more cases.
Counting only Valid Statistics:
summary statistics (e.g., valid
percentage) excluding missing

Solutions to Missing Elements and Nonresponse
Limited Information Maximum
Likelihood (LIML) Estimation:
estimating probable responses to
missing items, based on responses to
other items.
Weighting: adjusting summary
statistics (frequencies, probabilities,
means) to reflect population
breakdown for a given category.

Selection Bias
Error resulting from how we select elements or groups
for inclusion in the sample. Examples:
Sample frame error: inclusion in the sampling frame is
nonrandom.
Data collection error: the procedure for collecting data
favors one important group over another.
Self-selection: those who want to participate may be
different in some key way from those who do not.
Indication bias: when inclusion depends on indication, e.g. a
treatment is given to people at high risk of acquiring a
disease, potentially causing a preponderance of treated
people among those acquiring the disease.
Selecting end-points of a series. For example, to maximize
a claimed trend, you could start the time series at an
unusually low year, and end on a high one.
