Sampling design
METHODS
Week 8
SAMPLING
A sample is “a smaller (but hopefully
representative) collection of units from a population
used to determine truths about that population”
(Field, 2005)
Why sample?
Resources (time, money) and workload
Gives results with known accuracy that can be
calculated mathematically
(Figure: target population, study population and sample)
Types of Samples
Process
The sampling process comprises several stages:
1. Defining the population of concern
2. Specifying a sampling frame, a set of items or events possible to measure
3. Specifying a sampling method for selecting items or events from the frame
4. Determining the sample size
5. Implementing the sampling plan
6. Sampling and data collecting
7. Reviewing the sampling process
Population definition
A population can be defined as including
all people or items with the characteristic
one wishes to understand.
Because there is very rarely enough time
or money to gather information from
everyone or everything in a population,
the goal becomes finding a
representative sample (or subset) of that
population.
Note also that the population from which the
sample is drawn may not be the same as the
population about which we actually want
information. Often there is a large but not
complete overlap between these two groups due
to frame issues, etc.
Sometimes they may be entirely separate - for
instance, we might study rats in order to get a
better understanding of human health, or we
might study records from people born in 2008 in
order to make predictions about people born in
2009.
SAMPLING FRAME
In the most straightforward case, it is possible to identify and measure every single item in the population and to include any one of them in our sample; the list of all such items is known as the sampling frame.
The sampling frame must be representative of the population.
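PROBABILITY SAMPLING
Any sampling method in which every element of the population has a known, non-zero probability of selection. Because selection is random, sampling errors can be estimated.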
NON PROBABILITY SAMPLING
Any sampling method where some elements of the
population have no chance of selection
(these are sometimes referred to as 'out of
coverage'/'undercovered'), or where the
probability of selection can't be accurately
determined.
It involves the selection of elements based
on assumptions regarding the population of
interest, which form the criteria for
selection.
Because the selection of elements is
nonrandom, non-probability sampling does not
allow the estimation of sampling errors.
Example: We visit every household in a given
street, and interview the first person to answer
the door.
In any household with more than one occupant,
this is a nonprobability sample, because some
people are more likely to answer the door (e.g. an
unemployed person who spends most of their
time at home is more likely to answer than an
employed housemate who might be at work when
the interviewer calls) and it's not practical to
calculate these probabilities.
• Nonprobability sampling includes accidental sampling, quota sampling and purposive sampling.
• In addition, nonresponse effects may turn any probability design into a nonprobability design if the characteristics of nonresponse are not well understood, since nonresponse effectively modifies each element's probability of being sampled.
SIMPLE RANDOM SAMPLING
• Applicable when population is small,
homogeneous & readily available
• All subsets of the frame of the same size are given an equal probability of selection. Each element of the frame thus has an equal probability of selection.
• This is done by assigning a number to each unit in the sampling frame.
• A table of random numbers or a lottery system is used to determine which units are to be selected.
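A minimal Python sketch of the numbering-and-lottery procedure above, assuming Python's standard `random` module stands in for a printed table of random numbers. The function name `simple_random_sample` and the household frame are hypothetical, chosen only for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n, without replacement, from a sampling frame."""
    rng = random.Random(seed)
    # Assign a number 1..N to each unit in the sampling frame.
    numbered = dict(enumerate(frame, start=1))
    # "Lottery": every set of n numbers is equally likely to be drawn.
    drawn = rng.sample(range(1, len(frame) + 1), n)
    return [numbered[i] for i in sorted(drawn)]


# Hypothetical frame of 20 households.
frame = [f"household_{i:02d}" for i in range(1, 21)]
print(simple_random_sample(frame, 5, seed=1))
```

Because `random.sample` draws without replacement and treats every set of n numbers alike, each unit ends up with the same inclusion probability n/N (here 5/20).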
Advantages
Estimates are easy to calculate.
Simple random sampling is always an EPS (equal probability of selection) design, but not all EPS designs are simple random sampling.
Disadvantages
If the sampling frame is large, this method is impracticable.
Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
SYSTEMATIC SAMPLING
Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.
Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k = population size / sample size.
It is important that the starting point is not automatically the first
in the list, but is instead randomly chosen from within the first
to the kth element in the list.
A simple example would be to select every 10th name from the
telephone directory (an 'every 10th' sample, also referred to as
'sampling with a skip of 10').
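The following Python sketch follows the steps above: compute k = N / n, pick a random start between 1 and k, then take every kth element. It assumes the frame is already in the chosen order; `systematic_sample` and the 1000-entry directory are hypothetical.

```python
import random


def systematic_sample(frame, n, rng=random):
    """Every-kth-element sample with a random start, where k = N // n."""
    N = len(frame)
    k = N // n  # sampling interval k = population size / sample size
    # Random start chosen from positions 1..k, not automatically position 1.
    start = rng.randint(1, k)
    # Positions start, start + k, ..., start + (n - 1) * k (1-based), all <= N.
    return [frame[pos - 1] for pos in range(start, start + n * k, k)]


# Hypothetical frame: 1000 numbered entries (e.g. directory listings), sample of 100.
directory = list(range(1, 1001))
sample = systematic_sample(directory, 100, rng=random.Random(7))
print(sample[:5])
```

Taking exactly n positions from the start keeps every selected position within the list. If N is not a multiple of n, the last N - nk units can never be selected in this sketch.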
As described above, systematic sampling is an EPS method,
because all elements have the same probability of selection (in
the example given, one in ten).
It is not 'simple random sampling' because different subsets of
the same size have different selection probabilities - e.g. the set
{4,14,24,...,994} has a one-in-ten probability of selection, but
the set {4,13,24,34,...} has zero probability of selection.
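The EPS argument can be checked by brute force. This Python sketch, assuming the 1000-unit, every-10th design of the example, lists the only k = 10 samples a random start can produce and counts how often each unit appears. Any set containing both 4 and 13, such as the second set in the text, is never among them.

```python
from fractions import Fraction
from math import comb

N, k = 1000, 10  # 1000 units, "every 10th" sample of 100

# With a random start r in 1..k, only k distinct samples are possible.
possible_samples = [frozenset(range(r, N + 1, k)) for r in range(1, k + 1)]

# Each start is equally likely, so P(unit selected) = (#samples containing it) / k.
inclusion = {
    unit: Fraction(sum(unit in s for s in possible_samples), k)
    for unit in range(1, N + 1)
}
print(set(inclusion.values()))  # every unit: 1/10, so the design is EPS

# The set {4, 14, ..., 994} is one of the k possible samples ...
print(frozenset(range(4, N + 1, k)) in possible_samples)
# ... but no possible sample contains both 4 and 13, so any set that does
# has zero probability of selection.
print(any({4, 13} <= s for s in possible_samples))

# By contrast, simple random sampling of 100 from 1000 allows comb(1000, 100) samples.
print(len(possible_samples), comb(N, N // k) > len(possible_samples))
```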
ADVANTAGES:
Sample easy to select
Suitable sampling frame can be identified easily
Sample evenly spread over entire reference population
DISADVANTAGES:
Sample may be biased, e.g. if the list has a periodic pattern that coincides with the sampling interval
Difficult to assess the precision of an estimate from a single survey.
STRATIFIED SAMPLING
Where the population contains a number of distinct categories, the frame can be organized into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.
Every unit in a stratum has the same chance of being selected.
Using the same sampling fraction for all strata ensures proportionate representation in the sample.
Adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required.
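A minimal Python sketch of stratified sampling with an independent simple random sample per stratum. It assumes the stratum of each unit is known from the frame; `stratified_sample` and the urban/rural/nomadic frame are hypothetical. Passing the same fraction for every stratum gives proportionate allocation; raising one stratum's fraction over-represents that minority subgroup.

```python
import random


def stratified_sample(frame, stratum_of, fractions, rng=random):
    """Independent simple random sample within each stratum.

    fractions maps stratum -> sampling fraction; the same fraction for
    every stratum gives proportionate allocation, different fractions let a
    minority stratum be over-represented.
    """
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {
        name: rng.sample(units, round(fractions[name] * len(units)))
        for name, units in strata.items()
    }


# Hypothetical frame: 600 urban, 300 rural and 100 nomadic households.
frame = ([("urban", i) for i in range(600)]
         + [("rural", i) for i in range(300)]
         + [("nomadic", i) for i in range(100)])
rng = random.Random(3)

proportionate = stratified_sample(frame, lambda u: u[0], {"urban": 0.1, "rural": 0.1, "nomadic": 0.1}, rng)
boosted = stratified_sample(frame, lambda u: u[0], {"urban": 0.1, "rural": 0.1, "nomadic": 0.4}, rng)
print({h: len(s) for h, s in proportionate.items()})
print({h: len(s) for h, s in boosted.items()})
```

When fractions differ between strata, each unit's inclusion probability is its stratum's fraction, so estimates for the whole population need to weight each unit by the inverse of that fraction.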
Finally, since each stratum is treated as an
independent population, different sampling
approaches can be applied to different strata.
POSTSTRATIFICATION
Stratification is sometimes introduced after the
sampling phase in a process called
"poststratification".
This approach is typically implemented due to a lack
of prior knowledge of an appropriate stratifying
variable or when the experimenter lacks the
necessary information to create a stratifying variable
during the sampling phase. Although the method is
susceptible to the pitfalls of post hoc approaches, it
can provide several benefits in the right situation.
Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, poststratification can be used to implement weighting, which can improve the precision of a sample's estimates.
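As an illustration of the weighting idea, this Python sketch assumes the population share of each post-stratum is known (e.g. from a census) and weights each respondent by population share divided by sample share. The function names and data are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight for stratum h = (population share of h) / (sample share of h)."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return {h: population_shares[h] / (counts[h] / n) for h in counts}


def weighted_mean(values, strata, weights):
    total_w = sum(weights[h] for h in strata)
    return sum(weights[h] * y for y, h in zip(values, strata)) / total_w


# Hypothetical simple random sample: 70 women and 30 men, although the
# population is known to be 50% women and 50% men.
strata = ["F"] * 70 + ["M"] * 30
values = [1] * 70 + [0] * 30  # e.g. 1 = uses a service, 0 = does not
weights = poststratification_weights(strata, {"F": 0.5, "M": 0.5})

print(sum(values) / len(values))               # unweighted estimate
print(weighted_mean(values, strata, weights))  # poststratified estimate
```

The weighted mean equals the sum over post-strata of population share times stratum sample mean (0.5 × 1 + 0.5 × 0 = 0.5), which is how the poststratified estimator is usually written.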
OVERSAMPLING
Choice-based sampling is one of the
stratified sampling strategies. In this, data
are stratified on the target and a sample is
taken from each stratum so that the rare
target class will be more represented in the
sample. The model is then built on this
biased sample. The effects of the input
variables on the target are often estimated
with more precision with the choice-based
sample even when a smaller overall sample
size is taken, compared to a random
sample. The results usually must be
adjusted to correct for the oversampling.
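The results from a choice-based sample must be adjusted. This Python sketch shows two common adjustments, under stated assumptions: case weights that restore the population class balance, and the usual intercept (prior) correction for a logistic regression model fitted to an outcome-stratified sample, which rescales predicted odds by the ratio of population odds to sample odds of the target. The 2% prevalence and the function names are hypothetical.

```python
def case_weights(pop_rate, sample_rate):
    """Weights that undo choice-based (outcome-stratified) sampling.

    pop_rate: share of the rare target class in the population.
    sample_rate: share of that class in the oversampled data.
    """
    return {
        1: pop_rate / sample_rate,
        0: (1 - pop_rate) / (1 - sample_rate),
    }


def prior_correct(p_sample, pop_rate, sample_rate):
    """Rescale a probability predicted by a model fitted on the oversampled data
    back to the population's base rate, by adjusting the odds."""
    odds = p_sample / (1 - p_sample)
    odds *= (pop_rate / (1 - pop_rate)) / (sample_rate / (1 - sample_rate))
    return odds / (1 + odds)


# Hypothetical: the target occurs in 2% of the population but was sampled to 50%.
print(case_weights(0.02, 0.5))
print(prior_correct(0.5, 0.02, 0.5))  # a 50:50 prediction in the sample maps back to 2%
```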
CLUSTER SAMPLING
Cluster sampling is an example of 'two-stage sampling'.
In the first stage, a sample of areas is chosen;
in the second stage, a sample of respondents within those areas is selected.
Population divided into clusters that are similar to one another, usually based on geographical location.
Sampling units are groups rather than
individuals.
A sample of such clusters is then selected.
All units from the selected clusters are studied.
Advantages:
Cuts down on the cost of preparing a sampling frame.
This can reduce travel and other administrative costs.
Disadvantages: sampling error is higher than for a simple random sample of the same size.
Often used to evaluate vaccination coverage in the EPI (Expanded Programme on Immunization).
• Identification of clusters
– List all cities, towns, villages & wards of cities in the target area under study, with their populations.
– Calculate the cumulative population & divide the total by 30; this gives the sampling interval.
– Select a random number less than or equal to the sampling interval, having the same number of digits. The locality whose cumulative population range contains this number forms the 1st cluster.
– Random number + sampling interval identifies the 2nd cluster.
– 2nd cluster + sampling interval = 3rd cluster, and so on.
– Last or 30th cluster = 29th cluster + sampling interval
There are two types of cluster sampling method.
One-stage sampling. All of the elements within selected clusters are included in the sample.
Two-stage sampling. A subset of elements within selected clusters is randomly selected for inclusion in the sample.
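The two variants differ only in what happens after the clusters are drawn. A minimal Python sketch, assuming a simple random sample of clusters at the first stage; `cluster_sample` and the village data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, m_per_cluster=None, rng=random):
    """clusters: dict mapping cluster id -> list of elements.

    One-stage (m_per_cluster=None): every element of each chosen cluster.
    Two-stage: a simple random sample of m_per_cluster elements per chosen cluster.
    """
    chosen = rng.sample(sorted(clusters), n_clusters)  # first stage: SRS of clusters
    if m_per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}
    return {
        c: rng.sample(clusters[c], min(m_per_cluster, len(clusters[c])))  # second stage
        for c in chosen
    }


# Hypothetical villages with 20 to 39 children each.
villages = {f"village_{v}": [f"v{v}_child_{i}" for i in range(20 + v)] for v in range(20)}
rng = random.Random(11)
one_stage = cluster_sample(villages, 5, rng=rng)
two_stage = cluster_sample(villages, 5, m_per_cluster=7, rng=rng)
print({c: len(kids) for c, kids in one_stage.items()})
print({c: len(kids) for c, kids in two_stage.items()})
```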
Freq = population of locality; cf = cumulative frequency.
Locality  Freq   cf     Cluster(s)
I         2000    2000  1
II        3000    5000  2
III       1500    6500  -
IV        4000   10500  3
V         5000   15500  4, 5
VI        2500   18000  6
VII       2000   20000  7
VIII      3000   23000  8
IX        3500   26500  9
X         4500   31000  10
XI        4000   35000  11, 12
XII       4000   39000  13
XIII      5000   44000  14, 15
XIV       2000   46000  -
XV        3000   49000  16
XVI       3500   52500  17
XVII      4000   56500  18, 19
XVIII     4500   61000  20
XIX       4000   65000  21, 22
XX        4000   69000  23
XXI       2000   71000  24
XXII      2000   73000  -
XXIII     3000   76000  25
XXIV      3000   79000  26
XXV       5000   84000  27, 28
XXVI      2000   86000  29
XXVII     1000   87000  -
XXVIII    1000   88000  -
XXIX      1000   89000  30
XXX       1000   90000  -
Sampling interval = 90000/30 = 3000
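The identification steps and the table can be reproduced with a short Python sketch. Assumptions: locality XIII is taken as 5000, the value implied by its cumulative frequency; the random start is drawn as any integer from 1 to the interval rather than by matching digits; and the start 1700 is hypothetical, since the table does not state the random number used. Any start from 1501 to 2000 gives the allocation shown in the table.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def select_clusters(populations, n_clusters=30, start=None, rng=random):
    """Systematic, population-proportional selection of clusters.

    Returns (interval, assignment) where assignment maps a 1-based locality
    number to the cluster numbers that fall in it.
    """
    cumulative = list(accumulate(populations))  # cf column
    interval = cumulative[-1] // n_clusters     # sampling interval
    if start is None:
        start = rng.randint(1, interval)        # random number <= interval
    assignment = {}
    for cluster in range(1, n_clusters + 1):
        point = start + (cluster - 1) * interval
        locality = bisect_left(cumulative, point)  # first cf >= point
        assignment.setdefault(locality + 1, []).append(cluster)
    return interval, assignment


# Populations (freq) of localities I..XXX. Assumption: XIII is taken as 5000,
# the value implied by its cumulative frequency (39000 -> 44000).
POPULATIONS = [2000, 3000, 1500, 4000, 5000, 2500, 2000, 3000, 3500, 4500,
               4000, 4000, 5000, 2000, 3000, 3500, 4000, 4500, 4000, 4000,
               2000, 2000, 3000, 3000, 5000, 2000, 1000, 1000, 1000, 1000]

interval, assignment = select_clusters(POPULATIONS, start=1700)  # hypothetical start
print(interval)
for locality, clusters in sorted(assignment.items()):
    print(locality, clusters)
```

A locality whose population exceeds the interval can receive more than one cluster (e.g. V and XI), and a small locality may receive none (e.g. III and XIV), which is what makes the selection proportional to population size.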
Difference Between Strata and Clusters
Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways.
All strata are represented in the sample, but only a subset of clusters is in the sample.
With stratified sampling, the best survey results occur when elements within strata are internally homogeneous. However, with cluster sampling, the best results occur when elements within clusters are internally heterogeneous.
MULTISTAGE SAMPLING
This technique is essentially the process of taking random samples of preceding random samples.
It is not as effective as true random sampling, but probably solves more of the problems inherent to random sampling.
It is an effective strategy because it relies on multiple randomizations, and as such is extremely useful.
Multistage sampling is used frequently when a complete list of all members of the population does not exist or is inappropriate.
Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster sampling.
MULTI PHASE SAMPLING
Part of the information is collected from the whole sample and part from a subsample.
MATCHED RANDOM SAMPLING
A method of assigning participants to groups in
which pairs of participants are first matched on some
characteristic and then individually assigned
randomly to groups.
Matched random sampling can be applied in the following contexts:
Two samples in which the members are clearly paired, or are matched explicitly by the researcher. For example, IQ measurements or pairs of identical twins.
Samples in which the same attribute, or variable, is measured twice on each subject, under different circumstances. These are commonly called repeated measures.
Examples include the times of a group of athletes for 1500m before and after a week of special training, and the milk yields of cows before and after being fed a particular diet.
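A minimal Python sketch of the matching-then-randomizing step in the definition of matched random sampling, assuming participants are matched on a single numeric characteristic by sorting and pairing neighbours (one simple matching rule among several). `matched_random_assignment` and the IQ scores are hypothetical.

```python
import random


def matched_random_assignment(participants, key, rng=random):
    """Match participants in pairs on `key`, then randomly split each pair.

    Participants are sorted on the matching characteristic and adjacent
    participants are paired; within each pair a random shuffle decides who
    goes to group A and who to group B. With an odd number, the last
    participant is left unassigned.
    """
    ordered = sorted(participants, key=key)
    group_a, group_b = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b


# Hypothetical participants matched on IQ score.
people = [("P1", 98), ("P2", 121), ("P3", 101), ("P4", 119), ("P5", 110), ("P6", 108)]
a, b = matched_random_assignment(people, key=lambda p: p[1], rng=random.Random(5))
print(a)
print(b)
```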
QUOTA SAMPLING
The population is first segmented into mutually
exclusive sub-groups, just as in stratified sampling.
Then judgment is used to select subjects or units from each segment based on a specified proportion.
For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.
It is this second step which makes the technique one of non-probability sampling.
In quota sampling the selection of the sample is non-random.
For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability sampling has been a matter of controversy for many years.
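To show why quota sampling is non-probability, this Python sketch fills the quotas in the example (200 females and 300 males aged 45 to 60) from whoever the interviewer meets, in order. `fill_quotas`, `age_sex_cell` and the stream of passers-by are hypothetical.

```python
def fill_quotas(arrivals, quotas, cell_of):
    """Accept people in the order the interviewer meets them until every quota is full.

    cell_of(person) returns the quota cell a person belongs to, or None if
    they are outside the target group. Nothing here is random: who ends up
    in the sample depends entirely on who turns up, and in what order.
    """
    remaining = dict(quotas)
    accepted = []
    for person in arrivals:
        cell = cell_of(person)
        if remaining.get(cell, 0) > 0:
            accepted.append(person)
            remaining[cell] -= 1
        if not any(remaining.values()):
            break
    return accepted


def age_sex_cell(person):
    sex, age = person
    return sex if 45 <= age <= 60 else None


# Hypothetical stream of passers-by: (sex, age).
arrivals = [("female" if i % 3 else "male", 30 + i % 40) for i in range(3000)]
sample = fill_quotas(arrivals, {"female": 200, "male": 300}, age_sex_cell)
print(len(sample))
```

Changing the order of `arrivals` changes who is selected while the quota counts stay fixed; no person has a known probability of selection.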
CONVENIENCE SAMPLING
Sometimes known as grab or opportunity sampling or
accidental sampling.
A type of nonprobability sampling which involves the sample
being drawn from that part of the population which is close to
hand. That is, readily available and convenient.
The researcher using such a sample cannot scientifically
make generalizations about the total population from this
sample because it would not be representative enough.
For example, if the interviewer were to conduct a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those present there at that given time. They would not represent the views of other members of society in that area, as a survey conducted at different times of day and several times per week would.
This type of sampling is most useful for pilot testing.
In social science research, snowball sampling is a similar
technique, where existing study subjects are used to recruit
more subjects into the sample.
Judgmental sampling or Purposive sampling
- The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people that have expertise in the area being researched.
PANEL SAMPLING
Method of first selecting a group of participants through a
random sampling method and then asking that group for the
same information again several times over a period of time.
Therefore, each participant is given the same survey or interview at two or more time points; each period of data collection is called a "wave".
This sampling methodology is often chosen for large-scale or nation-wide studies in order to gauge changes in the population with regard to any number of variables, from chronic illness to job stress to weekly food expenditures.
Panel sampling can also be used to inform researchers about
within-person health changes due to age or help explain
changes in continuous dependent variables such as spousal
interaction.
There have been several proposed methods of analyzing
panel sample data, including growth curves.
Questions???
What sampling method would you recommend?
Determining the proportion of undernourished five-year-olds in a village.
Investigating nutritional status of preschool
children.
In estimation of immunization coverage in a
province, data on seven children aged 12-23
months in 30 clusters are used to determine
proportion of fully immunized children in the
province.
Give reasons why cluster sampling is used in this
survey.
Probability proportional to size sampling
In some cases the sample designer has access to an
"auxiliary variable" or "size measure", believed to be
correlated with the variable of interest, for each
element in the population. This data can be used to
improve accuracy in sample design. One option is to
use the auxiliary variable as a basis for stratification,
as discussed above.
Another option is probability-proportional-to-size
('PPS') sampling, in which the selection probability
for each element is set to be proportional to its size
measure, up to a maximum of 1. In a simple PPS
design, these selection probabilities can then be
used as the basis for Poisson sampling. However,
this has the drawbacks of variable sample size, and
different portions of the population may still be over-
or under-represented due to chance variation in
selections. To address this problem, PPS may be
combined with a systematic approach.
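A minimal Python sketch of simple PPS with Poisson sampling, using the six school sizes from the example that follows as the size measure and a target sample size of 3. Each unit's probability is n × size / total, capped at 1 (this sketch does not re-scale the remaining units after capping), and each unit is then included by an independent random draw. The repeated draws show the variable sample size mentioned above; the function names are hypothetical.

```python
import random


def pps_probabilities(sizes, n):
    """Selection probability proportional to size, capped at 1.

    Simplification: units whose probability is capped are not removed and
    the remainder re-scaled, so the expected sample size can fall below n
    when some units are very large.
    """
    total = sum(sizes)
    return [min(1.0, n * x / total) for x in sizes]


def poisson_sample(probs, rng=random):
    """Independent Bernoulli trial per unit: the sample size itself is random."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


sizes = [150, 180, 200, 220, 260, 490]
probs = pps_probabilities(sizes, 3)
rng = random.Random(2)
print([round(p, 2) for p in probs])
print([len(poisson_sample(probs, rng)) for _ in range(10)])  # sizes vary from draw to draw
```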
Example: Suppose we have six schools with populations of 150,
180, 200, 220, 260, and 490 students respectively (total 1500
students), and we want to use student population as the basis
for a PPS sample of size three. To do this, we could allocate the
first school numbers 1 to 150, the second school 151 to
330 (= 150 + 180), the third school 331 to 530, and so on to
the last school (1011 to 1500). We then generate a random
start between 1 and 500 (equal to 1500/3) and count through
the school populations by multiples of 500. If our random start
was 137, we would select the schools which have been
allocated numbers 137, 637, and 1137, i.e. the first, fourth, and
sixth schools.
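The school example can be reproduced with a short Python sketch of systematic PPS: lay the schools end to end on a 1 to 1500 number line, take a random start between 1 and 500, and step by 500. `pps_systematic` is a hypothetical helper; `bisect_left` finds the school whose number range contains each selection point.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, n, start=None, rng=random):
    """Systematic PPS: lay units end to end on a 1..total number line,
    then take every (total // n)-th number from a random start.

    Returns 1-based unit numbers. A unit larger than the interval can be
    hit more than once.
    """
    cumulative = list(accumulate(sizes))  # 150, 330, 530, ...
    step = cumulative[-1] // n            # 1500 // 3 = 500
    if start is None:
        start = rng.randint(1, step)      # random start between 1 and step
    points = [start + j * step for j in range(n)]
    return [bisect_left(cumulative, p) + 1 for p in points]


schools = [150, 180, 200, 220, 260, 490]
print(pps_systematic(schools, 3, start=137))  # the worked example: schools 1, 4 and 6
```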
The PPS approach can improve accuracy for a given sample
size by concentrating sample on large elements that have the
greatest impact on population estimates. PPS sampling is
commonly used for surveys of businesses, where element size
varies greatly and auxiliary information is often available - for
instance, a survey attempting to measure the number of guest-
nights spent in hotels might use each hotel's number of rooms
as an auxiliary variable. In some cases, an older measurement
of the variable of interest can be used as an auxiliary variable
when attempting to produce more current estimates.
Event sampling
Event Sampling Methodology (ESM) is a new form of sampling method that allows researchers to study ongoing experiences and events that vary across and within days in their naturally occurring environment. Because of the frequent sampling of events inherent in ESM, it enables researchers to measure the typology of activity and detect the temporal and dynamic fluctuations of work experiences. The popularity of ESM as a research design has increased in recent years because it addresses a shortcoming of cross-sectional research: researchers can now detect intra-individual variation across time. In ESM, participants are asked to record their experiences and perceptions in a paper or electronic diary.
There are three types of ESM:
• Signal contingent – random beeping notifies participants to record data. The advantage of this type of ESM is minimization of recall bias.
• Event contingent – records data when certain events occur.
• Interval contingent – records data according to the passing of a certain period of time.
ESM has several disadvantages. One is that it can sometimes be perceived as invasive and intrusive by participants. ESM also leads to possible self-selection bias: it may be that only certain types of individuals are willing to participate in this type of study, creating a non-random sample. Another concern relates to participant cooperation. Participants may not actually fill out their diaries at the specified times.
Furthermore, ESM may substantively change the
phenomenon being studied. Reactivity or priming effects
may occur, such that repeated measurement may cause
changes in the participants' experiences. This method of
sampling data is also highly vulnerable to common
method variance.[6]
Further, it is important to think about
whether or not an appropriate dependent
variable is being used in an ESM design.
For example, it might be logical to use ESM
in order to answer research questions
which involve dependent variables with a
great deal of variation throughout the day.
Thus, variables such as change in mood,
change in stress level, or the immediate
impact of particular events may be best
studied using ESM methodology. However,
it is not likely that utilizing ESM will yield
meaningful predictions when measuring
someone performing a repetitive task
throughout the day or when dependent
variables are long-term in nature (coronary
48 heart problems).