UCC 301 Topic 6
(2) Systematic/interval sampling
In Systematic Sampling individuals are chosen at regular intervals (for example every
fifth) from the sampling frame. Ideally we randomly select a number to tell us where
to start selecting individuals from the list. For example, a systematic sample of 100
students is to be selected from 1200 students of a school. The sampling fraction is:
100/1200 = 1/12 (the sample size divided by the size of the study population).
The sampling interval is therefore 12. The researcher should number pieces of paper
from one to twelve. The number of the first student to be included in the sample is
chosen randomly, for example by blindly picking one out of twelve pieces of paper,
numbered 1 to 12. If number 6 is picked, then every twelfth student will be included
in the sample, starting with student number 6, until 100 students are selected: the
numbers selected would be 6, 18, 30, 42, 54, 66, 78, etc.
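The selection rule in this example can be sketched in a few lines of Python. This is a minimal illustration only: the function name systematic_sample and its parameters are hypothetical, and it assumes the population size is an exact multiple of the sample size, as in the 1200/100 example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, rng=random):
    """Return the 1-based list positions chosen by systematic sampling."""
    # Sampling interval: population size divided by sample size (1200 / 100 = 12).
    interval = population_size // sample_size
    # Pick a random starting position between 1 and the interval,
    # unless a starting position is supplied.
    if start is None:
        start = rng.randint(1, interval)
    # Take every interval-th position from the starting position onwards.
    return [start + i * interval for i in range(sample_size)]

# The worked example from the text: the number 6 was drawn as the start.
selected = systematic_sample(1200, 100, start=6)
print(selected[:7])   # [6, 18, 30, 42, 54, 66, 78]
print(selected[-1])   # 1194
```

Leaving out the start argument lets the function draw the starting number itself, which plays the role of the twelve pieces of paper.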
Systematic sampling is usually less time consuming and easier to perform than simple
random sampling. However, there is a risk of bias, as the sampling interval may
coincide with a systematic variation in the sampling frame. For instance, if we want
to select a random sample of days on which to count adult class attendance, systematic
sampling with a sampling interval of 7 days would be inappropriate, as all study days
would fall on the same day of the week (e.g., Tuesdays only, which might be a market
day and many adults could be in the market rather than in adult literacy classes).
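The weekday problem can be checked with a short Python sketch. The start date and the alternative interval of 5 days are assumptions chosen for illustration; they do not come from the study described above.

```python
from datetime import date, timedelta

def systematic_days(first_day, interval_days, n_days):
    """Return n_days dates spaced interval_days apart, starting at first_day."""
    return [first_day + timedelta(days=i * interval_days) for i in range(n_days)]

start = date(2024, 1, 2)  # hypothetical start date; it falls on a Tuesday

weekly = systematic_days(start, 7, 8)       # interval matches the weekly cycle
every_five = systematic_days(start, 5, 8)   # interval does not match the cycle

# date.weekday() returns 0 for Monday, 1 for Tuesday, ..., 6 for Sunday.
print({d.weekday() for d in weekly})          # {1}
print(len({d.weekday() for d in every_five})) # 7
```

With an interval of 7 every selected day is a Tuesday, while an interval of 5 spreads the selected days across all seven weekdays.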
(3) Stratified sampling
The simple random sampling method described above has the disadvantage that small
groups in which the researcher is interested may hardly appear in the sample. If it is
important that the sample includes representative study units of small groups with
specific characteristics (for example, residents from urban and rural areas, or different
religious or ethnic groups), then the sampling frame must be divided into groups, or
strata, according to these characteristics. Random or systematic samples of a pre-
determined size will then have to be obtained from each group (stratum). This is called
Stratified Sampling.
Stratified sampling is only possible when we know what proportion of the study
population belongs to each group we are interested in. An advantage of stratified
sampling is that we can take a relatively large sample from a small group in our study
population. This allows us to get a sample that is big enough to enable us to draw
valid conclusions about a relatively small group without having to collect an
unnecessarily large (and hence expensive) sample of the other, larger groups.
For example, if we were to do a study on Karatina University students’ awareness of
HIV/AIDS, we would divide the students into strata according to characteristics such as
age, gender, area of origin, course being taken, year of study, etc. Then, in line with the
number of students in each stratum, we would draw a proportionate sample from each stratum.
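A proportionate allocation of this kind can be sketched as follows. The stratum sizes are hypothetical, and the rounding rule (give leftover units to the strata with the largest fractional shares) is one common choice assumed here, not something the notes prescribe.

```python
import random

def proportionate_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Exact (possibly fractional) share of the sample for each stratum.
    exact = {name: sample_size * size / total
             for name, size in stratum_sizes.items()}
    # Start from the whole-number part of each share.
    alloc = {name: int(share) for name, share in exact.items()}
    # Hand out any leftover units to the largest fractional remainders.
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata, sample_size, rng=random):
    """Draw a simple random sample of the allocated size from each stratum."""
    sizes = {name: len(units) for name, units in strata.items()}
    alloc = proportionate_allocation(sizes, sample_size)
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}

# Hypothetical strata by year of study (made-up enrolment figures).
sizes = {"Year 1": 600, "Year 2": 300, "Year 3": 100}
print(proportionate_allocation(sizes, 50))
# {'Year 1': 30, 'Year 2': 15, 'Year 3': 5}
```

If a small stratum needs a larger sample than its proportionate share, the allocation can instead be fixed by hand for that stratum before sampling.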
(4) Cluster sampling
It may be difficult or impossible to take a simple random sample of the units of the
study population, because a complete sampling frame does not exist.
Logistical difficulties may also discourage random sampling techniques (e.g.,
interviewing people who are scattered over a large area, such as nomadic pastoralists,
may be too time-consuming). However, when a list of groupings of study units
is available (e.g., villages or schools) or can be easily compiled, a number of these
groupings can be randomly selected. The selection of groups of study units (clusters)
instead of the selection of study units individually is called Cluster Sampling.
Clusters are often geographic units (e.g., districts, villages) or organizational units
(e.g., clinics, training groups). For example, in a study of the knowledge, attitudes and
practices (KAP) related to family planning in Municipality Location, Mathira East
Sub-County, a list is made of all the villages in the municipality. Using this list, a
random sample of villages is chosen and all study units in the selected villages are
interviewed.
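As a minimal sketch of the idea, the following Python function selects whole clusters at random and keeps every study unit inside them. The village names and household identifiers are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly pick whole clusters and return every study unit in them."""
    # Only the list of clusters is needed as a sampling frame.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # All study units in each selected cluster are included.
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical villages and the households in each.
villages = {
    "Village A": ["A1", "A2", "A3"],
    "Village B": ["B1", "B2"],
    "Village C": ["C1", "C2", "C3", "C4"],
    "Village D": ["D1"],
}

chosen, units = cluster_sample(villages, 2, random.Random(1))
print(chosen)
print(units)
```

Because whole clusters are taken, the final number of study units depends on which villages happen to be drawn.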
(5) Multi-stage sampling
In very large and diverse populations sampling may be done in two or more stages.
This is often the case in community-based studies, in which people are to be
interviewed from different villages, and the villages have to be chosen from different
areas. For example, in a study of utilization of pit latrines in Nyeri County, 150
homesteads are to be visited for interviews with family members as well as for
observations on types and cleanliness of latrines. The study area is composed of 6 wards
and each ward has between 6 and 9 villages. The following four-stage sampling
procedure could be performed:
1. Select 3 wards out of the 6 by simple random sampling.
2. For each ward select 5 villages by simple random sampling (15 villages in total).
3. For each village select 10 households. Since simply choosing households in the
   centre of the village would produce a biased sample, the following sampling
   procedure is proposed:
   a. Go to the centre of the village.
   b. Choose a direction in a random way: spin a bottle on the ground and
      choose the direction the bottleneck indicates.
   c. Walk in the chosen direction and select every household until you have
      the 10 you need.
   d. If you reach the boundary of the village and you still do not have 10
      households, return to the centre of the village, walk in the opposite
      direction and continue to select your sample in the same way until you
      have 10.
   e. If there is nobody in a chosen household, take the next nearest one.
4. Decide beforehand whom to interview (for example the head of the household,
   if present, or the oldest adult who lives there and who is available).
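The counts in this procedure (3 wards, 5 villages per ward, 10 households per village) can be sketched in Python. Note the assumption: the bottle-spinning walk cannot be simulated without a map, so this sketch swaps in a simple random draw from a household list in each selected village. The sampling frame below is entirely made up.

```python
import random

def multistage_sample(wards, n_wards=3, n_villages=5, n_households=10, rng=random):
    """Select wards, then villages within wards, then households within villages."""
    selected = []
    for ward in rng.sample(sorted(wards), n_wards):                    # stage 1
        villages = wards[ward]
        for village in rng.sample(sorted(villages), n_villages):      # stage 2
            households = villages[village]
            # Stage 3: a random draw stands in for the bottle-spinning walk.
            for household in rng.sample(households, n_households):
                selected.append((ward, village, household))
    return selected

# Hypothetical frame: 6 wards, 6 to 9 villages per ward, 40 households per village.
rng = random.Random(42)
wards = {
    f"Ward {w}": {
        f"Village {w}.{v}": [f"HH {w}.{v}.{h}" for h in range(1, 41)]
        for v in range(1, rng.randint(6, 9) + 1)
    }
    for w in range(1, 7)
}

sample = multistage_sample(wards, rng=rng)
print(len(sample))  # 150
```

The total of 150 matches the number of homesteads in the example: 3 wards x 5 villages x 10 households.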
A multi-stage sampling procedure is carried out in phases and it usually involves
more than one sampling method. The main advantages of cluster and multi-stage
sampling are that:
1. A sampling frame of individual units is not required for the whole population.
Existing sampling frames of clusters are sufficient. Only within the clusters that
are finally selected is there a need to list and sample the individual units (if not
using the bottle spinning method).
2. The sample is easier to select than a simple random sample of similar size,
because the individual units in the sample are physically together in groups,
instead of scattered all over the study population.
The main disadvantage of this type of sampling is that, compared to simple random
sampling, there is a larger probability that the final sample will not be representative
of the total study population. The likelihood of the sample not being representative
depends mainly on the number of clusters that is selected in the first stage. The larger
the number of clusters, the greater is the likelihood that the sample will be
representative. Further, the sampling units at community level should be selected
randomly (avoid convenience sampling!).
(b). Non-probability/non-random sampling.
This refers to the case where the probability of including each element of the
population in the sample is unknown. This is the sampling used when the researcher is
not interested in selecting a sample that is representative of the population. It is mostly
used in qualitative studies because the focus is on in-depth information and not
making inferences or generalizations. Some of the non-probability sampling
procedures include:
(1). Purposive/judgmental sampling
This technique allows a researcher to use cases that have the required information
with respect to the objectives of the study. Cases are therefore handpicked because
they are informative or possess the required characteristics. A researcher who uses
this type of sampling must specify the criteria for choosing particular cases e.g. a
certain age range, gender, religious affiliation or educational level. Some researchers
may use purposive sampling as a multi-stage sampling procedure. For example, it can
be used to pick the location where the study subjects are to be found such as sampling
a district of interest. Purposive sample size, which may or may not be fixed before
data collection, depends on the resources and the time available to the researcher and
the study objectives. In purposive sampling, sample size is often determined on the
basis of theoretical saturation (the point where more data brings no more additional
knowledge concerning the study question).
(2). Snowball sampling
Snowball sampling, also called mud ball sampling or chain referral sampling, is the
technique where initial subjects with the desired characteristics are identified using
purposive sampling. Those who have been identified name others who they know have the
required characteristics until the researcher gets the number of cases/respondents he/she
requires for the study. As such, participants or informants with whom contact has already
been made use their networks to refer the researcher to other people who could
potentially participate in or contribute to the study. This method is normally used
when the researcher does not have adequate information concerning the population
that has the characteristics he/she requires. For example, if a researcher wants to study
people who have been retrenched by an organization, he/she needs to identify one of the
retrenched employees, who then identifies another, and the chain continues until the
researcher gets the required sample size.
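The referral chain can be sketched as a walk through a network of who names whom. The referral data below is hypothetical, and the order in which referrals are followed (earliest referral first) is an assumption made for the sketch.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Follow referrals from the initial subjects until target_size is reached."""
    sampled = []
    seen = set()
    queue = deque(seeds)  # the initial subjects identified by the researcher
    while queue and len(sampled) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already recruited through another referral
        seen.add(person)
        sampled.append(person)
        # Each recruited person names others who have the required characteristics.
        queue.extend(referrals.get(person, []))
    return sampled

# Hypothetical referral network: who names whom.
referrals = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P4": ["P6"],
}

print(snowball_sample(referrals, ["P1"], 5))
# ['P1', 'P2', 'P3', 'P4', 'P5']
```

If the referrals run out before the target is reached, the function simply returns the people found so far.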
(3). Quota sampling
Quota sampling is similar to stratified random sampling; the objective is to include
various groups or quotas of the study population based on some criteria or certain
characteristics. Such characteristics could include age, place of residence, gender, class,
profession, marital status, HIV status, etc. The researcher divides the study population
based on certain criteria and then purposively picks respondents from each category because
they belong to that category. The criteria the researcher chooses allow him/her to
focus on the people he/she thinks are most likely to experience, know about or have
insights into the research topic. The selection of actual participants is not random since
study subjects are picked as they fit into identified quotas. The overall sample is hence,
to some extent, accidentally selected.
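A minimal sketch of filling quotas is given below. It assumes respondents are taken in the order the researcher meets them, which is only one way the picking could be done; the names, categories and quota sizes are hypothetical.

```python
def quota_sample(respondents, quotas, category_of):
    """Accept respondents in the order met until every quota is filled."""
    remaining = dict(quotas)  # copy, so the caller's quotas are not changed
    sample = []
    for person in respondents:
        category = category_of(person)
        # Accept the person only if their category still has room.
        if remaining.get(category, 0) > 0:
            sample.append(person)
            remaining[category] -= 1
        # Stop as soon as all quotas are full.
        if not any(remaining.values()):
            break
    return sample

# Hypothetical respondents as (name, gender) pairs, in the order encountered.
people = [("Ann", "F"), ("Ben", "M"), ("Cate", "F"),
          ("Dan", "M"), ("Eve", "F"), ("Finn", "M")]
quotas = {"F": 2, "M": 1}

print(quota_sample(people, quotas, lambda p: p[1]))
# [('Ann', 'F'), ('Ben', 'M'), ('Cate', 'F')]
```

The result depends entirely on who is encountered first, which is why a quota sample is not a random sample even though every category is represented.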
(4). Convenience/accidental/availability sampling
Also called volunteer sampling, it involves selecting cases or units of
observation/study as they become available to the researcher. This is mainly done by
journalists especially when they want to get the general feeling of the public about an
issue. They can go to the street and ask pedestrians to give their opinion about a public
policy issue such as the President’s Madaraka Day Speech or the Finance Minister’s
Budget Speech. The researcher hence selects a respondent who is conveniently
available such as a neighbour, colleague or friend. The main feature of this sampling
is that respondents are readily available and accessible. Researchers who use this kind
of sampling cannot have any basis to argue that it is representative of the population.
6.5. Bias in sampling
Bias in sampling is a systematic error in sampling procedures, which leads to a
distortion in the results of the study. This has the potential of misrepresenting facts
and therefore distorting reality. The use of faulty data collection tools would lead to
biased results. Bias can also be introduced as a consequence of improper sampling
procedures, which result in the sample not being representative of the study
population. For example, a study was conducted to determine the health needs of a
rural population in order to plan primary health care activities. However, a nomadic
tribe, which represented one third of the total population, was left out of the study.
As a result the study did not give an accurate picture of the health needs of the total
population.
There are several possible sources of bias that may arise when sampling; the
best-known source is non-response. Non-response can occur in any interview
situation, but it is mostly encountered in large-scale surveys with self-administered
questionnaires. Respondents may refuse or forget to fill in the questionnaire. The
problem lies in the fact that non-respondents in a sample may exhibit characteristics
that differ systematically from the characteristics of respondents. There are several
ways to deal with this problem and reduce the possibility of bias:
1. Data collection tools should be pre-tested and if necessary, adjustments made
to ensure better co-operation.
2. If non-response is due to absence of the subjects, follow-up of non-respondents
may be considered.
3. If non-response is due to refusal to co-operate, an extra, separate study of non-
respondents may be considered in order to identify to what extent they differ
from respondents.
4. Another strategy is to include additional people in the sample, so that non-
respondents who were absent during data collection can be replaced. However,
this can only be justified if their absence was very unlikely to be related to the
topic being studied.
NB: The bigger the non-response rate, the more necessary it becomes to take remedial
action. It is important in any study to mention the non-response rate and to honestly
discuss whether and how the non-response might have influenced the results.
Other sources of bias in sampling may be less obvious but are at least as serious. They
include the following:
1. Studying volunteers only. The fact that volunteers are motivated to participate
in the study may mean that they are also different from the study population
on the factors being studied. Therefore it is better to avoid using non-random
selection procedures that introduce such an element of choice.
2. Sampling only the people you meet in their homes or on their farms (convenience).
These are likely to differ systematically from people who are elsewhere, for
example those doing white collar jobs, which leads to an unrepresentative
sample and unrepresentative data.
3. Missing cases of short duration. In studies of the prevalence of disease, cases
of short duration are more likely to be missed. This may mean missing fatal
cases, cases with short illness episodes and mild cases.
4. Seasonal bias. It may be that the problem under study, for example,
malnutrition or prostitution exhibits different characteristics in different
seasons of the year. For this reason, data should be collected on the prevalence
and distribution of the phenomenon under study in an area during all seasons
rather than just at one point in time. When investigating health services’
performance, one has to consider the fact that towards the end of the financial
year shortages may occur in certain budget items which may affect the quality
of services delivered.
5. Tarmac bias. Study areas are often selected because they are easily accessible
by car. However, these areas are likely to be systematically different from less
accessible areas, so the information collected is more likely to be
unrepresentative of the population from which it is drawn.