Unit 6 - Sampling (Notes)
6.1 Introduction
Whatever your research question(s) and objectives, you will need to consider whether you
need to use sampling. Occasionally, it may be possible to collect and analyse data from every
possible case or group member; this is termed a census.
However, for many research questions and objectives it will be impossible for you either to
collect or to analyse all the potential data available to you owing to restrictions of time,
money and often access.
Sampling techniques enable you to reduce the amount of data you need to collect by
considering only data from a subgroup rather than all possible cases or elements.
Some research questions will require sample data to generalise statistically about all the cases
from which your sample has been selected. For example, if you asked a sample of consumers
what they thought of a new chocolate bar and 75% said that they thought it was too
expensive, you might infer that 75% of all consumers felt that way. Other research questions
may not involve such generalisations.
A population refers to all elements of a clearly defined body of people, events or objects. A
population may be defined in terms of four attributes, namely content, units, extent and time.
Thus, in defining youth as a population nationally, we might consider all persons 15 to 24
years old (content), in households (units) in the country of Belize (extent) in 2004 (time).
The above approach to population definition will yield a finite population, that is one in
which it is possible to count all the elements or units. By contrast, in an infinite population,
there is no limit to the number of elements it may contain. For example, all Belizean women
in the country’s labour force over an unspecified period of time is an infinite population.
It is preferable that researchers work with finite rather than infinite populations.
A distinction needs to be made between non-coverage and non-response. The former (non-
coverage) refers to a situation where potential sample elements are found to be missing from
the frame population. The latter (non-response) arises when sample elements that have been
selected for inclusion in the study do not yield data as expected.
A sample is a subset selected from the population, and sampling is the process by which this
subset is chosen.
research, a significant piece of the puzzle that you are trying to understand may be
missing.
o The population shares an uncommon characteristic(s). The characteristic shared by
the population is considered to be uncommon because this tends to explain why the
population that can be studied is very small. For example, if you were performing
case study research in a single firm of 400 employees, examining the effect of senior
manager mentorship on employee motivation, there may only be 5-10 senior
managers. In this example, the uncommon characteristic is the fact that the people
(i.e., units) of interest are all senior managers. Since the total number of senior
managers is very small, it makes sense to include all of them in your research.
o Finally, the counting and data gathering exercise that is carried out in most countries
every ten years is, by law, based on the enumeration of the entire population. This
exercise is known as a national census.
If a sample is randomly drawn from any given population, that sample can be used for the
purpose of drawing conclusions or making inferences about the population of interest.
Random refers to each element or unit of that population having an equal, non-zero chance of
being chosen in the sample.
In terms of sample selection, the most effective means of ensuring this close resemblance is
to select on the basis of chance – that is, random or probability sampling.
Finally, regardless of how close the resemblance, there is always some element of mismatch,
a discrepancy, since the sample is not an exact replica of the population. This discrepancy is
regarded as the sampling error.
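
The idea of a random draw and its sampling error can be sketched in a few lines of Python. This is only an illustration: the population of ages is made up, and the function names simple_random_sample and sampling_error are hypothetical helpers, not part of any library.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; each element has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def sampling_error(population, sample):
    """Difference between the sample mean and the population mean."""
    return statistics.mean(sample) - statistics.mean(population)


if __name__ == "__main__":
    # Hypothetical population: ages of 1,000 people, generated for illustration only.
    rng = random.Random(0)
    ages = [rng.randint(15, 24) for _ in range(1000)]
    sample = simple_random_sample(ages, 100, seed=1)
    print("population mean:", statistics.mean(ages))
    print("sample mean:", statistics.mean(sample))
    print("sampling error:", sampling_error(ages, sample))
```

Running the sketch with different seeds gives a slightly different sample mean each time, which is the discrepancy described above.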
To generate sample size, visit: http://www.raosoft.com/samplesize.html
o Margin of error:- the accuracy you require for any estimates made from your sample;
amount of error that you can tolerate.
o Confidence level:- the level of certainty that the characteristics of the data collected
will represent the characteristics of the total population; amount of uncertainty you
can tolerate. Researchers normally work to a 95 per cent level of certainty. This
means that if your sample was selected 100 times, about 95 of these samples would
be expected to represent the characteristics of the population within the margin of error.
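
To see how margin of error and confidence level drive sample size, the sketch below uses one widely taught formula (Cochran's formula for a proportion, with a finite population correction). It is an assumption that this formula suits your study; it is not claimed to be what any particular online calculator implements. The function name sample_size and the default proportion of 0.5 (the most cautious choice) are also assumptions.

```python
import math
from statistics import NormalDist


def sample_size(margin_of_error, confidence_level, population=None, proportion=0.5):
    """Minimum sample size for estimating a proportion.

    margin_of_error and confidence_level are fractions, e.g. 0.05 and 0.95.
    proportion=0.5 is the most conservative assumption (largest sample).
    """
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: small populations need fewer cases
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size(0.05, 0.95))                  # very large population
    print(sample_size(0.05, 0.95, population=600))  # population of 600
```

With a 5% margin of error and a 95% confidence level, the sketch gives 385 for a very large population and 235 for a population of 600.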
o This technique is applicable when it is necessary to acknowledge subgroups or strata
in the population based on one or more variables in the research question. For the
purposes of the study, it is important that each subgroup be accurately represented.
o Researchers generally use this technique when there are recognizable subgroups in
the population whose perspectives and/or attitudes ought to be reflected in the sample.
o You begin by dividing the population into strata or subgroups, then you select a
sample from each stratum/subgroup using either SRS or Systematic Sampling. In
dividing the population into subgroups, seek homogeneity within the subgroup and
heterogeneity between subgroups.
o Stratified sampling techniques are generally used when the population is
heterogeneous, or dissimilar, where certain homogeneous, or similar, sub-populations
can be isolated (strata).
o As an example, consider a study that is designed to examine the eating habits of
diabetics who attend clinic at the general hospital. The researchers find that 36 of the
600 persons registered (6% of the population) are Rastafarians. For this subgroup to
be adequately represented in the sample drawn, they should constitute 6% of the
sample. Thus, in a sample of 180 (30% of the population), there should be about 11
Rastafarians (6% of 180 is 10.8).
Number of elements in the sample from subgroup = (Number of elements in
subgroup / population) x Sample size
o Based on the above example, if either SRS or systematic sampling was used, there
would be a risk of selecting no Rastafarians, too few Rastafarians, or too many
Rastafarians. Thus in the above example, the stratified sampling technique is best.
o Stratified sampling is therefore considered superior to either SRS or systematic
sampling since it ensures the proper representation of the stratification variables in the
sample. Stratified sampling is likely to be more representative on several variables
than an SRS.
o In social research, race/ethnicity, income level, gender, age, and level of educational
attainment are some of the main variables used in stratifying a population.
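
The allocation formula and the two-step procedure (divide into strata, then draw an SRS within each stratum) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names proportional_allocation and stratified_sample are hypothetical, the clinic register is simulated, and rounding each stratum's share to a whole number is one simple choice among several.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, sample_size):
    """Apply (stratum size / population size) x sample size to each stratum."""
    population = sum(stratum_sizes.values())
    return {name: size / population * sample_size
            for name, size in stratum_sizes.items()}


def stratified_sample(units, stratum_of, sample_size, seed=None):
    """Group units into strata, then draw an SRS from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sizes = {name: len(members) for name, members in strata.items()}
    shares = proportional_allocation(sizes, sample_size)
    sample = []
    for name, members in strata.items():
        k = round(shares[name])  # rounding may shift the total by a unit or so
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


if __name__ == "__main__":
    # Clinic example from the notes: 36 of 600 registered persons are Rastafarians.
    print(proportional_allocation({"Rastafarian": 36, "Other": 564}, 180))
    # Simulated register of (id, group) pairs, for illustration only.
    register = [(i, "Rastafarian" if i < 36 else "Other") for i in range(600)]
    chosen = stratified_sample(register, lambda u: u[1], 180, seed=0)
    print(len(chosen), sum(1 for u in chosen if u[1] == "Rastafarian"))
```

For the clinic example the formula gives 10.8 for the Rastafarian stratum and 169.2 for everyone else, which the sketch rounds to 11 and 169.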
o Multi-stage cluster sampling then, involves repeating two basic steps: (1) listing and
(2) sampling.
A list of primary sampling units is compiled and, perhaps, stratified for
sampling. A sample of these units is then selected.
The selected primary sampling units are then listed and perhaps stratified. The
list of secondary sampling units is then sampled, and so forth.
o In cluster sampling, instead of selecting all the subjects from the entire population
right off, the researcher takes several steps in gathering his sample population.
o The most common cluster used in research is a geographical cluster.
o For example, a researcher wants to survey academic performance of high school
students in Belize.
He can divide the entire population (population of Belize) into different
clusters (districts).
Then the researcher selects a number of clusters depending on his research
through simple or systematic random sampling.
Then, from the selected clusters (randomly selected districts) the researcher
can either include all the high school students as subjects or he can select a
number of subjects from each cluster through simple or systematic random
sampling.
The important thing to remember about this sampling technique is to give all
the clusters equal chances of being selected.
o Types of cluster:
One-stage cluster:- Recall the example given above; one-stage cluster sample
occurs when the researcher includes all the high school students from all the
randomly selected clusters as sample.
Two-stage cluster:- From the same example above, two-stage cluster sample is
obtained when the researcher only selects a number of students from each
cluster by using simple or systematic random sampling.
o Advantages:
This sampling technique is cheap, quick and easy. Instead of sampling an
entire country when using simple random sampling, the researcher can
allocate his limited resources to the few randomly selected clusters or areas
when using cluster samples.
Related to the first advantage, the researcher can also increase his sample size
with this technique. Considering that the researcher will only have to take the
sample from a number of areas or clusters, he can then select more subjects
since they are more accessible.
o Disadvantages:
Of all the different types of probability sampling, this technique is the least
representative of the population. Individuals within a cluster tend to have
similar characteristics, so with a cluster sample there is a chance
that the researcher will have an overrepresented or underrepresented cluster,
which can skew the results of the study.
This is also a probability sampling technique with a possibility of high
sampling error. This is brought about by the limited number of clusters included in
the sample, which leaves a significant proportion of the population unsampled.
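
The one-stage and two-stage procedures described above can be sketched as follows. This is an illustration only: the function name cluster_sample is hypothetical, the student lists are invented, and every cluster is given an equal chance of selection, as the notes recommend.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Select clusters at random, then take units from the chosen clusters.

    clusters maps a cluster name to its list of units.
    per_cluster=None gives a one-stage sample (all units in chosen clusters).
    per_cluster=k gives a two-stage sample (an SRS of up to k units per cluster).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # equal chance per cluster
    sample = {}
    for name in chosen:
        units = clusters[name]
        if per_cluster is None:
            sample[name] = list(units)
        else:
            sample[name] = rng.sample(units, min(per_cluster, len(units)))
    return sample


if __name__ == "__main__":
    districts = ["Belize", "Cayo", "Corozal", "Orange Walk", "Stann Creek", "Toledo"]
    # Invented student identifiers, 20 per district, for illustration only.
    students = {d: [f"{d}-{i}" for i in range(20)] for d in districts}
    one_stage = cluster_sample(students, 2, seed=1)
    two_stage = cluster_sample(students, 2, per_cluster=5, seed=1)
    print({d: len(v) for d, v in one_stage.items()})
    print({d: len(v) for d, v in two_stage.items()})
```

Only the selected districts appear in the result, which is why the remaining clusters go unrepresented in a cluster sample.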
Difference between stratified random sampling and cluster sampling:
o The main difference between cluster sampling and stratified sampling lies with the
inclusion of the cluster or strata.
o In stratified random sampling, all the strata of the population are sampled, while in
cluster sampling the researcher randomly selects only a number of clusters from the
collection of clusters of the entire population. Therefore, only a number of clusters are
sampled, and all the other clusters are left unrepresented.
The techniques for selecting samples discussed earlier have all been based on the assumption
that your sample will be chosen at random from a sampling frame. However, within business
research, such as market surveys and case study research, this may either not be possible (as
you do not have a sampling frame) or not be appropriate to answering your research question.
This means your sample must be selected some other way. Non-probability sampling (or non-
random sampling) provides a range of alternative techniques to select samples, the majority
of which include an element of subjective judgement.
These techniques are those in which the probability of selecting an element to be part of the
sample is not known. These techniques may be appropriate for exploratory, observational or
even qualitative research, although the limitations of this type of sampling should be
recognized.
For all non-probability sampling techniques, other than for quota samples (which we discuss
later), the issue of sample size is ambiguous and, unlike probability sampling, there are no
rules. Your sample size is dependent on your research question(s) and objectives.
Quota sampling. This is used intensively in commercial research. The aim is to produce a
sample that reflects a population in terms of the relative proportions of people in different
categories, such as gender, ethnicity, socio-economic groups etc. Example: if males and
females each make up half of the population and 5 males are surveyed, then 5 females
will be surveyed too. Decisions on sample size are
governed by the need to have sufficient responses in each quota to enable subsequent
statistical analyses to be undertaken. This often necessitates a sample size of between 2000
and 5000.
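
A minimal sketch of how quotas are filled is given below. It assumes respondents are accepted in the order they are encountered until each category's quota is full, which is one common way of running a quota sample. The function name quota_sample and the respondent data are hypothetical.

```python
def quota_sample(respondents, category_of, quotas):
    """Accept respondents in arrival order until every quota is full."""
    counts = {category: 0 for category in quotas}
    sample = []
    for respondent in respondents:
        category = category_of(respondent)
        if category in counts and counts[category] < quotas[category]:
            sample.append(respondent)
            counts[category] += 1
        if counts == quotas:  # every quota has been filled
            break
    return sample


if __name__ == "__main__":
    # Invented stream of (name, gender) pairs: 8 males arrive before 10 females.
    stream = [(f"m{i}", "male") for i in range(8)]
    stream += [(f"f{i}", "female") for i in range(10)]
    chosen = quota_sample(stream, lambda r: r[1], {"male": 5, "female": 5})
    print(chosen)
```

Note that no random selection takes place: who ends up in the sample depends on who is encountered first, which is why quota sampling is a non-probability technique.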
Snowball sampling. This procedure is appropriate when the members of a special population
are difficult to locate. Some examples might be homeless individuals, migrant workers or
undocumented immigrants.
o The researcher collects data on the few members of the target population that he can
locate, then asks those individuals to provide information needed to locate other
members of that population whom they happen to know.
o “Snowball” refers to the process of accumulation as each located respondent
suggests other respondents. Because this procedure may result in samples with
questionable representation, it is used primarily for exploratory purposes.
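
The accumulation process can be sketched as a walk over referrals. This is an illustration under stated assumptions: the referral network is invented, the function name snowball_sample is hypothetical, and respondents are contacted in the order they were suggested.

```python
from collections import deque


def snowball_sample(seeds, referrals, max_size):
    """Start from the located members and follow their referrals outward."""
    start = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(start)
    queue = deque(start)
    located = []
    while queue and len(located) < max_size:
        person = queue.popleft()
        located.append(person)
        # Each located respondent suggests other members he or she knows
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return located


if __name__ == "__main__":
    # Invented referral network: who can point the researcher to whom.
    referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": []}
    print(snowball_sample(["A"], referrals, max_size=10))
```

Starting from the single located member A, the sketch reaches B and C, then D and E. Anyone outside the referral chain is never reached, which illustrates why representation is questionable.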
Information gained from any of the above non-probability sampling techniques should be
treated with extreme caution. Such information should not be extrapolated to generalizations
about entire populations since samples were not selected on a systematic basis.
Other disadvantages:
o No controls for personal bias
o It is not possible to capture the variation that exists in the wider population.
o These techniques do not allow for calculating sampling error and estimating precision.