Unit IV

The document discusses the concepts of universe, sample, and sampling in research, emphasizing the importance of selecting a representative sample to draw valid conclusions. It outlines the qualities of a good sampling frame, the types of sampling errors, and methods to reduce both sampling and non-sampling errors. Additionally, it explains probability and non-probability sampling techniques, highlighting their advantages and disadvantages.
UNIT IV: SAMPLING – DEFINING THE UNIVERSE

Universe or Population: The universe consists of all survey elements that
qualify for inclusion in the research study. The precise definition of the universe
for a particular study is set by the research question, which specifies who or
what is of interest. The universe may be individuals, groups of people,
organizations, or even objects. For example, research about voting in an
upcoming election would have a universe comprising all voters.
Sample: A sample refers to a smaller, manageable version of a larger group. It
is a subset containing the characteristics of a larger population. Samples are
used in statistical testing when population sizes are too large for the test to
include all possible members or observations. A sample should represent the
population as a whole and not reflect any bias toward a specific attribute.

When you conduct research about a group of people, it’s rarely possible to
collect data from every person in that group. Instead, you select a sample. The
sample is the group of individuals who will actually participate in the research.
To draw valid conclusions from your results, you have to carefully decide how
you will select a sample that is representative of the group as a whole.
Sampling is the process of selecting units (e.g., people, organizations) from a
population of interest so that by studying the sample we may fairly generalize our
results back to the population from which they were chosen.
The population is the entire group that you want to draw conclusions about
whereas the sample is the specific group of individuals that you will collect data
from.

A good sample satisfies all or some of the following conditions:
(i) Representativeness: When a researcher adopts a sampling method, the basic
assumption is that the sample selected from the population is the best
representative of the population under study. Thus good samples are those
that accurately represent the population. Probability sampling techniques yield
representative samples. In measurement terms, the sample must be valid. The
validity of a sample depends upon its accuracy.
(ii) Accuracy: Accuracy is defined as the degree to which bias is absent from the
sample. An accurate (unbiased) sample is one which exactly represents the
population. It is free from any influence that causes any differences between
sample value and population value.
(iii) Size: A good sample must be adequate in size and reliable. The sample size
should be such that the inferences drawn from the sample are accurate to a given
level of confidence to represent the entire population under study.
SAMPLING FRAME
A sampling frame is a list of all the items in your population. It’s a
complete list of everyone or everything you want to study. The
difference between a population and a sampling frame is that the
population is general and the frame is specific.

For example, the researcher wants to study the eating habits of the
school students from class I to class V in South Delhi. The sampling
frame would be all the students enrolled in all the schools of South
Delhi from class I to V. The attendance registers enlisting the names
of all the students would comprise the sampling frame.
Qualities of a Good Sampling Frame
Care must be taken to make sure your sampling frame is
adequate for your needs. A good sample frame for a project on
living conditions would:
Include all individuals in the target population.
Exclude all individuals not in the target population.
Include accurate information that can be used to contact selected
individuals.
Other general factors that you would want to make sure you have:
A unique identifier for each member. This could be a simple
numerical identifier (e.g. from 1 to 1000). Check to make sure
there are no duplicates in the frame.
A logical organization to the list. For example, put them in
alphabetical order.
Up-to-date information. This may need to be periodically checked
(e.g. for address changes).
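The frame checks listed above can be automated. The following is only a minimal sketch, assuming the frame is held as a list of records with hypothetical "id" and "contact" fields (these field names and the sample records are invented for illustration):

```python
from collections import Counter

def find_duplicate_ids(frame):
    """Return identifiers that appear more than once in the frame, sorted."""
    counts = Counter(record["id"] for record in frame)
    return sorted(identifier for identifier, n in counts.items() if n > 1)

def missing_contact(frame):
    """Return identifiers of records whose contact field is empty or absent."""
    return [record["id"] for record in frame if not record.get("contact")]

# Hypothetical frame: one student listed twice, one without contact details.
frame = [
    {"id": 1, "name": "Asha", "contact": "asha@example.com"},
    {"id": 2, "name": "Bilal", "contact": ""},
    {"id": 3, "name": "Chitra", "contact": "chitra@example.com"},
    {"id": 1, "name": "Asha", "contact": "asha@example.com"},
]

print(find_duplicate_ids(frame))  # identifiers that need de-duplicating
print(missing_contact(frame))     # identifiers that cannot be contacted
```

Checks like these only catch problems inside the list; they cannot tell you whether members of the target population are missing from the frame altogether.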
SAMPLING ERROR
A sampling error is a statistical error that occurs when an analyst does not
select a sample that represents the entire population of data and the
results found in the sample do not represent the results that would be
obtained from the entire population.
Sampling error can be reduced, though not eliminated, by increasing the
sample size and by ensuring that the sample adequately represents the
entire population.
An Example of Sampling Error
Let’s pretend that we are a group of researchers administering a survey
with the goal of learning how much money a specific group of people
spends while purchasing a vehicle.
To kickstart the study, we distribute our survey to 1,000 randomly selected
United States residents.
By dumb luck, respondent #347 happens to be Mark Cuban — billionaire
businessman and investor. While it’s unlikely that someone with the status
of Mark Cuban would complete our survey, it’s still possible.
Let’s also say that, again by chance, respondent #789 is Elon Musk —
another billionaire. He also decides to fill out our survey.
When we are interested in something directly related to a person’s
income, such as how much individuals spend while purchasing
a vehicle, we run the risk of collecting data, purely by chance,
from significant outliers of the population.
In this case, billionaire businessmen Mark Cuban and Elon Musk
do not accurately represent average members of the target
population we are interested in, and therefore the accuracy of
our results would be negatively affected.
The same applies if we were to collect a significant amount of
data from individuals who fall below the poverty line.
If too many of our respondents are either too wealthy or
struggling financially, our sample will look different than the
true nature of the real-world population.
This difference is the sampling error.
NON SAMPLING ERROR
A non-sampling error is an error that results during data collection,
causing the data to differ from the true values. Non-sampling error
differs from sampling error.
A sampling error is limited to any differences between sample values
and universe values that arise because the entire universe was not
sampled. Sampling error can result even when no mistakes of any kind
are made. The “errors” result from the mere fact that data in a sample is
unlikely to perfectly match data in the universe from which the sample is
taken. This “error” can be minimized by increasing the sample size.
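A small simulation can make this concrete. This is an illustrative sketch only: the population values are invented (normally distributed spending figures), and the function name and the sample sizes are assumptions chosen for the example. It draws repeated simple random samples and measures how far the sample mean lands from the population mean on average.

```python
import random
import statistics

def mean_abs_sampling_error(population, sample_size, repeats, rng):
    """Average |sample mean - population mean| over repeated random samples."""
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)  # drawn without replacement
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Invented population of 10,000 vehicle-spending figures.
population = [rng.gauss(30000, 8000) for _ in range(10000)]

small = mean_abs_sampling_error(population, 25, 200, rng)
large = mean_abs_sampling_error(population, 400, 200, rng)
print(f"n=25:  typical error {small:.0f}")
print(f"n=400: typical error {large:.0f}")
```

No mistake is made anywhere in this procedure, yet each sample mean still differs from the population mean; the larger samples simply differ by less.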
Non-sampling errors cover all other discrepancies, including those that
arise from a poor sampling technique.
Sources of non-sampling errors: Non-sampling errors can occur at every
stage of planning and execution of a survey or census: at the
planning stage, the field work stage, and the tabulation and
computation stage.
The main sources of non-sampling errors are:
lack of proper specification of the domain of study and scope of
investigation,
incomplete coverage of the population or sample,
faulty definition,
defective methods of data collection and
tabulation errors
Non-sampling errors can include but are not limited to, data entry
errors, biased survey questions, biased processing/decision
making, non-responses, inappropriate analysis conclusions and
false information provided by respondents.
While increasing sample size will help minimize sampling error, it
will not have any effect on reducing non-sampling error.
Unfortunately, non-sampling errors are often difficult to detect,
and it is virtually impossible to eliminate them entirely.
METHODS TO REDUCE SAMPLING ERROR

Of the two types of errors, sampling error is easier to identify. The main
techniques for reducing sampling error are:
(i) Increase the sample size.
A larger sample size leads to a more precise result because the study gets
closer to the actual population size.
(ii) Divide the population into groups.
Instead of one random sample from the whole population, sample each group
according to its share of the population. For example, if people of a certain
demographic make up 35% of the population, make sure 35% of the sample is
drawn from that demographic.
(iii) Know your population.
The error of population specification is when a research team selects an
inappropriate population to obtain data. Know who buys your product, uses
it, works with you, and so forth. With basic socio-economic information, it is
possible to reach a consistent sample of the population. In cases like
marketing research, studies often relate to one specific population like
Facebook users, Millennials, or even homeowners.
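The proportional split described in (ii) can be sketched as follows. This is an illustration under assumptions: the group names and counts are hypothetical, and the largest-remainder rounding rule is one reasonable choice for making the group sizes add up to the total, not something the notes prescribe.

```python
def proportional_allocation(group_sizes, sample_size):
    """Split sample_size across groups in proportion to their population counts.

    Integer arithmetic (divmod) avoids floating-point surprises; any units
    left over after rounding down go to the groups with the largest remainders.
    """
    total = sum(group_sizes.values())
    alloc, remainders = {}, {}
    for group, size in group_sizes.items():
        alloc[group], remainders[group] = divmod(size * sample_size, total)
    leftover = sample_size - sum(alloc.values())
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[group] += 1
    return alloc

# Hypothetical population: one demographic is 35% of 10,000 people.
population_counts = {"demographic_a": 3500, "everyone_else": 6500}
print(proportional_allocation(population_counts, 200))
```

With these counts, 35% of the 200 sample slots go to the first group, matching the 35% example in the text.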
METHODS TO REDUCE NON SAMPLING ERRORS

(i) Thoroughly Pretest your Survey Mediums
People are much more likely to ignore survey requests if
loading times are long, questions do not fit properly on their
screens, or they have to work to make the survey compatible
with their device. The best advice is to acknowledge your
sample's different forms of communication software and
devices and pre-test your surveys and invites on each,
ensuring your survey runs smoothly for all your respondents.

(ii) Avoid Rushed or Short Data Collection Periods
One of the worst things a researcher can do is limit their data
collection time in order to comply with a strict deadline. Your
study’s level of nonresponse bias will climb dramatically if you
are not flexible with the time frames respondents have to
answer your survey.

(iii) Send Reminders to Potential Respondents
Sending a few reminder emails throughout your data collection
period has been shown to effectively gather more completed
responses. It is best to send your first reminder email midway
through the collection period and the second near the end of the
collection period.

(iv) Ensure Confidentiality
Any survey that requires information that is personal in nature
should include reassurance to respondents that the data collected
will be kept completely confidential.

(v) Use Incentives
Many people refuse to respond to surveys because they feel they
do not have the time to spend answering questions. An incentive is
usually necessary to motivate people into taking part in your study.
SAMPLE SIZE CONSTRAINTS
Effects of Small Sample Size
In the sample size formula (given under Sample Size Determination), the sample
size is proportional to the square of the Z-score and inversely proportional to
the square of the margin of error. Consequently, for a fixed margin of error,
reducing the sample size reduces the confidence level of the study, which is
related to the Z-score. For a fixed confidence level, decreasing the sample size
increases the margin of error.
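To see the effect numerically, the sample size formula from the Sample Size Determination section can be rearranged to give the margin of error for a given sample size. This is a sketch under assumptions: the function name is hypothetical, and the defaults (Z-score 1.96, StdDev 0.5) are simply the values used in that section's worked example.

```python
import math

def margin_of_error(sample_size, z_score=1.96, std_dev=0.5):
    """Margin of error implied by the notes' formula, solved for the margin.

    n = Z^2 * s * (1 - s) / E^2   =>   E = Z * sqrt(s * (1 - s) / n)
    """
    return z_score * math.sqrt(std_dev * (1 - std_dev) / sample_size)

for n in (100, 385, 1000):
    print(n, round(margin_of_error(n), 4))
```

Because the margin shrinks with the square root of the sample size, quadrupling the sample only halves the margin of error.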
In short, when researchers are constrained to a small sample size for economic
or logistical reasons, they may have to settle for less conclusive results.
Whether or not this is an important issue depends ultimately on the size of the
effect they are studying.
For example, a small sample size would give more meaningful results in a poll
of people living near an airport who are affected negatively by air traffic than it
would in a poll of their education levels.
Effect of Large Sample Size
There is a widespread belief that large samples are ideal for research or
statistical analysis. However, this is not always true. Using the above example
as a case study, very large samples that exceed the value estimated by
sample size calculation present different hurdles.
The first such hurdle is ethical. Should a study be performed with
more patients than necessary? This means that more people than
needed are exposed to the new therapy. Potentially, this implies
increased hassle and risk.
The second obstacle is that the use of a larger number of cases can
also involve more financial and human resources than necessary to
obtain the desired response.
NON RESPONSE
Non response happens when there is a significant difference between
those who responded to your survey and those who did not. This may
happen for a variety of reasons, including:
Some people refused to participate. This could be because you are asking
for embarrassing information, or information about illegal activities.
Poorly constructed surveys. For example, if you have a snail mail survey
for young adults or a smartphone survey for older adults; both these
scenarios are likely to lead to a lower response rate for your targeted
population.
Some people simply forgot to return the survey.
Your survey didn’t reach all members in your sample. For example, email
invites might have disappeared into the Spam folder, or the code used in
the email may not have rendered properly on certain devices (like cell
phones).
Certain groups were more inclined to answer. For example, people who
are more active runners might be more inclined to answer a survey about
running than people who aren’t as active in the community.
Non-response bias is the bias introduced when respondents differ
from non-respondents. In other words, it will throw your results
off or invalidate them completely. It can also result in higher variances
for the estimates, as the sample size you end up with is smaller than
the one you originally had in mind.

Tips for Avoiding Non Response Bias


Design your survey carefully; use well-trained staff and proven
techniques.
Develop a relationship with respondents. People who have a
connection with your cause are more likely to respond to surveys.
Send reminders to respond.
Offer incentives to respond.
Keep surveys short. A one minute survey is going to have a higher
response rate than a 15 minute survey.
Make sure the respondents are aware that any information given is
completely confidential, or anonymous. The more sensitive the
questions, the more important this factor can be.
PROBABILITY SAMPLING
Probability sampling is based on the fact that every member of a population has
a known, non-zero chance of being selected; in simple random sampling that
chance is also equal for every member.
Probability sampling is a sampling technique in which samples from a larger
population are chosen using a method based on the theory of probability. For a
participant to be considered part of a probability sample, he/she must be
selected using random selection.
For example, in a simple random draw of one person from a population of 100
people, every person would have a 1 in 100 chance of being selected. Probability
sampling gives you the best chance to create a sample that is truly
representative of the population.

Types of Probability Sampling


Simple random sampling
Systematic Sampling
Cluster Random Sampling
Stratified Random Sampling
Area Sampling
Simple Random Sampling: Simple random sampling, as the name suggests, is
a completely random method of selecting the sample. This sampling method is
as easy as assigning numbers to the individuals in the sampling frame and then
randomly choosing from those numbers through an automated process, e.g. a
lottery system.
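The numbering-and-drawing procedure can be sketched in a few lines. This is an illustration only: the function name, the frame of 100 numbered individuals, and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct members from the frame, each equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for auditing
    return rng.sample(frame, sample_size)

# Hypothetical frame: individuals numbered 1 to 100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

random.sample draws without replacement, so nobody can be selected twice, which mirrors pulling tickets out of a lottery drum.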
Advantages of Simple Random Sampling
1. If applied appropriately, simple random sampling is associated with the
minimum amount of sampling bias compared to other sampling methods.
2. Provided a large sampling frame is available, the ease of forming the sample
group, i.e. selecting samples, is one of the main advantages of simple random
sampling.
3. Research findings resulting from the application of simple random sampling can
be generalized because of the representativeness of this sampling technique and
its low risk of bias.

Systematic Sampling: Systematic sampling is when you choose every “nth”
individual to be a part of the sample. For example, you can choose every 5th
person to be in the sample. Systematic sampling is an extension of simple
probability sampling in which members of the group are selected at regular
intervals to form a sample. Provided the starting point is chosen at random,
every member of the population has an equal opportunity of being selected
using this sampling technique.
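A minimal sketch of the every-nth selection follows. It assumes an ordered frame and a random starting point within the first interval; the function name and the example frame are hypothetical.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick every k-th member of an ordered frame after a random start."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and len(frame)")
    interval = len(frame) // sample_size               # the "n" in "every nth"
    start = random.Random(seed).randrange(interval)    # random start in first interval
    return [frame[start + i * interval] for i in range(sample_size)]

# Hypothetical frame of 100 numbered people; 20 picks means every 5th person.
frame = list(range(1, 101))
picked = systematic_sample(frame, 20, seed=7)
print(picked)
```

If the frame's ordering has a repeating pattern that lines up with the interval, the sample can be skewed, so it is worth checking how the list is sorted before using this method.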
Cluster Random Sampling: With cluster sampling, the researcher divides
the population into separate groups, called clusters. Then, a simple random
sample of clusters is selected from the population. The researcher conducts
his analysis on data from the sampled clusters.
Essentially, each cluster is a mini-representation of the entire population.
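The two steps (form clusters, then randomly select whole clusters) can be sketched as below. The cluster names and members are invented, and the function name is hypothetical; this shows one-stage cluster sampling, where every unit in a chosen cluster is studied.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # clusters are the sampling units
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: students grouped by school.
clusters = {
    "school_a": ["a1", "a2", "a3"],
    "school_b": ["b1", "b2"],
    "school_c": ["c1", "c2", "c3", "c4"],
    "school_d": ["d1", "d2"],
}
chosen, units = cluster_sample(clusters, 2, seed=3)
print(chosen, units)
```

Note that the random draw happens at the cluster level only; no individual is selected or rejected on their own.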
Advantages of Cluster Sampling
1. Requires fewer resources: Since cluster sampling selects only certain
groups from the entire population, the method requires fewer resources for
the sampling process.
2. More feasible: The division of the entire population into homogenous
groups increases the feasibility of the sampling. Additionally, since each
cluster represents the entire population, more subjects can be included in the
study.
Disadvantages of Cluster Sampling
1. Biased samples: Cluster sampling is prone to biases. If the clusters that
represent the entire population were formed under a biased opinion,
the inferences about the entire population would be biased as well.
2. High sampling error: Generally, the samples drawn using the cluster
sampling method are prone to higher sampling error than the samples formed
using other sampling methods.
Stratified Random Sampling: Stratified random sampling involves dividing
a larger population into smaller groups (strata) that usually don’t overlap
but together represent the entire population. Once these groups are
organized, a sample is drawn from each group separately.
A common method is to classify by sex, age, ethnicity and similar
characteristics: subjects are split into mutually exclusive groups, and simple
random sampling is then used to choose members from each group.
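A short sketch of this split-then-sample procedure follows. It assumes the same sampling fraction is applied to every stratum (proportional allocation); the function name, the "sex" field, and the population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 100 people, half labelled "F" and half "M".
people = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(1, 101)]
sample = stratified_sample(people, lambda person: person["sex"], 0.1, seed=1)
print({name: [p["id"] for p in members] for name, members in sample.items()})
```

Unlike cluster sampling, every stratum contributes to the sample here; the randomness is inside each group rather than in which groups are chosen.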
Difference Between Cluster Sampling and Stratified Sampling
For a stratified random sample, a population is divided into strata, or
sub-populations, before sampling. At first glance, the two techniques seem very
similar. However, in cluster sampling the cluster itself is the sampling unit;
in stratified sampling, analysis is done on elements within each stratum. In
cluster sampling, a researcher will only study selected clusters; with stratified
sampling, a random sample is drawn from each stratum.

Area Sampling: Area sampling is a special form of cluster sampling in which
the sample items are clustered on a geographic area basis. For example, if
one wanted to measure candy sales in retail stores, one might choose a
sample of city blocks, and then audit sales of all retail outlets on those
sample blocks.
NON PROBABILITY SAMPLING
Non-probability sampling is a sampling technique in which the
researcher selects samples based on the subjective judgment of the
researcher rather than random selection.
In non-probability sampling, not all members of the population have a
chance of participating in the study unlike probability sampling, where
each member of the population has a known chance of being
selected.

Types of Non-Probability Sampling:


Judgmental or Purposive Sampling,
Convenience Sampling,
Quota Sampling,
Snowball Sampling,
Consecutive Sampling
Judgmental or Purposive Sampling: In judgmental sampling, the
samples are selected based purely on the researcher’s knowledge and
credibility. In other words, researchers choose only those who they feel are
the right fit (with respect to attributes and representation of a population) to
participate in the research study.
This is not a scientific method of sampling and the downside to this
sampling technique is that the results can be influenced by the
preconceived notions of a researcher. Thus, there is a high amount of
ambiguity involved in this research technique.

Convenience Sampling: Convenience sampling is a non-probability
sampling technique where samples are selected from the population only
because they are conveniently available to the researcher. These samples are
selected only because they are easy to recruit, and the researcher does not
consider whether the sample represents the entire population.
Ideally, in research, it is good to test a sample that represents the population.
But in some research, the population is too large to test in its entirety.
This is one of the reasons why researchers rely on convenience sampling,
which is the most common non-probability sampling technique, because of its
speed, cost-effectiveness, and ease of availability of the sample.
Quota Sampling: Quota sampling means to take a very tailored sample that’s
in proportion to some characteristic or trait of a population. For example, you
could divide a population by the state they live in, income or education level, or
sex. The population is divided into groups (also called quota) and samples are
taken from each group to meet a quota. Care is taken to maintain the correct
proportions representative of the population. For example, if your population
consists of 45% female and 55% male, your sample should reflect those
percentages. Quota sampling is based on the researcher’s judgment and is
considered a non-probability sampling technique.

Snowball Sampling: This is a sampling technique in which existing subjects
provide referrals to recruit the samples required for a research study. This
sampling method involves a primary data source nominating other potential data
sources that will be able to participate in the research study. Snowball
sampling is purely based on referrals, and that is how a researcher is able to
generate a sample. Therefore this method is also called the chain-referral
sampling method.
Suppose, hypothetically, that you as a researcher are studying the homeless in
Texas City. It is obviously difficult to find a list of all the homeless people
there. However, you are able to identify one or two homeless individuals who
are willing to participate in your research study. These homeless individuals
then provide you with the details of other homeless individuals they know.
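The chain of referrals can be pictured as a walk over a "who knows whom" network. The sketch below is illustrative only: the referral data is invented, the function name is hypothetical, and following referrals in breadth-first order is an assumption about how a researcher might work through the contacts.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Recruit participants by following referrals outward from the seeds."""
    sampled = []
    queue = deque(seeds)   # people we know about but have not yet interviewed
    seen = set(seeds)      # prevents recruiting the same person twice
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sampled

# Hypothetical referral lists: each participant names the people they know.
referrals = {
    "p1": ["p2", "p3"],
    "p2": ["p4"],
    "p3": ["p4", "p5"],
    "p4": [],
    "p5": ["p1"],
}
print(snowball_sample(referrals, ["p1"], 4))
```

Anyone who is not connected to the initial contacts can never be reached, which is one reason the method is classed as non-probability sampling.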
Consecutive Sampling: Consecutive sampling is defined as
a non-probability sampling technique where samples are picked
at the convenience of the researcher, much like convenience
sampling, only with a slight variation. Here, the researcher picks
a sample or group of people, conducts research over a
period of time, collects results, and then moves on to another
sample.
This sampling technique gives the researcher a chance to work
with multiple samples to fine-tune his/her research work and
collect vital research insights.
SAMPLE SIZE DETERMINATION

When you survey a large population of respondents, you’re interested in the
entire group, but it’s not realistically possible to get answers or results from
absolutely everyone. So you take a random sample of individuals who
represent the population as a whole.
The size of the sample is very important for getting accurate, statistically
significant results and running your study successfully.
• If your sample is too small, you may include a disproportionate number of
individuals who are outliers and anomalies. These skew the results and you
don’t get a fair picture of the whole population.
• If the sample is too big, the whole study becomes complex, expensive and
time-consuming to run, and although the results are more accurate, the benefits
don’t outweigh the costs.
What are the terms used around the sample size?
Before we jump into sample size determination, let’s take a look at the terms you should know:

1. Population size: Population size is how many people fit your demographic. For example,
you want to get information on doctors residing in North America. Your population size is the
total number of doctors in North America. Your population size doesn’t always have to be that
big. Smaller population sizes can still give you accurate results as long as you know who
you’re trying to represent.

2. Confidence level: Confidence level tells you how sure you can be that your results reflect
the population. It is expressed as a percentage and is tied to the confidence interval. For
example, a 90% confidence level means that if you repeated the survey many times, about 90%
of the resulting confidence intervals would contain the true population value. The most
common confidence levels are 90%, 95%, and 99%.

3. The margin of error (confidence interval): When it comes to surveys, there’s no way to
be 100% accurate. Confidence intervals tell you how far off from the population mean you’re
willing to allow your data to fall. A margin of error describes how close you can reasonably
expect a survey result to fall relative to the real population value. If you’ve ever seen a
political poll on the news, you’ve seen a confidence interval and how it’s expressed. It will
look something like this: “68% of voters said yes to Proposition Z, with a margin of error of
+/- 5%.”
4. Standard deviation: Standard deviation is the measure of the dispersion of
a data set from its mean. It measures the absolute variability of a distribution.
The higher the dispersion or variability, the greater the standard deviation and
the greater the magnitude of the deviation. For example, you have already sent
out your survey. How much variance do you expect in your responses? That
variation in response is the standard deviation. (A standard deviation of 0.5 is
a safe choice where the figure is unknown.)

5. Find your Z-score
Next, you need to turn your confidence level into a Z-score. Here are the Z-scores
for the most common confidence levels:
• 90% – Z Score = 1.645
• 95% – Z Score = 1.96
• 99% – Z Score = 2.576
If you chose a different confidence level, use the Z-score table.
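Instead of a printed table, the Z-score for any confidence level can be computed from the standard normal distribution. A minimal sketch, assuming a two-sided confidence level and Python 3.8 or later (for statistics.NormalDist); the function name is hypothetical:

```python
from statistics import NormalDist

def z_score(confidence_level):
    """Two-sided Z-score for a confidence level given as a fraction (e.g. 0.95)."""
    tail = (1 - confidence_level) / 2        # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)    # standard normal quantile

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_score(level):.3f}")
```

The computed values agree with the table entries above to three decimal places.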
Sample size calculation formula

Necessary Sample Size = (Z-score)^2 * StdDev * (1 - StdDev) / (margin of error)^2

Here’s a worked example, assuming you chose a 95% confidence level, a standard
deviation of 0.5, and a margin of error (confidence interval) of +/- 5%.
((1.96)^2 x 0.5 x (1 - 0.5)) / (0.05)^2
= (3.8416 x 0.25) / 0.0025
= 0.9604 / 0.0025
= 384.16
Rounding up, 385 respondents are needed.
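The same calculation can be wrapped in a small function. This is a sketch of the formula exactly as stated in these notes; the function and parameter names are assumptions, and the result is rounded up because a fraction of a respondent cannot be surveyed.

```python
import math

def required_sample_size(z_score, std_dev, margin_of_error):
    """Necessary sample size = Z^2 * StdDev * (1 - StdDev) / (margin of error)^2."""
    n = (z_score ** 2) * std_dev * (1 - std_dev) / (margin_of_error ** 2)
    return math.ceil(n)  # round up to the next whole respondent

# Worked example from the text: 95% confidence, StdDev 0.5, margin +/- 5%.
print(required_sample_size(1.96, 0.5, 0.05))
```

Swapping in the other Z-scores from the table shows how quickly the requirement grows with the confidence level when the margin of error is held at 5%.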
