Unit IV
When you conduct research about a group of people, it’s rarely possible to
collect data from every person in that group. Instead, you select a sample. The
sample is the group of individuals who will actually participate in the research.
To draw valid conclusions from your results, you have to carefully decide how
you will select a sample that is representative of the group as a whole.
Sampling is the process of selecting units (e.g., people, organizations) from a
population of interest so that by studying the sample we may fairly generalize our
results back to the population from which they were chosen.
The population is the entire group that you want to draw conclusions about
whereas the sample is the specific group of individuals that you will collect data
from.
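The relationship between a population and a sample can be sketched in a few
lines of Python. This is only an illustration: the population of 1,000 ID
numbers, the sample size of 50 and the seed are assumptions, not values from
these notes.

```python
import random

# Hypothetical population: ID numbers 1..1000 standing in for people.
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Simple random sample: 50 units drawn without replacement,
# so every unit has the same chance of being selected.
sample = rng.sample(population, k=50)

print(len(sample), "units sampled out of", len(population))
```

Data would then be collected only from the 50 sampled units, and the results
generalized back to all 1,000.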
A good sample is one which satisfies all or some of the following conditions:
(i) Representativeness: When a researcher adopts a sampling method, the basic
assumption is that the units selected from the population are the best
representatives of the population under study. Thus, good samples are those
that accurately represent the population. Probability sampling techniques yield
representative samples. In measurement terms, the sample must be valid. The
validity of a sample depends upon its accuracy.
(ii) Accuracy: Accuracy is defined as the degree to which bias is absent from the
sample. An accurate (unbiased) sample is one which exactly represents the
population. It is free from any influence that causes any differences between
sample value and population value.
(iii) Size: A good sample must be adequate in size and reliable. The sample size
should be such that the inferences drawn from the sample are accurate to a given
level of confidence to represent the entire population under study.
SAMPLING FRAME
A sampling frame is a list of all the items in your population. It’s a
complete list of everyone or everything you want to study. The
difference between a population and a sampling frame is that the
population is general and the frame is specific.
For example, the researcher wants to study the eating habits of the
school students from class I to class V in South Delhi. The sampling
frame would be all the students enrolled in all the schools of South
Delhi from class I to V. The attendance registers listing the names
of all the students would comprise the sampling frame.
Qualities of a Good Sampling Frame
Care must be taken to make sure your sampling frame is
adequate for your needs. A good sample frame for a project on
living conditions would:
Include all individuals in the target population.
Exclude all individuals not in the target population.
Include accurate information that can be used to contact selected
individuals.
Other general factors that you would want to make sure you have:
A unique identifier for each member. This could be a simple
numerical identifier (e.g. from 1 to 1000). Check to make sure
there are no duplicates in the frame.
A logical organization to the list. For example, put them in
alphabetical order.
Up-to-date information. This may need to be periodically checked
(e.g. for address changes).
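Some of the checks listed above (unique identifiers, no duplicates, usable
contact information, a logical ordering) can be automated. The sketch below is
a minimal example; the frame rows, names and the helper functions `check_frame`
and `clean_frame` are all hypothetical.

```python
from collections import Counter

# Hypothetical frame: (identifier, name, contact) rows; the values are made up.
frame = [
    (3, "Chitra", "chitra@example.org"),
    (1, "Asha", "asha@example.org"),
    (2, "Bala", ""),                      # contact information missing
    (3, "Chitra", "chitra@example.org"),  # duplicate entry
]

def check_frame(rows):
    """Return (duplicated identifiers, identifiers with no contact info)."""
    counts = Counter(row[0] for row in rows)
    duplicates = sorted(i for i, c in counts.items() if c > 1)
    missing_contact = sorted({row[0] for row in rows if not row[2]})
    return duplicates, missing_contact

def clean_frame(rows):
    """Keep the first row for each identifier, then sort alphabetically by name."""
    seen, unique = set(), []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            unique.append(row)
    return sorted(unique, key=lambda row: row[1])

duplicates, missing_contact = check_frame(frame)
cleaned = clean_frame(frame)
print("duplicate ids:", duplicates)
print("ids missing contact info:", missing_contact)
print("cleaned frame:", cleaned)
```

Whether a frame is up to date cannot be checked this way; that still requires
comparing it against a current source.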
SAMPLING ERROR
A sampling error is a statistical error that occurs when an analyst does not
select a sample that represents the entire population of data. As a result,
the results found in the sample do not represent the results that would be
obtained from the entire population.
Sampling error can be reduced, though not eliminated, by increasing the
sample size and by ensuring that the sample adequately represents the
entire population.
An Example of Sampling Error
Let’s pretend that we are a group of researchers administering a survey
with the goal of learning how much money a specific group of people
spends while purchasing a vehicle.
To kickstart the study, we distribute our survey to 1,000 randomly selected
United States residents.
By dumb luck, respondent #347 happens to be Mark Cuban — billionaire
businessman and investor. While it’s unlikely that someone with the status
of Mark Cuban would complete our survey, it’s still possible.
Let’s also say that, again by chance, respondent #789 is Elon Musk —
another billionaire. He also decides to fill out our survey.
When we are interested in something directly related to a person’s
income, such as how much individuals spend while purchasing
a vehicle, we run the risk of collecting data, purely by chance,
from significant outliers of the population.
In this case, billionaire businessmen Mark Cuban and Elon Musk
do not accurately represent average members of the target
population we are interested in, and therefore the accuracy of
our results would be negatively affected.
The same applies if we were to collect a significant amount of
data from individuals who fall below the poverty line.
If too many of our respondents are either very wealthy or
struggling financially, our sample will look different from the
true nature of the real-world population.
This difference is the sampling error.
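The effect of an outlier on a sample can be shown with a small calculation. All
the figures below are invented for illustration: a hypothetical population of
ten people, one of whom spends far more than the rest, and a sample of four
that happens to include that person.

```python
from statistics import mean

# Hypothetical vehicle spending (USD) for a tiny population of 10 people.
population = [18_000, 22_000, 25_000, 27_000, 30_000,
              31_000, 33_000, 35_000, 40_000, 3_000_000]

# A sample that, by chance, includes the extreme outlier.
sample = [22_000, 30_000, 35_000, 3_000_000]

population_mean = mean(population)  # 326,100
sample_mean = mean(sample)          # 771,750

# Sampling error: the gap between the sample value and the population value.
sampling_error = sample_mean - population_mean  # 445,650

print(f"population mean: {population_mean:,.0f}")
print(f"sample mean:     {sample_mean:,.0f}")
print(f"sampling error:  {sampling_error:,.0f}")
```

No mistake was made in collecting these four values; the gap comes only from
which units happened to be drawn.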
NON SAMPLING ERROR
A non-sampling error is an error that results during data collection,
causing the data to differ from the true values. Non-sampling error
differs from sampling error.
A sampling error is limited to any differences between sample values
and universe values that arise because the entire universe was not
sampled. Sampling error can result even when no mistakes of any kind
are made. The “errors” result from the mere fact that data in a sample is
unlikely to perfectly match data in the universe from which the sample is
taken. This “error” can be minimized by increasing the sample size.
Non-sampling errors cover all other discrepancies, including those that
arise from a poor sampling technique.
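The claim above that sampling error arises without any mistakes, and shrinks as
the sample grows, can be checked with a small simulation. This is a sketch
under assumptions: the population is 10,000 made-up values drawn from a normal
distribution, and `average_sampling_error` is a hypothetical helper.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population ("universe") of 10,000 values.
population = [rng.gauss(30_000, 8_000) for _ in range(10_000)]
true_mean = mean(population)

def average_sampling_error(n, repeats=200):
    """Average absolute gap between the sample mean and the population mean."""
    gaps = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample of size n
        gaps.append(abs(mean(sample) - true_mean))
    return mean(gaps)

errors = {n: average_sampling_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n={n:>4}: average sampling error ~ {err:,.0f}")
```

Every sample here is drawn correctly, yet each one misses the population mean
by some amount; the average miss gets smaller as n increases.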
Sources of non-sampling errors: Non-sampling errors can occur at every
stage of the planning and execution of a survey or census. They occur at
the planning stage, the field work stage, and the tabulation and
computation stage.
The main sources of non-sampling errors are:
lack of proper specification of the domain of study and scope of
investigation,
incomplete coverage of the population or sample,
faulty definitions,
defective methods of data collection, and
tabulation errors.
Non-sampling errors can include, but are not limited to, data entry
errors, biased survey questions, biased processing or decision
making, non-responses, inappropriate conclusions from analysis, and
false information provided by respondents.
While increasing sample size will help minimize sampling error, it
will not have any effect on reducing non-sampling error.
Unfortunately, non-sampling errors are often difficult to detect,
and it is virtually impossible to eliminate them entirely.
METHODS TO REDUCE SAMPLING ERROR
Of the two types of errors, sampling error is easier to identify. The main
techniques for reducing sampling error are:
(i) Increase the sample size.
A larger sample size leads to a more precise result because the study gets
closer to the actual population size.
(ii) Divide the population into groups.
Instead of a simple random sample, sample groups in proportion to their size
in the population. For example, if people of a certain demographic make up 35%
of the population, make sure 35% of the sample is made up of this group.
(iii) Know your population.
A population specification error occurs when a research team selects an
inappropriate population to obtain data from. Know who buys your product, uses
it, works with you, and so forth. With basic socio-economic information, it is
possible to reach a consistent sample of the population. In cases like
marketing research, studies often relate to one specific population like
Facebook users, Millennials, or even homeowners.
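Technique (ii) above, sampling each group in proportion to its share of the
population, can be sketched as follows. The group names, group sizes and the
helper `proportional_allocation` are hypothetical and chosen only so that one
group makes up 35% of the population.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across groups in proportion to their population share.

    Rounding can make the parts differ slightly from sample_size in general;
    this sketch does not correct for that.
    """
    total = sum(stratum_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

# Hypothetical population of 10,000 split into three demographic groups.
strata = {"group_a": 3_500, "group_b": 4_500, "group_c": 2_000}

allocation = proportional_allocation(strata, sample_size=200)
print(allocation)  # {'group_a': 70, 'group_b': 90, 'group_c': 40}
```

Here group_a is 35% of the population and receives 70 of the 200 sample slots,
which is 35% of the sample. Units within each group would then be chosen at
random.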
METHODS TO REDUCE NON SAMPLING ERRORS
v) Use Incentives
Many people refuse to respond to surveys because they feel they
do not have the time to spend answering questions. An incentive is
usually necessary to motivate people into taking part in your study.
SAMPLE SIZE CONSTRAINTS
Effects of Small Sample Size
In the usual sample size formula, the sample size is proportional to the square
of the Z-score and inversely proportional to the square of the margin of error.
Consequently, for a fixed margin of error, reducing the sample size lowers the
confidence level of the study, which is tied to the Z-score. For a fixed
confidence level, decreasing the sample size increases the margin of error.
In short, when researchers are constrained to a small sample size for economic
or logistical reasons, they may have to settle for less conclusive results.
Whether or not this is an important issue depends ultimately on the size of the
effect they are studying.
For example, a small sample size would give more meaningful results in a poll
of people living near an airport who are affected negatively by air traffic than it
would in a poll of their education levels.
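How the margin of error grows as the sample shrinks can be seen with a short
calculation. This sketch assumes the common formula for a proportion,
margin = Z * sqrt(p(1 - p) / n), with Z = 1.96 (95% confidence) and p = 0.5;
these notes do not state which formula they refer to, so treat it as an
assumption.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Margin of error for a proportion p estimated from n responses."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1600, 400, 100):
    print(f"n={n:>4}: margin of error = +/- {margin_of_error(n):.2%}")
# n=1600: +/- 2.45%,  n=400: +/- 4.90%,  n=100: +/- 9.80%
```

Cutting the sample to a quarter of its size doubles the margin of error, which
is why a small sample can only detect fairly large effects.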
Effect of Large Sample Size
There is a widespread belief that large samples are ideal for research or
statistical analysis. However, this is not always true. Using the above example
as a case study, very large samples that exceed the value estimated by
sample size calculation present different hurdles.
The first such hurdle is ethical. Should a study be performed with
more patients than necessary? This means that more people than
needed are exposed to the new therapy. Potentially, this implies
increased hassle and risk.
The second obstacle is that the use of a larger number of cases can
also involve more financial and human resources than necessary to
obtain the desired response.
NON RESPONSE
Non-response happens when some of the people selected for your survey do
not respond. It becomes a problem when there is a significant difference
between those who responded to your survey and those who did not. This may
happen for a variety of reasons, including:
Some people refused to participate. This could be because you are asking
for embarrassing information, or information about illegal activities.
Poorly constructed surveys. For example, a snail mail survey for young
adults or a smartphone survey for older adults is likely to lead to a
lower response rate for your targeted population.
Some people simply forgot to return the survey.
Your survey didn’t reach all members in your sample. For example, email
invites might have disappeared into the Spam folder, or the code used in
the email may not have rendered properly on certain devices (like cell
phones).
Certain groups were more inclined to answer. For example, people who
are more active runners might be more inclined to answer a survey about
running than people who aren’t as active in the community.
Non-response bias is the bias introduced into statistics when respondents
differ from non-respondents. In other words, it can throw your results
off or invalidate them completely. Non-response can also result in higher
variances for the estimates, as the sample size you end up with is smaller
than the one you originally had in mind.
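The running survey example above can be turned into a small calculation showing
how non-response bias arises. Every number here is invented: a sample of 1,000
people, assumed weekly running hours for each group, and assumed response
rates. `weighted_mean` is a hypothetical helper.

```python
# Hypothetical sample of 1,000 people: 300 active runners, 700 less active.
# Hours per week and response rates are assumed values for illustration.
groups = {
    "active":   {"count": 300, "hours": 5.0, "response_rate": 0.8},
    "inactive": {"count": 700, "hours": 1.0, "response_rate": 0.2},
}

def weighted_mean(groups, weight):
    """Mean running hours, weighting each group by weight(group)."""
    total_weight = sum(weight(g) for g in groups.values())
    return sum(weight(g) * g["hours"] for g in groups.values()) / total_weight

# What we would estimate if everyone in the sample answered.
full_sample_mean = weighted_mean(groups, lambda g: g["count"])

# What we estimate from those who actually answered.
respondent_mean = weighted_mean(
    groups, lambda g: g["count"] * g["response_rate"]
)

print(f"mean if everyone responded: {full_sample_mean:.2f} hours")  # 2.20
print(f"mean among respondents:     {respondent_mean:.2f} hours")   # 3.53
```

Only 380 of the 1,000 people respond, and because active runners answer more
often, the estimate is pushed upward. Collecting more responses of the same
kind would not remove this bias.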
1. Population size: Population size is how many people fit your demographic. For example,
suppose you want to get information on doctors residing in North America. Your population
size is the total number of doctors in North America. Your population size doesn’t always
have to be that big. Smaller population sizes can still give you accurate results as long as
you know who you’re trying to represent.
2. Confidence level: Confidence level tells you how sure you can be that the true population
value lies within your margin of error. It is expressed as a percentage and is tied to the
confidence interval. For example, a confidence level of 90% means that if you repeated the
survey many times, about 90% of the resulting intervals would contain the true population
value. The most common confidence levels are 90%, 95%, and 99%.
3. The margin of error (confidence interval): When it comes to surveys, there’s no way to
be 100% accurate. The margin of error tells you how far from the population mean you’re
willing to allow your results to fall. It describes how close you can reasonably
expect a survey result to fall relative to the real population value. If you’ve ever seen a
political poll on the news, you’ve seen a margin of error and how it’s expressed. It will
look something like this: “68% of voters said yes to Proposition Z, with a margin of error of
+/- 5%.”
4. Standard deviation: Standard deviation is the measure of the dispersion of
a data set from its mean. It measures the absolute variability of a distribution.
The higher the dispersion or variability, the greater the standard deviation and
the greater the magnitude of the deviation. For example, you have already sent
out your survey. How much variation do you expect in your responses? That
variation in response is the standard deviation. (A standard deviation of 0.5 is
a safe choice where the figure is unknown.)
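The four components above can be combined into a sample size estimate. The
sketch below assumes the widely used formula n0 = (Z * sd / margin)^2, with an
optional finite population correction n = n0 / (1 + (n0 - 1) / N). These notes
do not spell out their formula, so this is an assumption, and
`required_sample_size` is a hypothetical helper.

```python
import math

# Standard Z-scores for the most common confidence levels.
Z_SCORES = {90: 1.645, 95: 1.96, 99: 2.576}

def required_sample_size(confidence, margin, sd=0.5, population=None):
    """Estimate the sample size needed for a given confidence level and margin.

    confidence: confidence level in percent (90, 95 or 99)
    margin:     margin of error as a fraction (0.05 means +/- 5%)
    sd:         expected standard deviation (0.5 when unknown)
    population: population size; None means "very large"
    """
    z = Z_SCORES[confidence]
    n0 = (z * sd / margin) ** 2
    if population is not None:
        # Finite population correction: smaller populations need fewer responses.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)  # round up so the target precision is still met

print(required_sample_size(95, 0.05))                    # 385
print(required_sample_size(95, 0.05, population=1_000))  # 278
print(required_sample_size(99, 0.05))
```

A higher confidence level or a tighter margin of error raises the required
sample size, while a small population lowers it.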