Chapter 2 Sampling and Data Collection
Primary data (data collection): observation, survey, experimentation
Secondary data (data compilation): print or electronic
I. Methods of collecting data
1. Observation
The investigator observes characteristics of a subset of
members of one or more existing populations.
Goal: draw conclusions about the corresponding
population or about the difference between two or more
populations.
Advantage vs Disadvantage
o Advantage: easy to conduct, relatively inexpensive
o Disadvantage: provides little useful information;
impossible to draw cause-and-effect conclusions because of
confounding variables
Observation
Example
A researcher for a pharmaceutical company
wants to determine whether aspirin
reduces the incidence of heart attacks. He
selects a sample of men and women and asks
each whether he or she has taken aspirin
regularly over the past 2 years. Each person
would also be asked whether he or she had
suffered a heart attack over the same period.
The proportions reporting heart attacks among
aspirin users and non-users would be compared,
and a conclusion would be drawn about whether
aspirin is effective in reducing the
likelihood of heart attacks.
2. Experiment
The investigator observes how a response variable
behaves when the investigator manipulates one or more
explanatory variables (factors).
Goal: determine the effect of the manipulated factors
on the response variable
Advantage vs Disadvantage
o Advantage: provides useful data, particularly about
cause-and-effect relationships
o Disadvantage: relatively expensive and time-consuming
Experiment
Example
A researcher for a pharmaceutical company
wants to determine whether aspirin
reduces the incidence of heart attacks. He
selects a sample of men and women. The
sample would be divided into two groups: one
group would take aspirin regularly and the
other would not. After 2 years, the researcher
would determine the proportion of people in
each group who had suffered a heart attack.
It is then possible to draw a conclusion
about whether aspirin is effective in reducing the
likelihood of heart attacks.
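The example says only that the sample is "divided into two groups". A common way to do this, assumed here, is random assignment, so that confounding variables tend to balance out between the groups. A minimal sketch, where the function name `randomly_assign` and the subject IDs are hypothetical:

```python
import random

def randomly_assign(subjects, seed=None):
    """Split subjects at random into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the original order is untouched
    rng.shuffle(shuffled)       # random order removes any systematic pattern
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs; in the example these would be the sampled men and women.
subjects = list(range(1, 21))
aspirin_group, control_group = randomly_assign(subjects, seed=42)
print("Aspirin group:", sorted(aspirin_group))
print("Control group:", sorted(control_group))
```

Fixing `seed` only makes the split reproducible; in practice the assignment would be left to chance.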
3. Survey
One of the most familiar methods of collecting data
Goal: solicit information from people concerning such
things as income, family size, opinions on various issues…
The majority of surveys are conducted for private use
Examples:
o Market researchers conduct a survey to determine the
preferences and attitudes of consumers, which helps
target a new product;
o A company surveys customers’ satisfaction with its
products and services.
SURVEY
PERSONAL INTERVIEW
- High rate of response, fewer incorrect answers
- Costly: people, money, time…
TELEPHONE INTERVIEW
- Less expensive
- Less personal, lower response rate
MAIL SURVEY
- Inexpensive
- Low response rate, high number of incorrect answers
Define the issue
o What are the purpose and objectives of the survey?
o Identify the questions to answer
Decide what to measure and how to measure it
o Decide what information is needed to answer the questions
o Think about how you intend to tabulate and analyze
the responses
Types of Questions
Close-ended Questions
* Select from a short list of defined choices
Example: Major: __business __liberal arts __science __other
Open-ended Questions
* Respondents are free to respond with any value, words, or
statement
Example: What did you like best about this course?
Demographic Questions
* Questions about the respondents’ personal characteristics
Example: Gender: __Female __Male
II. SAMPLING METHODS
1/ Why Sampling?
- Less time-consuming than a census
POPULATION VS SAMPLE
2/ Methods of Sampling
Probability Samples
o Simple Random
o Stratified Random
o Systematic
o Cluster
Simple Random Samples
Every individual or item from the population
has an equal chance of being selected
Selection may be with replacement or
without replacement
Samples can be obtained from a table of
random numbers or computer random number
generators
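As an illustration of drawing a simple random sample with a computer random number generator, here is a minimal sketch using Python's standard `random` module. The function name `simple_random_sample` and the frame of 64 numbered individuals are assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n items so that every item in the frame has an equal chance."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same item may be selected more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: each item can appear at most once.
    return rng.sample(frame, n)

frame = list(range(1, 65))  # hypothetical frame of 64 numbered individuals
print("Without replacement:", simple_random_sample(frame, 8, seed=1))
print("With replacement:   ", simple_random_sample(frame, 8, replace=True, seed=1))
```

Sampling without replacement never returns the same individual twice; sampling with replacement can.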
Stratified Random Samples
Population divided into subgroups (called
strata) according to some common characteristic
Simple random sample selected from each
subgroup
Samples from subgroups are combined into one
[Figure: population divided into 4 strata; sample]
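The stratified procedure can be sketched as follows. This is only an illustration: the function name `stratified_sample`, the made-up population of 40 students, and the choice of taking the same number from each stratum (equal allocation) are all assumptions; proportional allocation is another common choice.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample from each stratum and combine the results."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)  # group items by common characteristic
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], n_per_stratum))
    return sample

# Hypothetical population: 40 students, each tagged with one of 4 majors (strata).
majors = ["business", "liberal arts", "science", "other"]
population = [(student_id, majors[student_id % 4]) for student_id in range(40)]

sample = stratified_sample(population, stratum_of=lambda s: s[1],
                           n_per_stratum=2, seed=7)
print(sample)
```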
Systematic Samples
Decide on sample size: n
Divide frame of N individuals into groups of k
individuals: k=N/n
Randomly select one individual from the 1st
group
Select every kth individual thereafter
Example: N = 64, n = 8, k = 8
[Figure: frame of 64 individuals with the first group marked]
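The systematic procedure can be sketched with N = 64 and n = 8 as in the example. This is a minimal illustration that assumes n divides N evenly; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group, then every k-th individual."""
    rng = random.Random(seed)
    k = len(frame) // n        # group size k = N / n (assumes n divides N)
    start = rng.randrange(k)   # random position within the first group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 65))     # N = 64 individuals, numbered 1..64
sample = systematic_sample(frame, n=8, seed=3)
print(sample)                  # 8 individuals spaced k = 8 apart
```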
Cluster Samples
* Population is divided into several “clusters,”
each representative of the population
* A simple random sample of clusters is selected
* All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique
[Figure: population divided into 16 clusters; randomly selected clusters form the sample]
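A minimal sketch of cluster sampling, using all items in the selected clusters. The 16 clusters mirror the figure; the cluster size (4 items), the number of clusters selected (4), and the function name `cluster_sample` are assumptions made for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and return every item they contain."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of cluster labels
    items = [item for label in chosen for item in clusters[label]]
    return chosen, items

# Hypothetical population: 16 clusters of 4 items each (items numbered 0..63).
clusters = {label: list(range(label * 4, label * 4 + 4)) for label in range(16)}

chosen, items = cluster_sample(clusters, n_clusters=4, seed=5)
print("Selected clusters:", chosen)
print("Sampled items:    ", items)
```

To sample within the selected clusters instead of taking every item, the last step could apply a simple random sample to each chosen cluster.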
CONVENIENCE SAMPLING
III. SAMPLING AND NON-SAMPLING ERROR
1/ Sampling Error
- An error expected to occur when making a statement
about the population based on the observations
contained in a sample taken from that population.
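Sampling error can be illustrated with a small simulation: draw repeated simple random samples from a known population and compare each sample mean with the population mean. The population values below are simulated and purely hypothetical, and the function name `sampling_errors` is an assumption.

```python
import random
import statistics

def sampling_errors(population, n, repeats, seed=None):
    """Return (sample mean - population mean) for repeated simple random samples."""
    rng = random.Random(seed)
    mu = statistics.mean(population)
    return [statistics.mean(rng.sample(population, n)) - mu
            for _ in range(repeats)]

# Hypothetical population of 10,000 simulated values.
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(10_000)]

for n in (10, 100, 1000):
    errors = sampling_errors(population, n, repeats=200, seed=1)
    typical = statistics.mean([abs(e) for e in errors])
    print(f"n = {n:4d}: typical |sampling error| = {typical:.2f}")
```

The error is nonzero even though every sample is selected properly, and it tends to shrink as the sample size grows.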
2/ Non-Sampling Error
- An error that occurs when there are mistakes in the
acquisition of the data or when the sample observations
are selected improperly.
Types of non-sampling error:
o Selection bias
o Measurement or response bias
o Nonresponse bias
SELECTION BIAS
MEASUREMENT OR RESPONSE BIAS
Case study
In summer 1936, the Literary Digest magazine wanted to
predict the next US president, just as they had successfully
done five times before.
They sent out postcards to 10 million Americans and then
announced that Alfred M. Landon, then governor of Kansas,
would gain 57% of the popular vote and, thus,
demolish Franklin D. Roosevelt, the incumbent president.
In fact, Roosevelt won by a landslide never before seen in
U.S. history. He garnered not the predicted 43%, but 62.5%
of the popular vote and all but 8 of 531 electoral votes.
The Digest did not survive the debacle and folded shortly
thereafter.
What had gone wrong?
Case analysis
1 Sample selection
2 Response percentage