Chapter I Introduction
Definition
A taxonomy of statistics
Data Collection Methods
programs
Federal agency statistics - Census, etc.
What did you find frustrating as you
looked for data on the state’s websites?
When was it collected? For how long?
◦ May be out of date for what you want to analyze.
◦ May not have been collected long enough for
detecting trends.
◦ E.g. Have new anticorruption laws impacted
Russia’s government accountability ratings?
Are there confounding problems?
◦ Sample selection bias?
◦ Source choice bias?
◦ In time series, did some observations drop out over
time?
Are the data consistent/reliable?
◦ Did variables drop out over time?
◦ Did variables change in definition over time?
E.g. number of years of education versus highest
degree obtained.
Is the information exactly what you need?
◦ In some cases, may have to use “proxy variables” –
variables that may approximate something you
really wanted to measure.
◦ Are they reliable? Is there correlation to what you
actually want to measure?
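One way to check a proxy, sketched here under stated assumptions: if a small validation subsample records both the proxy and the variable you really want, compute the correlation between them. The function name `pearson_r` and all numbers below are hypothetical illustrations, not data from any real source.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical validation subsample:
# proxy  = highest degree obtained, coded 1-4
# target = number of years of education
highest_degree = [1, 2, 2, 3, 4, 3, 1, 4]
years_education = [10, 12, 13, 16, 20, 17, 9, 21]

r = pearson_r(highest_degree, years_education)
print(f"correlation between proxy and target: {r:.2f}")
```

A correlation close to 1 or -1 suggests the proxy tracks the target well in this subsample. It does not by itself show that the proxy is reliable in other populations or time periods.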
No need to reinvent the wheel.
◦ If someone has already found the data, take
advantage of it.
It will save you money.
◦ Even if you have to pay for access, often it is
cheaper in terms of money than collecting your own
data. (more on this later.)
It will save you time.
◦ Primary data collection is very time consuming.
(More on this later, too!)
It may be very accurate.
◦ Especially when a government agency has collected
the data, incredible amounts of time and money
went into it. It’s probably highly accurate.
It has great exploratory value
◦ Exploring research questions and formulating
hypotheses to test.
Primary data – data you collect
Surveys
Focus groups
Questionnaires
Personal interviews
Experiments and observational study
Do you have the time and money for:
◦ Designing your collection instrument?
◦ Selecting your population or sample?
◦ Pretesting/piloting the instrument to work out
sources of bias?
◦ Administration of the instrument?
◦ Entry/collation of data?
Uniqueness
◦ May not be able to compare to other populations
Researcher error
◦ Sample bias
◦ Other confounding factors
What you must ask yourself:
◦ Will the data answer my research question?
To answer that
◦ You must first decide what your research question
is
◦ Then you need to decide what data/variables are
needed to scientifically answer the question
If those data exist in secondary form, then use
them to the extent you can, keeping in mind
limitations.
But if they do not, and you are able to fund
No one best way: decision depends on:
◦ What you need to know: numbers or stories
◦ Where the data reside: environment, files, people
◦ Resources and time available
◦ Complexity of the data to be collected
◦ Frequency of data collection
◦ Intended forms of data analysis
Use multiple data collection methods
Use available data, but need to know
If you must collect original data:
◦ be sensitive to burden on others
◦ pre-test, pre-test, pre-test
◦ establish procedures and follow them (protocol)
◦ maintain accurate records of definitions and coding
◦ verify accuracy of coding, data input
All data collected in the same way
Especially important for multi-site and cluster
need to address extent questions
have a large sample or population
know what needs to be measured
need to show results numerically
need to make comparisons across different
sites or interventions
Systematic and follow general procedures but
data are not collected in exactly the same way
every time
More open and fluid
Does not follow a rigid script
◦ may ask for more detail
◦ people can tell what they want in their own way
conducting exploratory work
seeking understanding, themes, and/or
issues
need narratives or stories
want in-depth, rich, “backstage”
information
seek to understand results of data that are
unexpected
Is the measure relevant?
Is the measure credible?
Is the measure valid?
Is the measure reliable?
Does the measure capture what matters?
Do not measure what is easy instead of what is needed
Is the measure believable? Will it be
viewed as a reasonable and
appropriate way to capture the
information sought?
How well does the measure capture what it is supposed to?
Are waiting lists a valid measure of demand?
A measure’s precision and stability: the extent to which the
same result would be obtained with repeated trials
How reliable are:
◦ birth weights of newborn infants?
◦ speeds measured by a stopwatch?
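The idea of repeated trials can be sketched numerically: measure the same thing several times and look at how much the results vary. The snippet below is a minimal illustration, assuming the spread is summarized by the sample standard deviation and the coefficient of variation; the function name `repeatability` and both data sets are hypothetical.

```python
from statistics import mean, stdev

def repeatability(trials):
    """Return (mean, sample standard deviation, coefficient of variation)
    for repeated measurements of the same quantity."""
    if len(trials) < 2:
        raise ValueError("need at least two repeated trials")
    m = mean(trials)
    s = stdev(trials)
    cv = s / m if m != 0 else float("inf")
    return m, s, cv

# Hypothetical repeated trials (made-up numbers for illustration only)
stopwatch_seconds = [12.31, 12.58, 12.12, 12.49, 12.40]  # same run, hand-timed
scale_grams = [3402, 3401, 3403, 3402, 3402]             # same infant, same scale

for label, trials in [("stopwatch", stopwatch_seconds), ("scale", scale_grams)]:
    m, s, cv = repeatability(trials)
    print(f"{label}: mean={m:.2f}, sd={s:.2f}, cv={cv:.4f}")
```

A smaller coefficient of variation means the repeated results are closer together relative to their size, which is one simple indication of a more reliable measure.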
Data in numerical form
Data that can be precisely measured
Data that deal with description
Data that can be observed or self-
reported, but not always precisely
measured
Less structured, easier to develop
Can provide “rich data” — detailed and
widely applicable
Is challenging to analyze
Is labor intensive to collect
Usually generates longer reports
- want to conduct statistical analysis Then Use:
Obtrusive: data collection methods that directly
obtain information from those being evaluated
◦ e.g., interviews, surveys, focus groups
Unobtrusive: data collection methods that do not
collect information directly from those being evaluated
◦ e.g., document analysis, Google Earth, observation at a
distance, trash of the stars
Choice depends on the situation
Each technique is more appropriate in
Triangulation of methods
◦ collection of same information using different
methods
Triangulation of sources
◦ collection of same information from a variety of
sources
Triangulation of evaluators
◦ collection of same information from more than one
evaluator
Participatory Methods
Records and Secondary Data
Observation
Surveys and Interviews
Focus Groups
Diaries, Journals, Self-reported Checklists
Expert Judgment
Delphi Technique
Other Tools
Involve groups or communities heavily in data
collection
Examples:
◦ community meetings
◦ mapping
◦ transect walks
One of the most common participatory
methods
Must be well organized
◦ agree on purpose
◦ establish ground rules
who will speak
time allotted for speakers
format for questions and answers
Drawing or using existing maps
Useful tool to involve stakeholders
Evaluator walks around community observing
people, surroundings, and resources
Need good observation skills
Walk a transect line through a map of a
Examples of sources:
◦ files/records
◦ computer data bases
◦ industry or government reports
◦ other reports or prior evaluations
◦ census data and household survey data
◦ electronic mailing lists and discussion groups
◦ documents (budgets, organizational charts,
policies and procedures, maps, monitoring
reports)
◦ newspapers and television reports
Key issues: validity, reliability, accuracy,
response rates, data dictionaries, and
missing data rates
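Before relying on a secondary data set, the missing data rate of each variable can be checked with a few lines of code. This is a minimal sketch, assuming the records have been loaded as a list of dictionaries and that `None`, an empty string, or an absent key all count as missing; the function name `missing_rates` and the records are hypothetical.

```python
def missing_rates(records, variables, missing=(None, "")):
    """Share of records with a missing value, computed per variable.

    A key that is absent from a record is treated as missing as well.
    """
    if not records:
        raise ValueError("no records to inspect")
    rates = {}
    for var in variables:
        n_missing = sum(1 for rec in records if rec.get(var) in missing)
        rates[var] = n_missing / len(records)
    return rates

# Hypothetical extract from a household survey file
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": ""},
    {"age": 41},                    # income not recorded at all
]

for var, rate in missing_rates(records, ["age", "income"]).items():
    print(f"{var}: {rate:.0%} missing")
```

The data dictionary should say how missing values are actually coded in a given file (for example a special numeric code); the `missing` argument would need to be adjusted to match.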
Advantages: Often less expensive and faster
than collecting the original data
again
need direct information
trying to understand ongoing behavior
there is physical evidence, products, or
a. Structured: determine, before the
observation, precisely what will be observed
Maps and satellite images for complex or
pinpointed regional searches
Has an Advanced version and an Earth
Outreach version
Web site for Google Earth
◦ http://earth.google.com/
Observation guide
◦ printed form with space to record
Recording sheet or checklist
◦ Yes/no options; tallies, rating scales
Field notes
◦ least structured, recorded in narrative, descriptive
style
Have more than one observer, if
feasible
Train observers so they observe the
same things
Pilot test the observation data collection
instrument
For less structured approach, have a
few key questions in mind
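When two observers fill in the same checklist for the same events, their agreement can be quantified. The sketch below uses Cohen's kappa, one common agreement statistic chosen here as an illustration (the slides do not prescribe a particular measure); the function name `cohens_kappa` and the ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers who rated the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance, from each observer's category frequencies
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both observers used one and the same category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no checklist item recorded by two observers for 10 events
observer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
observer_2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]

print(f"kappa = {cohens_kappa(observer_1, observer_2):.2f}")
```

A low value during the pilot test suggests the observers need more training or the observation instrument needs clearer definitions.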
Advantages: Collects data on actual rather than
self-reported behavior or perceptions. It is
real-time rather than retrospective.
Excellent for asking people about:
◦ perceptions, opinions, ideas
Less accurate for measuring behavior
Sample should be representative of the whole
Big problem with response rates
Structured:
◦ Precisely worded with a range of pre-
determined responses that the respondent can
select
◦ Everyone asked exactly the same questions in
exactly the same way, given exactly the same
choices
Semi-structured
◦ Asks same general set of questions but
answers to the questions are predominantly
open-ended
Structured:
◦ harder to develop
◦ easier to complete
◦ easier to analyze
◦ more efficient when working with large numbers
Semi-structured:
◦ easier to develop: open-ended questions
◦ more difficult to complete: burdensome for
people to complete as a self-administered
questionnaire
◦ harder to analyze but provides a richer source of
data; interpretation of open-ended responses is
subject to bias
Telephone surveys
Self-administered questionnaires distributed
development context
In development context, often issues of
Literacy issues
Consider accessibility
Advantages: Best when you want to know what
people think, believe, or perceive;
only they can tell you that.
Challenges: People may not accurately recall their
behavior or may be reluctant to reveal
their behavior if it is illegal or
stigmatized. What people think they
do or say they do is not always the
same as what they actually do.
Often semi-structured
Used to explore complex issues in depth
Forgiving of mistakes: unclear questions
can be clarified during the interview and
changed for subsequent interviews
Can provide evaluators with an intuitive
sense of the situation
Can be expensive, labor intensive, and time
consuming
Selective hearing on the part of the
interviewer may miss information that does
not conform to pre-existing beliefs
Cultural sensitivity: e.g., gender issues
Type of qualitative research where small
homogenous groups of people are brought
together to informally discuss specific topics
under the guidance of a moderator
Purpose: to identify issues and themes, not
language barriers are insurmountable
evaluator has little control over the situation
trust cannot be established
free expression cannot be ensured
confidentiality cannot be assured
Phase Action
Use when you want to capture information
about events in people’s daily lives
Participants capture experiences in real-time
Step Process
Cross between a questionnaire and a diary
The evaluator specifies a list of behaviors or
events and asks the respondents to complete
the checklist
Done over a period of time to capture the
event or behavior
More quantitative approach than diary
Advantages:
◦ Can capture in-depth, detailed data that might
otherwise be forgotten
◦ Can collect data on how people use their time
◦ Can collect sensitive information
◦ Supplements interviews to provide richer data
Use of experts, one-on-one or as a panel
◦ e.g., government task forces, advisory groups
Can be structured or unstructured
Issues in selecting experts
Establish criteria for selecting experts based not
only on recognition as an expert but also on:
◦ areas of expertise
◦ diverse perspectives
◦ diverse political views
◦ diverse technical expertise
Advantages: Fast, relatively inexpensive
Enables experts to engage remotely in a
dialogue and reach consensus, often
about priorities
Experts asked specific questions; often
rank choices
Responses go to a central source, are
summarized and fed back to the experts
without attribution
Experts can agree or argue with others’
comments
Process may be iterative
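The feedback step of one round can be sketched in code: the central source collects each expert's ranking, summarizes it per option, and returns the summary without names. This is a minimal illustration, assuming the summary is the median rank plus the range; the function name `summarize_round`, the options, and the rankings are all hypothetical.

```python
from statistics import median

def summarize_round(rankings):
    """Summarize one round of expert rankings without attribution.

    rankings: dict mapping expert id -> dict of option -> rank (1 = top priority).
    Returns a list of per-option summaries, best median rank first.
    Expert ids are deliberately left out of the result.
    """
    options = set()
    for ranks in rankings.values():
        options.update(ranks)
    summary = []
    for option in sorted(options):
        ranks = [r[option] for r in rankings.values() if option in r]
        summary.append({
            "option": option,
            "median_rank": median(ranks),
            "min_rank": min(ranks),
            "max_rank": max(ranks),
        })
    summary.sort(key=lambda row: (row["median_rank"], row["option"]))
    return summary

# Hypothetical first-round responses from three experts
round_1 = {
    "expert_1": {"roads": 1, "water": 2, "schools": 3},
    "expert_2": {"roads": 2, "water": 1, "schools": 3},
    "expert_3": {"roads": 1, "water": 3, "schools": 2},
}

for row in summarize_round(round_1):
    print(row)
```

In an iterative process, the experts would see this summary, revise their rankings or argue their case, and the same function would be run again on the next round's responses.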
Advantages:
◦ Allows participants to remain anonymous
◦ Is inexpensive
◦ Is free of social pressure, personality influence,
and individual dominance
◦ Is conducive to independent thinking
◦ Allows sharing of information
Challenges:
◦ May not be representative
◦ Has tendency to eliminate extreme positions
◦ Requires skill in written communication
◦ Requires time and participant commitment
- scales (weight)
- tape measure
- stop watches
- chemical tests: e.g., quality of water
- health testing tools: e.g., blood pressure
- aptitude and achievement tests
- citizen report cards
Choose more than one data collection
technique
No “best” tool
Do not let the tool drive your work but rather
choose the right tool to address the
evaluation question
Probability samples
Simple random sampling
Stratified sampling
Systematic sampling
Cluster (area) sampling
Multistage sampling
Nonprobability samples
Accidental, haphazard, convenience
Modal instance
Purposive
Expert
Quota
Snowball
Heterogeneity sampling
Who = Population:
◦ all individuals of interest
◦ e.g., road users, crashes that occurred, all drivers
What = Parameter
◦ Characteristic of population
Problem: can’t study/survey whole population
Solution: Use a sample for the “who”
◦ subset, selected from population
◦ calculate a statistic for the “what”
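The difference between a parameter and a statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is simulated (in practice it usually cannot be measured in full), the characteristic of interest is the mean, and the names `draw_sample`, `parameter`, and `statistic` are chosen only for this example.

```python
import random
from statistics import mean

# Hypothetical population: travel speeds (km/h) of 1,000 drivers,
# simulated here only so the parameter can be shown next to the statistic.
rng = random.Random(0)
population = [rng.gauss(60, 8) for _ in range(1000)]

def draw_sample(population, n, seed=None):
    """Select n distinct members of the population at random."""
    return random.Random(seed).sample(population, n)

parameter = mean(population)               # characteristic of the whole population
sample = draw_sample(population, 50, seed=1)
statistic = mean(sample)                   # same characteristic, computed on the sample

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The statistic is used as an estimate of the parameter; a different sample would give a slightly different estimate.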
Probability Samples: each member of the population
has a known non-zero probability of being selected
◦ Methods include random sampling,
systematic sampling, and stratified
sampling.
Nonprobability Samples: members are
selected from the population in some
nonrandom manner
◦ Methods include convenience sampling, judgment
sampling, quota sampling, and snowball sampling
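Two of the probability methods named above can be sketched in a few lines. This is a minimal illustration, assuming the sampling frame is available as a list; the function names `simple_random_sample` and `systematic_sample` and the frame of 100 numbered units are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Simple random sampling: every set of n units is equally likely."""
    return random.Random(seed).sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: random start, then every k-th unit, k = N // n.

    Note: in this simple version, when N is not a multiple of n the last
    N - n*k units in the frame can never be selected.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n                                  # sampling interval
    start = random.Random(seed).randrange(k)    # random start in the first interval
    return [frame[start + i * k] for i in range(n)]

# Hypothetical sampling frame: 100 numbered units
frame = list(range(1, 101))

print(simple_random_sample(frame, 10, seed=1))
print(systematic_sample(frame, 10, seed=1))
```

The `seed` argument is only there to make the example repeatable; a real draw would normally document how the random start or random selection was generated.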
Simple random sampling
Stratified sampling
Systematic sampling
Cluster (area) sampling
Multistage sampling
N = the number of cases in the sampling
frame
n = the number of cases in the sample
in each stratum
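With N and n defined as above, the sampling fraction is commonly written f = n/N (standard notation, not stated on the slide). The sketch below shows one way to divide n across strata, assuming proportionate allocation, so that each stratum is sampled at roughly the same fraction; the function names, stratum names, and sizes are hypothetical.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations always sum to n.
    """
    N = sum(stratum_sizes.values())
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}       # round down first
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:                   # hand out the remainder
        alloc[h] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample within each stratum (proportionate allocation)."""
    rng = random.Random(seed)
    alloc = proportional_allocation({h: len(u) for h, u in strata.items()}, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}

# Hypothetical frame of N = 1000 units in three strata
strata = {
    "urban": list(range(600)),
    "rural": list(range(600, 900)),
    "remote": list(range(900, 1000)),
}
n = 50
N = sum(len(units) for units in strata.values())
print("sampling fraction f = n/N =", n / N)
print({h: len(s) for h, s in stratified_sample(strata, n, seed=1).items()})
```

Disproportionate allocation, where a small stratum is deliberately oversampled, is another option and would need a different allocation rule plus weighting at the analysis stage.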
Needed to enable better representation of
http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/BS704_Power_print.html