Chapter I Introduction

This document provides an overview of statistics and data collection methods. It defines statistics as the science of collecting, presenting, analyzing, and interpreting data. It discusses the differences between descriptive and inferential statistics. Descriptive statistics organize and summarize sample data, while inferential statistics make inferences about populations based on sample data. The document also discusses sources of data, including primary sources like surveys and secondary sources like government databases. It covers important aspects of data collection like ensuring data quality and validity.

getusegne@yahoo.com
• Definition
• A taxonomy of statistics
• Data Collection Methods
◦ Where do data come from?
• Primary sources
• Secondary sources
• Selecting and Constructing Data Collection Instruments
• Data Collection Tools
Definition: The science of collection, presentation, analysis, and reasonable interpretation of data.
Statistics presents a rigorous scientific method for gaining insight into data.
For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the data fails to provide an informative account. However, statistics can give an instant overall picture of the data through graphical presentation or numerical summarization, irrespective of the number of data points.
Besides data summarization, another important task of statistics is to make inferences and predict relationships between variables.
 Descriptive statistics:
◦ Organize and summarize scores from samples
 Inferential statistics:
◦ Infer information about the population based on what
we know from sample data
◦ Decide if an experimental manipulation has had an
effect
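The contrast between the two branches can be sketched on the 100-patient example. This is a minimal illustration, not part of the original slides: the weights are simulated (hypothetical values in kg), and the interval uses the normal critical value 1.96 as an approximation.

```python
import random
import statistics

# Hypothetical data: 100 simulated patient weights in kg (not from the slides).
rng = random.Random(42)
weights = [round(rng.gauss(70, 12), 1) for _ in range(100)]

# Descriptive statistics: organize and summarize the sample itself.
mean_w = statistics.fmean(weights)
median_w = statistics.median(weights)
sd_w = statistics.stdev(weights)  # sample standard deviation

# Inferential statistics: an approximate 95% confidence interval for the
# population mean, based only on what we know from the sample.
half_width = 1.96 * sd_w / len(weights) ** 0.5
ci_low, ci_high = mean_w - half_width, mean_w + half_width

print(f"n={len(weights)} mean={mean_w:.1f} median={median_w:.1f} sd={sd_w:.1f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.1f}, {ci_high:.1f})")
```

The first three numbers only describe the 100 measurements; the interval is a statement about the wider population the patients were drawn from.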
Theory
Question to answer / Hypothesis to test
Design Research Study
Collect Data (measurements, observations)
Organize and make sense of the #s USING STATISTICS!
Depends on our goal:
DESCRIPTIVE STATISTICS: Describe characteristics; organize, summarize, condense data
INFERENTIAL STATISTICS: Test hypothesis, make conclusions; interpret data, understand relations
• We’ve seen our data from lab, all nice and collated in a database – from:
◦ Insurance companies (claims, medications, procedures, diagnoses, etc.)
◦ Firms (demographic data, productivity data, crash data, socioeconomic data, etc.)
◦ Travel survey data, etc.
◦ Crash data
 Take a step back – if we’re starting from
scratch, how do we collect / find data?
◦ Primary data
◦ Secondary data
 Secondary data – data someone else has
collected
 County health departments
 Historical traffic volume
 Crash record at police stations
 Vital Statistics – birth, death certificates
 Private and foundation databases
 City and county governments database
• Surveillance data from state government programs
 Federal agency statistics - Census, etc.
• What did you find frustrating as you looked for data on the state’s websites?
 When was it collected? For how long?
◦ May be out of date for what you want to analyze.
◦ May not have been collected long enough for
detecting trends.
◦ E.g. Have new anticorruption laws impacted
Russia’s government accountability ratings?
 Are there confounding problems?
◦ Sample selection bias?
◦ Source choice bias?
◦ In time series, did some observations drop out over
time?
 Are the data consistent/reliable?
◦ Did variables drop out over time?
◦ Did variables change in definition over time?
 E.g. number of years of education versus highest
degree obtained.
 Is the information exactly what you need?
◦ In some cases, may have to use “proxy variables” –
variables that may approximate something you
really wanted to measure.
◦ Are they reliable? Is there correlation to what you
actually want to measure?
 No need to reinvent the wheel.
◦ If someone has already found the data, take
advantage of it.
• It will save you money.
◦ Even if you have to pay for access, it is often cheaper than collecting your own data. (More on this later.)
• It will save you time.
◦ Primary data collection is very time consuming. (More on this later, too!)
• It may be very accurate.
◦ Especially when a government agency has collected the data, incredible amounts of time and money went into it. It’s probably highly accurate.
• It has great exploratory value
◦ Exploring research questions and formulating hypotheses to test.
 Primary data – data you collect
 Surveys
 Focus groups
 Questionnaires
 Personal interviews
• Experiments and observational studies
 Do you have the time and money for:
◦ Designing your collection instrument?
◦ Selecting your population or sample?
◦ Pretesting/piloting the instrument to work out
sources of bias?
◦ Administration of the instrument?
◦ Entry/collation of data?
 Uniqueness
◦ May not be able to compare to other populations
 Researcher error
◦ Sample bias
◦ Other confounding factors
• What you must ask yourself:
◦ Will the data answer my research question?
• To answer that
◦ You must first decide what your research question is
◦ Then you need to decide what data/variables are needed to scientifically answer the question
• If those data exist in secondary form, then use them to the extent you can, keeping in mind their limitations.
• But if they do not, and you are able to fund primary collection, then it is the method of choice.
 Data Collection Strategies
 Characteristics of Good Measures
 Quantitative and Qualitative Data
 Tools for Collecting Data

 No one best way: decision depends on:
◦ What you need to know: numbers or stories
◦ Where the data reside: environment, files, people
◦ Resources and time available
◦ Complexity of the data to be collected
◦ Frequency of data collection
◦ Intended forms of data analysis

• Use multiple data collection methods
• Use available data, but you need to know:
◦ how the measures were defined
◦ how the data were collected and cleaned
◦ the extent of missing data
◦ how accuracy of the data was ensured

• If you must collect original data:
◦ be sensitive to the burden on others
◦ pre-test, pre-test, pre-test
◦ establish procedures and follow them (protocol)
◦ maintain accurate records of definitions and coding
◦ verify accuracy of coding and data input

• All data collected in the same way
• Especially important for multi-site and cluster evaluations so you can compare
• Important when you need to make comparisons with alternate interventions

• need to address extent questions
• have a large sample or population
• know what needs to be measured
• need to show results numerically
• need to make comparisons across different sites or interventions

 Systematic and follow general procedures but
data are not collected in exactly the same way
every time
 More open and fluid
 Does not follow a rigid script
◦ may ask for more detail
◦ people can tell what they want in their own way

• conducting exploratory work
• seeking understanding, themes, and/or issues
• need narratives or stories
• want in-depth, rich, “backstage” information
• seek to understand results of data that are unexpected

 Is the measure relevant?
 Is the measure credible?
 Is the measure valid?
 Is the measure reliable?

Does the measure capture what matters?
Do not measure what is easy instead of what is needed.

Is the measure believable? Will it be
viewed as a reasonable and
appropriate way to capture the
information sought?

How well does the measure capture what it is supposed to?
Are waiting lists a valid measure of demand?

A measure’s precision and stability: the extent to which the same result would be obtained with repeated trials.
How reliable are:
◦ birth weights of newborn infants?
◦ speeds measured by a stopwatch?

• Data in numerical form
• Data that can be precisely measured
◦ age, cost, length, height, area, volume, weight, speed, time, and temperature
• Harder to develop
• Easier to analyze

 Data that deal with description
 Data that can be observed or self-
reported, but not always precisely
measured
 Less structured, easier to develop
 Can provide “rich data” — detailed and
widely applicable
 Is challenging to analyze
 Is labor intensive to collect
 Usually generates longer reports

If you:
- want to conduct statistical analysis
- want to be precise
- know what you want to measure
- want to cover a large group
Then use: Quantitative

If you:
- want narrative or in-depth information
- are not sure what you are able to measure
- do not need to quantify the results
Then use: Qualitative

Obtrusive: data collection methods that directly obtain information from those being evaluated, e.g., interviews, surveys, focus groups
Unobtrusive: data collection methods that do not collect information directly from those being evaluated, e.g., document analysis, Google Earth, observation at a distance, trash of the stars

• Choice depends on the situation
• Each technique is more appropriate in some situations than others
• Caution: All techniques are subject to bias

 Triangulation of methods
◦ collection of same information using different
methods
 Triangulation of sources
◦ collection of same information from a variety of
sources
 Triangulation of evaluators
◦ collection of same information from more than one
evaluator

 Participatory Methods
 Records and Secondary Data
 Observation
 Surveys and Interviews
 Focus Groups
 Diaries, Journals, Self-reported Checklists
 Expert Judgment
 Delphi Technique
 Other Tools

 Involve groups or communities heavily in data
collection
 Examples:

◦ community meetings
◦ mapping
◦ transect walks

 One of the most common participatory
methods
 Must be well organized

◦ agree on purpose
◦ establish ground rules
 who will speak
 time allotted for speakers
 format for questions and answers

• Drawing or using existing maps
• Useful tool to involve stakeholders
◦ increases understanding of the community
◦ generates discussions, verifies secondary sources of information, perceived changes
• Types of mapping:
◦ natural resources, social, health, individual or civic assets, wealth, land use, demographics

• Evaluator walks around community observing people, surroundings, and resources
• Need good observation skills
• Walk a transect line through a map of a community; the line should go through all zones of the community

 Examples of sources:
◦ files/records
◦ computer data bases
◦ industry or government reports
◦ other reports or prior evaluations
◦ census data and household survey data
◦ electronic mailing lists and discussion groups
◦ documents (budgets, organizational charts,
policies and procedures, maps, monitoring
reports)
◦ newspapers and television reports

Key issues: validity, reliability, accuracy,
response rates, data dictionaries, and
missing data rates

Advantages: Often less expensive and faster than collecting the original data again.
Challenges: There may be coding errors or other problems. Data may not be exactly what is needed. You may have difficulty getting access. You have to verify the validity and reliability of the data.
 See what is happening
◦ traffic patterns
◦ land use patterns
◦ layout of city and rural areas
◦ quality of housing
◦ condition of roads
◦ conditions of buildings
◦ who goes to a health clinic

• need direct information
• trying to understand ongoing behavior
• there is physical evidence, products, or outputs that can be observed
• need to provide an alternative when other data collection is infeasible or inappropriate

• a. Structured: determine, before the observation, precisely what will be observed
• b. Unstructured: select the method depending upon the situation, with no preconceived ideas or plan on what to observe
• c. Semi-structured: a general idea of what to observe but no specific plan

• Maps and satellite images for complex or pinpointed regional searches
• Has an Advanced version and an Earth Outreach version
• Web site for Google Earth
◦ http://earth.google.com/

 Observation guide
◦ printed form with space to record
 Recording sheet or checklist
◦ Yes/no options; tallies, rating scales
 Field notes
◦ least structured, recorded in narrative, descriptive
style

 Have more than one observer, if
feasible
 Train observers so they observe the
same things
 Pilot test the observation data collection
instrument
 For less structured approach, have a
few key questions in mind

Advantages: Collects data on actual vs. self-reported behavior or perceptions. It is real-time vs. retrospective.
Challenges: Observer bias, potentially unreliable; interpretation and coding challenges; sampling can be a problem; can be labor intensive; low response rates.

 Excellent for asking people about:
◦ perceptions, opinions, ideas
 Less accurate for measuring behavior
 Sample should be representative of the whole
 Big problem with response rates

 Structured:
◦ Precisely worded with a range of pre-
determined responses that the respondent can
select
◦ Everyone asked exactly the same questions in
exactly the same way, given exactly the same
choices
 Semi-structured
◦ Asks same general set of questions but
answers to the questions are predominantly
open-ended

Structured:
- harder to develop
- easier to complete
- easier to analyze
- more efficient when working with large numbers
Semi-structured:
- easier to develop: open-ended questions
- more difficult to complete: burdensome for people to complete as a self-administered questionnaire
- harder to analyze but provides a richer source of data; interpretation of open-ended responses is subject to bias
• Telephone surveys
• Self-administered questionnaires distributed by mail, e-mail, or websites
• Administered questionnaires, common in the development context
• In development context, often issues of language and translation

• Literacy issues
• Consider accessibility
◦ reliability of postal service
◦ turn-around time
• Consider bias
◦ What population segment has telephone access? Internet access?

Advantages: Best when you want to know what people think, believe, or perceive; only they can tell you that.
Challenges: People may not accurately recall their behavior or may be reluctant to reveal their behavior if it is illegal or stigmatized. What people think they do or say they do is not always the same as what they actually do.

 Often semi-structured
 Used to explore complex issues in depth
 Forgiving of mistakes: unclear questions
can be clarified during the interview and
changed for subsequent interviews
 Can provide evaluators with an intuitive
sense of the situation

 Can be expensive, labor intensive, and time
consuming
 Selective hearing on the part of the
interviewer may miss information that does
not conform to pre-existing beliefs
 Cultural sensitivity: e.g., gender issues

• Type of qualitative research where small homogeneous groups of people are brought together to informally discuss specific topics under the guidance of a moderator
• Purpose: to identify issues and themes, not just interesting information, and not “counts”

 language barriers are insurmountable
 evaluator has little control over the situation
 trust cannot be established
 free expression cannot be ensured
 confidentiality cannot be assured

Phase: Action
1 Opening: Ice-breaker; explain purpose; ground rules; introductions
2 Warm-up: Relate experience; stimulate group interaction; start with least threatening and simplest questions
3 Main body: Move to more threatening or sensitive and complex questions; elicit deep responses; connect emergent data to complex, broad participation
4 Closure: End with closure-type questions; summarize and refine; present theories, etc.; invite final comments or insights; thank participants
Advantages: Can be conducted relatively quickly and easily; may take less staff time than in-depth, in-person interviews; allow flexibility to make changes in process and questions; can explore different perspectives; can be fun.
Challenges: Analysis is time consuming; participants may not be representative of the population, possibly biasing the data; group may be influenced by moderator or dominant group members.

• Use when you want to capture information about events in people’s daily lives
• Participants capture experiences in real time, not later in a questionnaire
• Used to supplement other data collection

Step: Process
1 Recruit people face-to-face
• encourage participation, appeal to altruism, assure confidentiality, provide incentive
2 Provide a booklet to each participant
• cover page with clear instructions, definitions, example
• short memory-joggers, explain terms, comments on last page, calendar
3 Consider the time-period for collecting data
• if too long, may become burdensome or tedious
• if too short, may miss the behavior or event

 Cross between a questionnaire and a diary
 The evaluator specifies a list of behaviors or
events and asks the respondents to complete
the checklist
 Done over a period of time to capture the
event or behavior
 More quantitative approach than diary

Advantages:
- Can capture in-depth, detailed data that might be otherwise forgotten
- Can collect data on how people use their time
- Can collect sensitive information
- Supplements interviews to provide richer data
Challenges:
- Requires some literacy
- May change behavior
- Requires commitment and self-discipline
- Data may be incomplete or inaccurate
- Poor handwriting, difficult to understand phrases

Use of experts, one-on-one or as a panel
Can be structured or unstructured
Issues in selecting experts
E.g., Government task forces, Advisory Groups

 Establish criteria for selecting experts not
only on recognition as expert but also based
on:
 areas of expertise
 diverse perspectives
 diverse political views
 diverse technical expertise

Advantages: Fast, relatively inexpensive
Challenges:
- Weak for impact evaluation
- May be based mostly on perceptions
- Value of data depends on how credible the experts are perceived to be

 Enables experts to engage remotely in a
dialogue and reach consensus, often
about priorities
 Experts asked specific questions; often
rank choices
 Responses go to a central source, are
summarized and fed back to the experts
without attribution
 Experts can agree or argue with others’
comments
 Process may be iterative

Advantages Allows participants to remain anonymous
Is inexpensive
Is free of social pressure, personality influence,
and individual dominance
Is conducive to independent thinking
Allows sharing of information
Challenges May not be representative
Has tendency to eliminate extreme positions
Requires skill in written communication
Requires time and participant commitment

- scales (weight)
- tape measure
- stop watches
- chemical tests, e.g., quality of water
- health testing tools, e.g., blood pressure
- aptitude and achievement tests
- citizen report cards

Choose more than one data collection
technique
No “best” tool
Do not let the tool drive your work but rather
choose the right tool to address the
evaluation question

 Probability samples
 Simple random sampling
 Stratified sampling
 Systematic sampling
 Cluster (area) sampling
 Multistage sampling
 Non probability samples
 Accidental, haphazard, convenience
 Modal instance
 Purposive
 Expert
 Quota
 Snowball
 Heterogeneity sampling
 Who = Population:
◦ all individuals of interest
◦ e.g., road users, crashes that occurred, all drivers
 What = Parameter
◦ Characteristic of population
 Problem: can’t study/survey whole population
 Solution: Use a sample for the “who”
◦ subset, selected from population
◦ calculate a statistic for the “what”
• Probability samples: each member of the population has a known non-zero probability of being selected
◦ Methods include random sampling, systematic sampling, and stratified sampling.
 Nonprobability Samples: members are
selected from the population in some
nonrandom manner
◦ Methods include convenience sampling, judgment
sampling, quota sampling, and snowball sampling
 Simple random sampling
 Stratified sampling
 Systematic sampling
 Cluster (area) sampling
 Multistage sampling
• N = the number of cases in the sampling frame
• n = the number of cases in the sample
• NCn = the number of combinations (subsets) of n from N
• f = n/N = the sampling fraction
• Objective: Select n units out of N such that each of the NCn possible samples has an equal chance of being selected.
• Procedure: Use a table of random numbers, a computer random number generator, or a mechanical device.
• Can sample with or without replacement.
• f = n/N is the sampling fraction.
• Example:
◦ Small service agency.
◦ Client assessment of quality of service.
◦ Get list of clients over past year.
◦ Draw a simple random sample of n/N.
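The service-agency example can be sketched with a computer random number generator. This is only an illustration: the client list, the sizes N = 200 and n = 20, and the function name are assumptions, not values from the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: the agency's client list over the past year (N = 200).
clients = [f"client_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(clients, 20, seed=1)

f = len(sample) / len(clients)  # sampling fraction f = n/N
print(sample[:5], f)
```

Fixing the seed makes the draw reproducible, which is useful when the selection has to be documented or audited.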
• Sometimes called "proportional" or "quota" random sampling.
• Objective: Population of N units is divided into nonoverlapping strata N1, N2, N3, ..., Ni such that N1 + N2 + ... + Ni = N; then do a simple random sample of f = n/N in each stratum.
• To ensure representation of each stratum, oversample smaller population groups.
• Administrative convenience -- field offices.
• Sampling problems may differ in each stratum.
• Increased precision (lower variance) if strata are homogeneous within (like blocking).
• Proportionate: If sampling fraction is equal for each stratum
• Disproportionate: Unequal sampling fraction in each stratum
• Needed to enable better representation of smaller (minority) groups
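A minimal sketch of proportionate versus disproportionate stratified sampling follows. The two strata, their sizes, and the sampling fractions are hypothetical, chosen only to show how oversampling a small group changes the sample.

```python
import random

def stratified_sample(strata, fractions, seed=None):
    """Simple random sample within each stratum.

    strata: dict mapping stratum name -> list of units
    fractions: dict mapping stratum name -> sampling fraction for that stratum
    """
    rng = random.Random(seed)
    out = {}
    for name, units in strata.items():
        k = max(1, round(fractions[name] * len(units)))  # at least one unit per stratum
        out[name] = rng.sample(units, k)
    return out

# Hypothetical population: one large and one small (minority) stratum.
strata = {
    "majority": [f"maj_{i}" for i in range(900)],
    "minority": [f"min_{i}" for i in range(100)],
}

# Proportionate: the same sampling fraction in every stratum.
proportionate = stratified_sample(strata, {"majority": 0.1, "minority": 0.1}, seed=7)
# Disproportionate: oversample the smaller group.
disproportionate = stratified_sample(strata, {"majority": 0.1, "minority": 0.5}, seed=7)

print({k: len(v) for k, v in proportionate.items()})
print({k: len(v) for k, v in disproportionate.items()})
```

With disproportionate sampling, population-level estimates normally need weights that undo the unequal fractions; that step is left out of this sketch.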


 Number units in population from 1 to N.
 Decide on the n that you want or need.
 N/n=k the interval size.
 Randomly select a number from 1 to k.
 Take every kth unit.
 Assumes that the population is randomly
ordered.
• Advantages: Easy; may be more precise than simple random sample.
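The systematic sampling steps above can be sketched as follows. The population of 100 numbered units and n = 20 are assumptions for illustration, and the sketch assumes N is a multiple of n so that the interval k is a whole number.

```python
import random

def systematic_sample(units, n, seed=None):
    """Take every kth unit after a random start between 1 and k."""
    N = len(units)
    k = N // n  # interval size (assumes N is a multiple of n)
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start, inclusive of both 1 and k
    # Units are numbered 1..N, so unit i sits at list index i - 1.
    return [units[i - 1] for i in range(start, N + 1, k)][:n]

units = list(range(1, 101))  # hypothetical population numbered 1 to N = 100
sample = systematic_sample(units, 20, seed=3)
print(sample)
```

If the list has a periodic pattern that lines up with k, the sample can be badly unrepresentative, which is why the method assumes a randomly ordered population.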


Procedure
 Divide population into clusters.
 Randomly sample clusters.
 Measure all units within sampled clusters.
• Advantages: Administratively useful, especially when you have a wide geographic area to cover.
• Examples: Randomly sample from city blocks and measure all homes in selected blocks.
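The city-block example can be sketched as a single-stage cluster sample. The ten blocks with five homes each are hypothetical, and the function name is illustrative.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

# Hypothetical frame: 10 city blocks, each containing 5 homes.
blocks = {
    f"block_{b}": [f"home_{b}_{h}" for h in range(1, 6)]
    for b in range(1, 11)
}
selected = cluster_sample(blocks, 3, seed=5)

for block, homes in selected.items():
    print(block, homes)
```

Only the blocks are chosen at random; once a block is selected, all of its homes are measured.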
 Cluster (area) random sampling can be multi-
stage.
 Any combinations of single-stage methods.
Example: Choosing students from schools
 Select all schools; then sample within
schools.
 Sample schools; then measure all students.
 Sample schools; then sample students.
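The third option (sample schools, then sample students) can be sketched as a two-stage design. The six schools of 40 students each and the stage sizes are assumptions made for the example.

```python
import random

def two_stage_sample(schools, n_schools, n_students, seed=None):
    """Stage 1: sample schools. Stage 2: sample students within each chosen school."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(schools), n_schools)
    return {
        s: rng.sample(schools[s], min(n_students, len(schools[s])))
        for s in chosen
    }

# Hypothetical frame: 6 schools with 40 students each.
schools = {
    f"school_{s}": [f"student_{s}_{i}" for i in range(1, 41)]
    for s in "ABCDEF"
}
sample = two_stage_sample(schools, 2, 10, seed=11)

for school, students in sample.items():
    print(school, students[:3], "...")
```

Setting n_students to the full school size would reproduce the second option (sample schools, then measure all students).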
• Nonprobability samples: members are selected from the population in some nonrandom manner
◦ Methods include convenience sampling,
judgment sampling, quota sampling, and
snowball sampling
• Likely to misrepresent the population
• May be difficult or impossible to detect this
misrepresentation
 Accidental, haphazard, convenience
 Modal instance
 Purposive
 Expert
 Quota
 Snowball
 Heterogeneity sampling
 “Man on the street”
 College psychology majors
 Available or accessible clients
 Volunteer samples
 Problem: No evidence for representativeness
• Sample for the typical case
• Will it play in Peoria?
• Typical voter?
• Problem: May not represent the modal
group proportionately
 Might sample several pre-defined groups
(e.g., the shopping mall survey that attempts
to identify relevant market segments)
 Deliberately sampling an extreme group
 Problem: Proportionality
 Problem: Need theory to correctly sample an
extreme group
 Have a panel of experts make a judgment
about the representativeness of your sample.
• Advantage: At least you can say that expert judgment supports the sampling.
• Problem: The “experts” may be wrong.
 Select people nonrandomly according to
some quotas
 Proportional quota sampling
 Nonproportional quota sampling
• Objective: Represent major characteristics of
population by sampling a proportional amount
of each. For example, if you know the
population has 40% women and 60% men, you
want your sample to meet that quota.
• Problem: How do you pick the characteristics?
How do you know their proportion in population?
• Making sure you have enough units from
each target group of interest (even if not
proportional).
• As with stratified random sampling, you
might do this to assure that you have good
representation of smaller population groups.
 One person recommends another, who
recommends another, who recommends
another, etc.
• Good way to identify hard-to-reach populations, for example, homeless persons
• Snowball sampling is a non-probability sampling technique that is used by researchers to identify potential subjects in studies where subjects are hard to locate.
• Make sure you include all sectors -- at least
several of everything -- don't worry about
proportions (like in quota sampling).
• Use when one or more people are a good
proxy for the group, for instance, when
brainstorming issues across stakeholder
groups.
Getu segni tulu
 Introduction
 Sampling errors
 Computing the sample size
 Strategies for determining sample size
 In survey studies, once data are
collected, the most important objective
of a statistical analysis is to draw
inferences about the population using
sample information.
 How big a sample is required?
 If the sample size is not taken properly,
conclusions drawn from the investigation
may not reflect the real situation for the
whole population
• What is the difference between parameters and estimates of the population?
• There are two types of estimation: point estimation and interval estimation
• If the inference about the population is to be drawn on the basis of the sample, the sample must conform to certain criteria: the sample must be representative of the whole population
• When we draw inferences about a parameter from a statistic, some kind of error arises.
• The error that arises because only a sample is used to estimate the population parameters is termed sampling error or sampling fluctuation.
• However carefully the sample is selected, there will always be a difference between the parameter and its corresponding estimate.
• A sample with the smallest sampling error will always be considered a good representative of the population.
• Bigger samples have smaller sampling errors.
• When the sample survey becomes the census survey, the sampling error becomes zero.
• On the other hand, smaller samples may be easier to manage and have less non-sampling error.
• Handling bigger samples is more expensive than handling smaller ones.
• The non-sampling error tends to increase as the sample size increases.
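The claim that sampling error shrinks as the sample grows, and vanishes at a census, can be checked with a small simulation. The population here is synthetic (2,000 simulated values), and the sketch covers sampling error only; non-sampling error such as measurement or data-entry mistakes is not modeled.

```python
import random
import statistics

# Hypothetical population of N = 2000 simulated values (not real data).
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(2000)]
mu = statistics.fmean(population)  # the population parameter

def mean_abs_error(n, reps=200):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = []
    for _ in range(reps):
        sample = rng.sample(population, n)  # without replacement
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

# n = 2000 is a census: every unit is included, so the estimate equals the parameter.
errors = {n: mean_abs_error(n) for n in (10, 100, 1000, 2000)}
for n, err in errors.items():
    print(f"n={n:5d}  mean absolute sampling error={err:.4f}")
```

The error falls steadily with n, but each additional unit buys less improvement than the one before, which is the cost trade-off the bullets describe.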
• There are various approaches for computing the sample size
• The basic factors to be considered are:
• the level of precision,
• the confidence level desired, and
• the degree of variability
1. Level of Precision
 The ‘degree of precision’ is the margin of
permissible error between the estimated value and
the population value.
 In other words, it is the measure of how close an
estimate is to the actual characteristic in the
population.
• The level of precision may be termed the sampling error: the difference between the sample statistic and the related population parameter.
• It depends on the amount of risk a researcher is willing to accept while using the data to make decisions.
• It is often expressed as a percentage (the sampling error or margin of error), e.g., ±5%.
• A high level of precision requires larger sample sizes and higher cost to achieve those samples.
2. The confidence level
The confidence or risk level is ascertained through the well-established probability model called the normal distribution and an associated theorem called the Central Limit Theorem.
The confidence level tells how confident one can be that the error does not exceed the tolerance planned for in the precision specification.
Usually 95% and 99% are taken as the two standard degrees of confidence for specifying the interval within which the population parameter (e.g., the mean) is expected to lie.
3. Degree of variability
◦ The degree of variability in the attributes being measured refers to the distribution of attributes in the population.
◦ The more heterogeneous a population, the larger the sample size required to obtain a given level of precision.
◦ Note that a proportion of 50% indicates a greater level of variability than that of 20% or 80%.
• Before you can calculate a sample size, you need to determine a few things about the target population and the sample you need:
1. Population Size
2. Margin of Error (Confidence Interval)
3. Confidence Level
4. Standard Deviation
◦ To determine a representative sample size
from the target population, different
strategies can be used according to the
necessity of the research work.
◦ Use of various formulae for determination
of required sample sizes under different
situations is one of the most important
strategies.
a) Formula for proportions:
i) Cochran’s formula for calculating sample size when the population is infinite:
Cochran (1977) developed a formula to calculate a representative sample for proportions as
n0 = z^2 p q / e^2
where n0 is the sample size, z is the selected critical value of the desired confidence level, p is the estimated proportion of an attribute that is present in the population, q = 1 - p, and e is the desired level of precision.
Worked example: n = (z*)^2 p q / (ME)^2, with ME = 0.02; p is estimated to be about 0.6 from previous years' data; for 90% confidence, z* = 1.645.
n = (1.645)^2 (0.6)(0.4) / (0.02)^2 = 1,623.6, so n = 1624.
• Examples
i) Taking a 95% confidence level with ±5% precision:
p = 0.5 and hence q = 1 - 0.5 = 0.5; e = 0.05; z = 1.96
n0 = (1.96)^2 (0.5)(0.5) / (0.05)^2 = 384.16 ≈ 384
ii) Again, taking a 99% confidence level with ±5% precision, the calculation of the required sample size is as follows:
p = 0.5 and hence q = 1 - 0.5 = 0.5; e = 0.05; z = 2.58
n0 = (2.58)^2 (0.5)(0.5) / (0.05)^2 = 665.64 ≈ 666
Sample size (n0) by confidence level and precision (p = 0.5)
Confidence level    e = 0.03    e = 0.05    e = 0.1
95%                 1067        384         96
99%                 1849        666         166
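The table values can be reproduced with a few lines of code. This sketch assumes p = 0.5 and the critical values z = 1.96 and z = 2.58 used in the examples, and it rounds to the nearest whole number as the table does; the function name is illustrative.

```python
def cochran_n0(z, p, e):
    """Cochran's sample size for a proportion in a large (infinite) population."""
    q = 1.0 - p
    return z**2 * p * q / e**2

# Rebuild the table rows: p = 0.5, rounding to the nearest integer.
table = {}
for label, z in (("95%", 1.96), ("99%", 2.58)):
    table[label] = [round(cochran_n0(z, 0.5, e)) for e in (0.03, 0.05, 0.1)]
    print(label, table[label])

# The 90% worked example: z* = 1.645, p = 0.6, ME = 0.02.
print(cochran_n0(1.645, 0.6, 0.02))
```

Some texts round up rather than to the nearest integer so that the stated precision is never undershot; under that convention 384.16 would become 385.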


ii) Cochran’s formula for calculating sample size when the population size is finite:
• Cochran pointed out that if the population is finite, then the sample size can be reduced slightly.
n = n0 / (1 + (n0 - 1)/N)
Here, n0 is the sample size derived from the previous equation and N is the population size.
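A short sketch of the finite-population adjustment follows. The starting value n0 = 384.16 comes from the 95% example; the population sizes are hypothetical, picked to show how the reduction fades as N grows.

```python
def cochran_finite(n0, N):
    """Adjust Cochran's n0 for a finite population of size N."""
    return n0 / (1 + (n0 - 1) / N)

n0 = 384.16  # from the 95% confidence, +/-5% precision example
adjusted = {N: cochran_finite(n0, N) for N in (500, 2000, 100000)}
for N, n in adjusted.items():
    print(f"N={N:6d}  n={n:.1f}")
```

For a small population the reduction is substantial, while for a very large one the adjusted size is almost the same as n0.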
iii) Yamane’s formula for calculating sample size :
Yamane (1967) suggested another simplified
formula for calculation of sample size from a
population which is an alternative to Cochran’s
formula.
n= N/ (1+N (e^2))
where, N is the population size and e is the level of
precision
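Yamane's formula is simple enough to check directly. The population sizes below are hypothetical and e = 0.05 is an assumed precision level.

```python
def yamane_n(N, e):
    """Yamane's simplified sample size for population size N and precision e."""
    return N / (1 + N * e**2)

for N in (500, 2000, 10**9):
    print(f"N={N}  n={yamane_n(N, 0.05):.2f}")
```

As N grows, the result approaches 1/e^2, which is 400 for e = 0.05, so the formula never asks for more than that regardless of population size.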
b. Comparative study of two different methods
of allocation:
The stratification was done following the
principles that –
i) The strata (i.e. categories of schools) are
non-overlapping and together comprise the
whole population.
ii) The strata (i.e. categories of schools) are
homogeneous within themselves with respect
to the characteristics under study
i. Sample size through proportional allocation method:
ni = n (Ni/N), i = 1, 2, 3, 4, ...
where n represents the total sample size, Ni represents the population size of the ith stratum, and N represents the population size.
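Proportional allocation can be sketched as below. The three school categories and their sizes are hypothetical, chosen so the arithmetic is easy to follow.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate a total sample of n across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    return {name: n * Ni / N for name, Ni in stratum_sizes.items()}

# Hypothetical strata: categories of schools and their population sizes.
sizes = {"primary": 500, "secondary": 300, "tertiary": 200}
alloc = proportional_allocation(100, sizes)
print(alloc)
```

In practice the allocations are rounded to whole units, and the rounded values may need a small adjustment so that they still sum to n.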
ii. The sample size through optimum allocation method:
The allocation of the sample units to the different strata is determined with a view to minimizing the variance for a specified cost of conducting the survey, or to minimizing the cost for a specified value of the variance. The cost function is given by
C = a + sum over i of (ci ni)
where a is the overhead cost, which is constant, and ci is the average cost of surveying one unit in the ith stratum.
• Therefore, the required sample size in the different strata is given by
ni = n (Ni Si / sqrt(ci)) / sum over j of (Nj Sj / sqrt(cj))
• where n = sample size for the study, Ni = population size of the ith stratum, Si = standard deviation of the ith stratum
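A minimal sketch of cost-based optimum allocation follows. It assumes the standard textbook form, in which each stratum's share is proportional to Ni Si / sqrt(ci) with Si the stratum standard deviation and ci the per-unit cost; the stratum sizes, spreads, and costs are hypothetical.

```python
import math

def optimum_allocation(n, sizes, sds, costs):
    """Allocate n across strata in proportion to N_i * S_i / sqrt(c_i).

    Assumed standard form: more units go to strata that are larger,
    more variable, or cheaper to survey.
    """
    weights = {h: sizes[h] * sds[h] / math.sqrt(costs[h]) for h in sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical strata: sizes N_i, standard deviations S_i, unit costs c_i.
sizes = {"A": 500, "B": 300, "C": 200}
sds = {"A": 10.0, "B": 20.0, "C": 40.0}
costs = {"A": 1.0, "B": 4.0, "C": 16.0}

alloc = optimum_allocation(100, sizes, sds, costs)
print(alloc)

# With equal unit costs, the same function allocates in proportion to N_i * S_i.
equal_cost = optimum_allocation(190, sizes, sds, {h: 1.0 for h in sizes})
print(equal_cost)
```

The design choice to pass costs explicitly keeps one function for both cases: equal costs reduce the rule to allocation by size and variability alone.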
 Issues in Estimating Sample Size for Confidence Intervals
Estimates
 Sample Size for One Sample, Continuous Outcome
 Sample Size for One Sample, Dichotomous Outcome
 Sample Sizes for Two Independent Samples, Continuous
Outcome
 Sample Size for Matched Samples, Continuous Outcome
 Sample Sizes for Two Independent Samples, Dichotomous
Outcome
 Issues in Estimating Sample Size for Hypothesis Testing
 Sample Size for One Sample, Dichotomous Outcome
 Sample Sizes for Two Independent Samples, Continuous
Outcome
 Sample Size for Matched Samples, Continuous Outcome
 Sample Sizes for Two Independent Samples, Dichotomous
Outcomes
Read the material in the following link.
http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_power/BS704_Power_print.html
