Research Methodology and Biostatistics Unit II Part I

Uploaded by

Sohail Sheikh
Copyright
© © All Rights Reserved

Research Methodology and Biostatistics
Unit II – Biostatistics: Definition, application, sample size, importance of sample size, factors influencing sample size, dropouts, statistical tests of significance, types of significance tests, parametric tests (Student's t-test, ANOVA, correlation coefficient, regression), non-parametric tests (Wilcoxon rank tests, analysis of variance, correlation, chi-square test), null hypothesis, P values, degrees of freedom, interpretation of P values.
Statistics
• What is statistics?
A collection of techniques for extracting information from data, and for ensuring that the data collected contain the desired information.

• What does statistics do for us?


Statistics provides an objective basis for
making decisions in the presence of
uncertainty.
Statistics
• Daniel (1978): "... statistics is a field of study concerned with the organization and summarization of data, and the drawing of inferences about a body of data when only a part of the data are observed."
• Lovitt: "It is the science which deals with collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena."
• "The science of collection, organization, presentation, analysis and interpretation of numerical data."
Functions of statistics
• Presents facts in a definite form
• Simplifies a mass of figures by condensing the overall information
• Facilitates comparison
• Helps in formulating and testing hypotheses
• Helps in prediction
• Helps in the formulation of suitable policies
What is Biostatistics
• Biostatistics is the science of collection, analysis and interpretation of facts and numbers connected with biology.
• Biometrics: biological measurements, e.g. estimation of oxygen in a few samples.
• It is the branch of statistics concerned with mathematical facts and data related to biological events.
• It is the science that helps in managing medical uncertainties.
• Biostatistics covers applications and contributions not only from health, medicine and nutrition but also from fields such as genetics, biology, epidemiology, and many others.
• It mainly consists of steps such as generation of a hypothesis, collection of data, and application of statistical analysis.
Types of Statistics
Descriptive statistics
• Descriptive statistics, as the name implies, describe data that we collect or observe (empirical data).
• They represent all of the procedures that can be used to organize, summarize, display, and categorize data collected for a certain experiment or event.
Examples include the frequencies and associated percentages; the average or range of outcomes; and pie charts, bar graphs or other visual representations of data.
These statistics communicate information: they organize and summarize data or afford a visual display.
Such statistics must be:
1) an accurate representation of the observed outcomes;
2) presented as clearly and understandably as possible; and
3) as efficient and effective as possible.
Inferential statistics
Inferential statistics represent a wide range of procedures that are traditionally thought of as statistical tests (e.g., t-test, analysis of variance or chi-square test).
• Inferential statistics actually involves a series of steps:
1) Establishing a research question
2) Formulating a hypothesis that will be tested
3) Selecting the most appropriate test based on the type of data collected
4) Selecting the data correctly
5) Collecting the required data or observations
6) Performing the statistical test; and
7) Making a decision based on the result of the test. This last step, the decision making, will result in either the rejection
of or failure to reject the hypothesis and will ultimately answer the research question posed in the first step of the
process.
Applications of Biostatistics
As a Science
• In Physiology and Anatomy:
  - To define what is normal or healthy in a population.
  - To find the limits of normality in variables such as weight and pulse rate in a population.
  - To find the difference between means and proportions of normal values at two places or in different periods.
  - To find the correlation between two variables X and Y, such as height and weight.
• In Pharmacology:
  - To find the action of a drug.
  - To compare the action of two different drugs or two successive dosages of the same drug.
  - To find the relative potency of a new drug with respect to a standard drug.
• In Medicine:
  - To compare the efficacy of a particular drug, operation or line of treatment.
  - To find an association between two attributes such as cancer and smoking, or filariasis and social class.
  - To identify signs and symptoms of a disease or syndrome.
  - To test the usefulness of sera and vaccines in the field.
• In epidemiological studies, the role of causative factors is statistically tested.
• In modern medicine: for decades, biostatistics has played an integral role in everything from analyzing data and determining whether a treatment will work to developing clinical trials.
• CLINICAL MEDICINE
  - Documentation of the medical history of diseases.
  - Planning and conduct of clinical studies.
  - Evaluating the merits of different procedures.
  - Providing methods for the definition of 'normal' and 'abnormal'.
• PREVENTIVE MEDICINE
  - To provide the magnitude of any health problem in the community.
  - To find out the basic factors underlying ill health.
  - To evaluate the health programs which were introduced in the community (success/failure).
  - To introduce and promote health legislation.
Biostatistics As Figures
• Health and vital statistics are essential tools in demography, public health, medical practice and community services.
• Recording of vital events in birth and death registers, and of diseases in hospitals, is like book-keeping of the community, describing the incidence or prevalence of diseases, defects or deaths in a defined population. It answers questions such as:
  - What are the leading causes of death?
  - What are the important causes of sickness?
  - Is a particular disease rising or falling in severity and prevalence?
Basic Concepts of Biostatistics
• Population
• Sample
• Data
• Variable
[Figure: a population and the samples drawn from it]
Basic Definitions and Concepts
A simple table to explain the science of biostatistics:

Water sample    Amount of oxygen (ml)
1               4.5
2               6.9
3               6.2
4               5.3

• The water samples form a population. The estimation of oxygen (O2) in each water sample is the collection of data. The amount of oxygen in the water samples is the data.
• Arranging the values in columns is called tabulation.
• In one water sample the amount of oxygen is higher and in another it is lower. This is interpretation.
Population and sample

Population                                  Sample
Tablet batch                                20 tablets taken for content uniformity
Serum cholesterol levels of one patient     Blood samples drawn once a week for 3 months from a single patient
Two types of population
• A population containing a limited number of individuals is called a finite population, e.g. the number of students in a class, the number of TB patients in a hospital.
• A population containing an unlimited number of individuals is called an infinite population, e.g. stars in the sky, the microbial population in the soil.
Parameter / Statistic
• Parameter: Any measurable characteristic of the universe is called a parameter.
E.g. the average weight of a batch of tablets and the average blood pressure of hypertensive persons are parameters of the respective populations.
• Statistics are numerical descriptive measures corresponding to samples.
E.g. a study conducted to observe the effect of grapefruit juice on cyclosporine and prednisone metabolism in transplant patients.
DATA
• Data are the values recorded in an experiment or observation. Statistics are a set of numerical data.
• The raw material of statistics always originates from the operation of counting (enumeration) or measurement.
• For any statistical enquiry, whether in social science, business or economics, the basic problem is to collect facts and figures relating to the particular phenomenon under study.
• Two types of data: primary and secondary.
  - Primary data: data collected by the investigator, i.e. first-hand information.
  - Secondary data: data collected from another source, e.g. data collected from journals.
Basic Definitions and Concepts
Variable
A characteristic of an item or individual is called a variable; it is so called because its values vary from one item or individual to another.
Variables are the measurements, i.e. the values that characterize the data collected in experiments. E.g. tablet weight, blood pressure.

Variables
• Quantitative (e.g. blood pressure)
  - Discontinuous (discrete): e.g. number of TB patients in a hospital
  - Continuous: e.g. percentage of haemoglobin
• Qualitative (e.g. colour of skin)
Types of Variables
• Qualitative (categorical) variables: e.g. colour of skin
• Quantitative variables: e.g. blood sugar level
• Quantitative continuous: decimals are allowed, e.g. blood pressure, height, weight
• Quantitative discrete: integers only, e.g. number of anesthetic shots, number of hospital admissions, blood cell count
Qualitative observation

Classification of leprosy according to the type of leprosy

Type of leprosy    No. of patients
Tuberculoid        151
Indeterminate      18
Borderline         12
Total              181

Classification of leprosy according to the type of leprosy and gender

Type of leprosy    Male    Female    Total
Tuberculoid        77      74        151
Indeterminate      33      36        49
Borderline         10      5         10
Total              129     105       210


Quantitative observation

Distribution of 13 normal children according to the haemoglobin content of their blood

Grams of haemoglobin per 100 cc of blood    No. of patients
9.0-9.4                                     1
9.5-9.9                                     4
10.0-10.4                                   8
Total                                       13

Differential counts from the blood of a person classified according to the number of eosinophils

No. of eosinophils encountered in 100 W.B.C.    No. of smears
0                                               11
1                                               20
2                                               20
Total                                           51
Variables
• Independent variables
  - Precede dependent variables in time
  - Are often manipulated by the researcher
  - Are the treatment or intervention that is used in a study
• Dependent variables
  - What is measured as an outcome in a study
  - Values depend on the independent variable
• Example: if we wish to compare the bioavailabilities of various dosage forms, the dependent variable would be AUC (area under the concentration–time curve), and the independent variable would be the dosage form.
Sample Size
• The number (n) of observations taken from a population, through which statistical inferences for the whole population are made.
• Sample: "A small portion of the population which truly represents the population with respect to the study characteristics."
• Need for sample size:
  - Biological data are highly variable.
  - Sample size is a crucial element in the planning of any research project.
  - It gives economy in terms of personnel, equipment, time and related aspects, but not at the cost of the desired precision, confidence and power.
Considerations in sample size calculation
• Why is it important?
  - It is an integral part of quantitative research.
  - It ensures the validity, accuracy, reliability, and scientific and ethical integrity of research.
• Three main concepts to be considered:
  - Estimation (depends on several components).
  - Justification (in the light of budgetary or biological considerations).
  - Adjustments (accounting for potential dropouts or the effect of covariates).
Importance of Sample Size calculation
• Scientific reasons
• Ethical reasons
• Economic reasons

Importance of Sample Size calculation
• I – Scientific reasons
  - In a trial with negative results and a sufficient sample size, the result is conclusive (the treatment has no effect; there is no difference).
  - In a trial with negative results and insufficient power (insufficient sample size), one may mistakenly conclude that the treatment under study made no difference (false conclusion).
• II – Ethical reasons
  - An undersized study can expose subjects to potentially harmful treatments without the capability to advance knowledge.
  - An oversized study has the potential to expose an unnecessarily large number of subjects to potentially harmful treatments.
• III – Economic reasons
  - An undersized study is a waste of resources because of its inability to yield meaningful, useful results.
  - An oversized study may yield a statistically significant result of doubtful clinical significance, leading to a waste of resources.
WHAT IS SAMPLE SIZE DETERMINATION
• Sample size determination is the mathematical estimation of the number of subjects/units to be included in a study.
• When a representative sample is taken from a population, the findings are generalized to the population.
• Optimum sample size determination is required for the following reasons:
  - To allow for appropriate analysis
  - To provide the desired level of accuracy
  - To allow validity of significance tests

HOW LARGE A SAMPLE DO I NEED?
• If the sample is too small:
  - Even a well-conducted study may fail to answer its research question
  - It may fail to detect important effects or associations
  - It may estimate these effects or associations imprecisely
• If the sample size is too large:
  - The study will be difficult and costly
  - Time constraints
  - Available cases may be limited, e.g. in a rare disease
  - Loss of accuracy
• The optimum sample size must be determined before commencement of a study.
Key Terms in Sampling
• Universe or population: The sum total or aggregate of all units/cases that conform to some designated set of specifications is called the universe or population.
• Sample: A part/portion of the total population.
• Sampling element: Each entity from the population about which information is collected, e.g. a patient in a hospital, animals with a specific treatment – Drug…
• Sampling unit: Either a single member or a collection of members in the sample.
• Sample frame: The complete list of all units/elements from which the sample is drawn, e.g. the list of patients in all wards of the hospital. It is also called the working frame, as it provides the list that can be operationally worked with.
Key Terms in Sampling
• Target population: The population to which the researcher would like to generalize the results. Criteria are specified for determining which cases are included in, and which are excluded from, the population under study. E.g. use of drug X in a city/hospital.
• Sampling trait: The element on the basis of which we take the sample out of the total universe. It can be quantitative (variable) or qualitative (attribute). E.g. disease X in a village: gender, age and residence.
Key Terms in Sampling
• Sampling fraction: The proportion of the total population to be included in the sample.
$\text{Sampling fraction} = \frac{\text{size of sample}}{\text{total population}} = \frac{n}{N}$
E.g. total population N = 2200 and size of the sample n = 300; applying the formula, $\frac{n}{N} = \frac{300}{2200} \approx 0.136$, i.e. about one-seventh of the total population.
• Sample estimate: An estimate, from a sample value, of what the value would be in the total population from which the sample is drawn. E.g. in a college of 1200 students a sample of 300 is drawn and its average age is 18.6 years; the estimate for the total population would then be 18.6 years.
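The sampling fraction n/N from the example above can be checked with a few lines of Python. This is only a minimal sketch; the function name `sampling_fraction` is an illustrative choice, not something defined in the slides.

```python
from fractions import Fraction


def sampling_fraction(sample_size: int, population_size: int) -> Fraction:
    """Return the sampling fraction n/N as an exact fraction."""
    if population_size <= 0:
        raise ValueError("population size must be positive")
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample size must lie between 0 and the population size")
    return Fraction(sample_size, population_size)


# Worked example from the slide: n = 300, N = 2200.
f = sampling_fraction(300, 2200)
print(f, round(float(f), 3))  # 3/22, roughly one-seventh
```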
Key Terms in Sampling
• Biased sample: A sample chosen in such a way that some elements are more likely to be represented than others. For example, a survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home-schooled students or dropouts. A sample is also biased if certain members are under-represented or over-represented relative to others in the population.
• Parameter: A characteristic of a population; a summary description of a variable for a population; a value calculated from a defined population, e.g. mean, standard deviation/standard error. E.g. the average weight of TB patients. A parameter represents a summary description of the population.
• A statistic represents a summary description of the sample.
Key Terms in Sampling
• Sampling error: The difference between the total population value and the sample value; equivalently, the degree to which the sample characteristics approximate the characteristics of the total population. E.g. with a parameter (population value) of 20 years:

Sample      Statistic    Error
Sample 1    21 years     1 year
Sample 2    24 years     4 years
Sample 3    26 years     6 years
Key Terms in Sampling
• Sampling error (continued):
  - Sampling error is not a measurement error, nor is it a systematic bias in the sample; it is the error which depends on the representativeness of the sample.
  - The smaller the sampling error, the greater the precision of the sample.
• A representative sample depends upon:
  - Sampling error: a function of sample size.
  - Non-sampling error: systematic error arising from study design, correction and execution of sampling, and non-response error.
Sample Size Determination - quantitative data
• The mean pulse rate of a population is believed to be 70 per minute with a standard deviation of 8 beats. Calculate the minimum size of the sample needed to verify this if the allowable error is (i) E = ±1 beat at 5% risk and (ii) E = ±2 beats at 5% risk.
• Solution:
(i) $n = \frac{4\sigma^2}{E^2} = \frac{4 \times 8 \times 8}{1 \times 1} = 256$
(ii) $n = \frac{4\sigma^2}{E^2} = \frac{4 \times 8 \times 8}{2 \times 2} = 64$
The smaller the allowable error E, the larger n must be; i.e. the larger the sample size, the smaller the error.
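The rule n = 4σ²/E² used in the solution above can be sketched in Python as follows. The factor 4 corresponds to rounding the 5%-risk normal deviate (about 1.96) to 2 before squaring; the function name `sample_size_mean` is a hypothetical helper, not part of the slides.

```python
import math


def sample_size_mean(sigma: float, allowable_error: float) -> int:
    """Minimum n for estimating a mean at 5% risk, using n = 4 * sigma^2 / E^2."""
    if sigma <= 0 or allowable_error <= 0:
        raise ValueError("sigma and allowable_error must be positive")
    # Round up, because a fraction of a subject cannot be sampled.
    return math.ceil(4 * sigma ** 2 / allowable_error ** 2)


# Pulse-rate example: sigma = 8 beats, E = 1 beat and E = 2 beats.
print(sample_size_mean(8, 1))  # 256
print(sample_size_mean(8, 2))  # 64
```

Halving the allowable error quadruples the required sample size, because E enters the formula squared.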
To Solve
• The mean systolic blood pressure of the students of one college was found to be 120 with an SD of 10. Calculate the minimum size of the sample needed to verify the result if the allowable error is 2 at 5% risk.
• Solution:
$n = \frac{4\sigma^2}{E^2} = \frac{4 \times 10 \times 10}{2 \times 2} = 100$
Sample Size Determination - qualitative data
• The incidence rate in the last influenza epidemic was found to be 50 per thousand (5%) of the population exposed. What should be the size of the sample to find the incidence rate in the current epidemic if the allowable error is 0.005 and 0.01?
• $n = \frac{4pq}{E^2}$, where p = 0.05 and q = 1 − p = 1 − 0.05 = 0.95
If E = 0.005:
$n = \frac{4pq}{E^2} = \frac{4(0.05)(0.95)}{(0.005)^2} = 7600$
If E = 0.01:
$n = \frac{4pq}{E^2} = \frac{4(0.05)(0.95)}{(0.01)^2} = 1900$
So the larger the permissible error, the smaller the size of sample required, for both types of data.
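The calculation n = 4pq/E² above can be reproduced with a short Python sketch. The helper name `sample_size_proportion` is an assumption for illustration; exact fractions are used so that binary floating-point rounding does not push a whole-number result such as 7600 up to 7601 when rounding up.

```python
import math
from fractions import Fraction


def sample_size_proportion(p, allowable_error) -> int:
    """Minimum n for estimating a proportion at 5% risk, using n = 4pq / E^2."""
    p = Fraction(str(p))                # e.g. 0.05 -> 1/20, exactly
    e = Fraction(str(allowable_error))
    if not 0 < p < 1 or e <= 0:
        raise ValueError("need 0 < p < 1 and a positive allowable error")
    q = 1 - p
    return math.ceil(4 * p * q / e ** 2)


# Influenza example: p = 0.05 with E = 0.005 and E = 0.01.
print(sample_size_proportion(0.05, 0.005))  # 7600
print(sample_size_proportion(0.05, 0.01))   # 1900
```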
To Solve
• The hookworm prevalence rate was 30% before the specific treatment and adoption of other measures. Calculate the size of the sample required to find the prevalence rate now if the allowable error is 0.03 and 0.06.
• Solution:
• If E = 0.03:
$n = \frac{4pq}{E^2} = \frac{4(0.3)(0.7)}{(0.03)^2} = 933.3 \approx 934$
• If E = 0.06:
$n = \frac{4pq}{E^2} = \frac{4(0.3)(0.7)}{(0.06)^2} = 233.3 \approx 234$
Thus, if we allow only a small error, the required sample size is much larger than when the allowable error is increased.
Sampling techniques
I. Random sampling / probability sampling
• Simple random sampling:
  - Lottery method
  - Table of random numbers
• Systematic sampling
• Stratified sampling
• Multistage sampling
• Cluster sampling
• Multiphase sampling

Sampling Techniques - Simple Random Sampling
• Also called 'unrestricted random sampling'
• Lottery method
• Table of random numbers method, e.g. an extract from such a table:

369 495
428 572
565 169
969 786
385 094
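In practice the lottery and random-number-table methods are usually replaced by a pseudo-random number generator. The sketch below shows one way to draw a simple random sample without replacement in Python; the list of 1000 serial numbers and the seed are made-up illustration values, and `simple_random_sample` is a hypothetical helper name.

```python
import random


def simple_random_sample(units, sample_size, seed=None):
    """Draw sample_size units without replacement; every unit is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(units), sample_size)


# Hypothetical sampling frame: patients with serial numbers 1 to 1000.
frame = range(1, 1001)
chosen = simple_random_sample(frame, 100, seed=42)
print(len(chosen), sorted(chosen)[:5])
```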
Sampling Techniques - Systematic Sampling
• Systematic sampling: the sampling interval K is
$K = \frac{\text{total population}}{\text{sample size desired}}$
E.g. if a 10% sample is to be taken out of one thousand patients:
$K = \frac{1000}{10\% \text{ of } 1000} = \frac{1000}{100} = 10$
• One random number is found by pulling out one card after shuffling, out of 10 cards serially numbered 1 to 10. Supposing it is 6, then the sample will consist of units with serial numbers 6, 6 + 10 = 16, 16 + 10 = 26, 26 + 10 = 36 and so on. For example, examine every 10th house after the 6th house.
Sampling Techniques - Systematic Sampling
• To assess the incidence of influenza in one epidemic in a large city like Bombay, a 20% sample is to be taken out of 100 cases. What has to be done, and how is it solved?
• Solution:
$K = \frac{100}{20\% \text{ of } 100} = \frac{100}{20} = 5$
Examine every 5th case starting with a random number such as 2; the subsequent numbers will be 2 + 5 = 7, 7 + 5 = 12, 12 + 5 = 17, 17 + 5 = 22 and so on.
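The systematic sampling procedure described above (compute the interval K, pick a random start between 1 and K, then take every K-th unit) can be sketched in Python. The function name and its parameters are illustrative assumptions, and the sketch assumes the population size is a whole multiple of the sample size, as in both slide examples.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the serial numbers selected by systematic sampling."""
    k = population_size // sample_size  # sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..K
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and K")
    return list(range(start, population_size + 1, k))


# 10% of 1000 patients, random start 6: units 6, 16, 26, 36, ...
print(systematic_sample(1000, 100, start=6)[:4])
# 20% of 100 cases, random start 2: units 2, 7, 12, 17, 22, ...
print(systematic_sample(100, 20, start=2)[:5])
```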
Sampling Techniques (continued)
• Stratified sampling: This method is followed when the population is not homogeneous. The population under study is first divided into homogeneous groups or classes called strata, and the sample is drawn from each stratum at random in proportion to its size.
• Multistage sampling: This method is employed in large country surveys. In the first stage, a random number of districts is chosen in all the states, followed by a random number of talukas, villages and units, respectively. E.g. for a hookworm survey in a district, choose 10% of the villages in the talukas and then examine the stools of all persons in every 10th house.
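Stratified sampling with allocation in proportion to stratum size, as described above, might look like the following in Python. This is a sketch under stated assumptions: the two strata and their sizes are invented for illustration, and `stratified_sample` is a hypothetical helper name.

```python
import random


def stratified_sample(strata, sampling_fraction, seed=None):
    """Draw a simple random sample from each stratum, proportional to its size."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        units = list(units)
        # Proportional allocation: the same fraction is taken from every stratum.
        take = round(len(units) * sampling_fraction)
        sample[name] = rng.sample(units, take)
    return sample


# Hypothetical population of 1000 split into two strata of unequal size.
strata = {
    "male": [f"M{i}" for i in range(600)],
    "female": [f"F{i}" for i in range(400)],
}
picked = stratified_sample(strata, 0.10, seed=1)
print({name: len(units) for name, units in picked.items()})
```

A 10% fraction therefore yields 60 units from the stratum of 600 and 40 from the stratum of 400, so each stratum keeps its share of the sample.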
Sampling Techniques (continued)
• Cluster sampling: A cluster is a randomly selected group. E.g. as per the module approved by WHO, it is most often used to evaluate vaccination coverage in the Expanded Programme on Immunization (EPI) and the Universal Immunization Programme (UIP), where only 210 children in the age group 12–23 months are to be examined, taking 7 from each cluster.
• Multiphase sampling: In this method, part of the information is collected from the whole sample and part from a subsample.
Sampling techniques
II. Non-random sampling / non-probability sampling
• Convenience sampling
• Purposive sampling / judgment sampling
• Quota sampling
• Snowball sampling
Purpose of sampling
• Complete coverage may not be possible
• High degree of accuracy
• Valid and comparable results can be obtained in a short period of time
• Less demanding in terms of the requirements of the investigation
• Economical
• Quality control
• Draw inference about the universe
• Generalization
• Random error: error that occurs by chance.
  - Sources: sample variability, subject-to-subject differences and measurement errors.
  - It can be reduced by averaging, increasing the sample size, or repeating the experiment.
• Systematic error: deviations not due to chance alone.
  - Several factors, e.g. patient selection criteria, may contribute.
  - It can be reduced by good study design and conduct of the experiment.
• Precision: the degree to which a variable has the same value when measured several times. It is a function of random error. The sample represents the population.
• Accuracy: the degree to which a variable actually represents the true value. It is a function of systematic error. Bias is absent from the sample.
• Power (1 − β): the probability that the test will correctly identify a significant difference, effect or association in the sample should one exist in the population. Power increases with sample size: the larger the sample, the greater the power of the study to detect a significant difference, effect or association.
• Effect size: a measure of the strength of the relationship between two variables in a population. It is the magnitude of the effect under the alternative hypothesis. The bigger the effect in the population, the easier it will be to find.
• Confidence interval: a range of values, calculated from the sample, that is expected to contain the population parameter in a stated proportion of repeated samples. That proportion is the confidence level; any level can be chosen, the most common being 95% or 99%.
• Null hypothesis: states that there is no difference among groups, or no association between the predictor and the outcome variable. This is the hypothesis that is tested.
• Alternative hypothesis: contradicts the null hypothesis. The alternative hypothesis cannot be tested directly; it is accepted by exclusion if the test of significance rejects the null hypothesis. There are two types: one-tailed (one-sided) and two-tailed (two-sided).
• Type I (α) error: occurs if an investigator rejects a null hypothesis that is actually true in the population. The probability of making an α error is called the level of significance and is conventionally set at 0.05 (5%). For a given power and effect size, choosing a smaller α requires a larger sample.
• Type II (β) error: occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
• Type I error or alpha (false positive): rejecting the null hypothesis when it is true.
• Type II error or beta (false negative): failing to reject the null hypothesis when it is false.
α
• The probability of committing a type I error (rejecting the null hypothesis when it is actually true) is called α (alpha); another name is the level of statistical significance.
• An α level of 0.05 sets 5% as the maximum chance of incorrectly rejecting the null hypothesis.
β
• The probability of making a type II error (failing to reject the null hypothesis when it is actually false) is called β (beta).
• The quantity (1 − β) is called power, the ability to detect a difference of a given size.
• If β is set at 0.10, we are willing to accept a 10% chance of missing an association of a given effect size.
• This represents a power of 90% (there is a 90% chance of finding an association of that size).
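How α, β, effect size and sample size interact can be illustrated numerically. The sketch below assumes a two-sided one-sample z-test with a known standard deviation, which is a simplifying choice made here for illustration and is not a test specified in the slides; the effect of 2 and σ of 8 are made-up values.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = effect * sqrt(n) / sigma            # true effect in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical example: true difference of 2 units, sigma = 8.
for n in (16, 64, 256):
    print(n, round(z_test_power(effect=2, sigma=8, n=n), 3))
```

With the effect and α held fixed, the power rises as n grows; when the true effect is zero the function returns α itself, which is the type I error rate.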
DETERMINATION OF SIZE OF SAMPLE
• Nature of universe
• Number of classes proposed
• Nature of study
• Type of sampling
• Standard of accuracy and acceptable confidence level
• Availability of finance
• Other considerations
Approaches for determining the size of the sample
• The first approach is to specify the precision of estimation desired and then to determine the sample size necessary to ensure it.
• The second approach uses Bayesian statistics to weigh the cost of additional information against the expected value of the additional information.
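DETERMINATION OF SAMPLE SIZE THROUGH THE APPROACH BASED ON PRECISION RATE AND CONFIDENCE LEVEL

• (a) Sample size when estimating a mean:
Acceptable error: for an infinite population the confidence interval for µ is $\bar{X} \pm z \frac{\sigma_p}{\sqrt{n}}$, so the acceptable error is
$e = z \frac{\sigma_p}{\sqrt{n}}$
Infinite population:
$n = \frac{z^2 \sigma_p^2}{e^2}$
Finite population:
• In the case of a finite population the confidence interval for µ is given by the formula
$\bar{X} \pm z \frac{\sigma_p}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
• Taking the precision of the confidence interval for µ as equal to e and determining n:
$n = \frac{z^2 N \sigma_p^2}{(N - 1) e^2 + z^2 \sigma_p^2}$
where
N = size of population
n = size of sample
e = acceptable error (the precision)
σp = standard deviation of population
z = standard variate at a given confidence level.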
Solve
• Determine the size of the sample for estimating the true weight of the cereal containers for the universe with N = 5000 on the basis of the following information:
(1) the variance of weight = 4 ounces on the basis of past records;
(2) the estimate should be within 0.8 ounces of the true average weight with 99% probability.
Will there be a change in the size of the sample if we assume an infinite population in the given case? If so, explain by how much.
Ans
• N = 5000;
• σp = 2 ounces (since the variance of weight = 4 ounces);
• e = 0.8 ounces (since the estimate should be within 0.8 ounces of the true average weight);
• z = 2.57 (as per the table of area under normal curve for the given confidence level of 99%).

For the finite population:
$n = \frac{z^2 N \sigma_p^2}{(N - 1) e^2 + z^2 \sigma_p^2} = \frac{(2.57)^2 (5000)(2)^2}{(5000 - 1)(0.8)^2 + (2.57)^2 (2)^2} = \frac{132098}{3225.78} = 40.95 \approx 41$
Hence, the sample size n = 41 for the given precision and confidence level in the above question with a finite population.

As per the question, if the population is infinite then
$n = \frac{z^2 \sigma_p^2}{e^2} = \frac{(2.57)^2 (2)^2}{(0.8)^2} = 41.28 \approx 41$
Thus, in the given case the sample size remains the same even if we assume an infinite population.
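The cereal-container answer can be checked numerically. The sketch below assumes the usual precision-based formulas for estimating a mean, n = z²σ²/e² for an infinite population and n = z²Nσ²/((N − 1)e² + z²σ²) for a finite one; the function names are illustrative, not taken from the slides.

```python
def n_mean_infinite(z, sigma, e):
    """Sample size for estimating a mean, infinite population."""
    return z ** 2 * sigma ** 2 / e ** 2


def n_mean_finite(z, sigma, e, population_size):
    """Sample size for estimating a mean, finite population of size N."""
    return (z ** 2 * population_size * sigma ** 2) / (
        (population_size - 1) * e ** 2 + z ** 2 * sigma ** 2
    )


# Values from the problem: z = 2.57 (99%), sigma = 2 ounces, e = 0.8, N = 5000.
finite = n_mean_finite(2.57, 2, 0.8, 5000)
infinite = n_mean_infinite(2.57, 2, 0.8)
print(round(finite, 2), round(infinite, 2))  # both round to 41 containers
```

The finite-population correction barely matters here because the required sample is a very small share of the 5000 containers.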
Home work
• A hospital administrator wishes to estimate the mean weight of babies born in her hospital. How large a sample of birth records should be taken if she wants a 99 percent confidence interval that is 1 pound wide? Assume that a reasonable estimate of σ is 1 pound. What sample size is required if the confidence coefficient is lowered to .95?
• A physician would like to know the mean fasting blood glucose value (milligrams per 100 ml) of patients seen in a diabetes clinic over the past 10 years. Determine the number of records the physician should examine in order to obtain a 90 percent confidence interval for µ if the desired width of the interval is 6 units and a pilot sample yields a variance of 60.
(b) Sample size when estimating a percentage or proportion:

For an infinite population:
$n = \frac{z^2 p q}{e^2}$
For a finite population:
$n = \frac{z^2 p q N}{e^2 (N - 1) + z^2 p q}$
where
p = sample proportion, q = 1 − p;
z = the value of the standard variate at a given confidence level, to be worked out from the table showing area under the normal curve;
n = size of sample.

Question: what are e, N and n?


Solve:
• What should be the size of the sample if a simple random sample from a
population of 4000 items is to be drawn to estimate the per cent defective
within 2 per cent of the true value with 95.5 per cent probability? What
would be the size of the sample if the population is assumed to be infinite in
the given case?
• N = 4000;
• e = .02 (since the estimate should be within 2% of true value);
• z = 2.005 (as per table of area under normal curve for the given confidence
level of 95.5%).
• As we have not been given the p value being the proportion of defectives in
the universe, let us assume it to be p = .02
Solution

If infinite population
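The two cases of this problem can be worked through numerically. The sketch below assumes the usual precision-based formulas for a proportion, n = z²pq/e² for an infinite population and n = z²pqN/(e²(N − 1) + z²pq) for a finite one; the function names are hypothetical and the input values are those listed in the problem.

```python
def n_prop_infinite(z, p, e):
    """Sample size for estimating a proportion, infinite population."""
    return z ** 2 * p * (1 - p) / e ** 2


def n_prop_finite(z, p, e, population_size):
    """Sample size for estimating a proportion, finite population of size N."""
    pq = p * (1 - p)
    return (z ** 2 * pq * population_size) / (
        e ** 2 * (population_size - 1) + z ** 2 * pq
    )


# Values from the problem: N = 4000, e = 0.02, z = 2.005, assumed p = 0.02.
finite = n_prop_finite(2.005, 0.02, 0.02, 4000)
infinite = n_prop_infinite(2.005, 0.02, 0.02)
print(round(finite), round(infinite))  # about 188 and 197 items
```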
To Solve
• Suppose a certain hotel management is interested in determining the percentage of the hotel's guests who stay for more than 3 days. The reservation manager wants to be 95 per cent confident that the percentage has been estimated to be within ± 3% of the true value. What is the most conservative sample size needed for this problem?
Solution:
• Population is infinite;
• e = .03 (since the estimate should be within 3% of the true value);
• z = 1.96 (as per table of area under normal curve for the given confidence level of 95%).
• As we want the most conservative sample size we shall take the value of p = .5 and q = .5.
• Using all this information, we can determine the sample size for the given problem as under:
$n = \frac{z^2 p q}{e^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.03)^2} = 1067.11 \approx 1067$
• Thus, the most conservative sample size needed for the problem is 1067.
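Why p = .5 gives the most conservative (largest) sample size can be seen by scanning candidate values of p: the product pq = p(1 − p) is largest at p = 0.5. The sketch below assumes the infinite-population formula n = z²pq/e²; the grid of candidate values is an arbitrary illustration.

```python
def n_prop_infinite(z, p, e):
    """Sample size for estimating a proportion, infinite population."""
    return z ** 2 * p * (1 - p) / e ** 2


# Try p = 0.01, 0.02, ..., 0.99 and keep the value that demands the largest n.
candidates = [i / 100 for i in range(1, 100)]
worst_p = max(candidates, key=lambda p: n_prop_infinite(1.96, p, 0.03))
print(worst_p, round(n_prop_infinite(1.96, worst_p, 0.03)))  # 0.5 1067
```

Any other assumed proportion needs fewer subjects, so planning with p = .5 is safe when nothing is known about the true proportion.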
To solve 2
• A survey is being planned to determine what proportion of families in a certain area are medically indigent. It is believed that the proportion cannot be greater than .35. A 95 percent confidence interval is desired with e = .05. What size sample of families should be selected?
Homework
• An epidemiologist wishes to know what proportion of adults living in a
large metropolitan area have subtype hepatitis B virus. Determine the
sample size that would be required to estimate the true proportion to
within .03 with 95 percent confidence. In a similar metropolitan area the
proportion of adults with the characteristic is reported to be .20. If data
from another metropolitan area were not available and a pilot sample
could not be drawn, what sample size would be required?
