Biostat Lecture seven
02/03/25 [email protected] 1
Introduction
In the real world, the values of population parameters are fixed
and usually not known.
Instead, we must try to say something about the way in which
a variable is distributed using the information contained in a
sample of observations.
The process of drawing conclusions about an entire population
based on the data in a sample is known as statistical inference.
Two broad categories: Estimation and Hypothesis testing.
Estimation
Is concerned with estimating the values of specific population
parameters based on sample statistics.
Is about using information in a sample to make estimates of the
characteristics (parameters) of the source population.
Point versus Interval Estimators
♣ An estimator that represents a "single best guess" is called a
point estimator.
Thus,
A point estimate is of the form: [ Value ],
Estimating the Sampling Error
The sampling error depends on the sample size (n) and the
variability of the population (σ).
Example
Suppose that the mean ± sd of DBP on 20 males is 78.5 ± 10.3 mm
Hg.
The following table gives the se for mean of DBP for different
sample sizes.
Our best estimate of µ is 78.5.
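How the standard error of the mean shrinks as the sample size grows can be sketched in Python. This is only an illustration: it assumes the sample SD of 10.3 mm Hg stays the same at every sample size, and the sample sizes other than n = 20, as well as the helper name `standard_error`, are hypothetical and are not the values from the original table.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 10.3  # sample SD of DBP (mm Hg), from the example
# Hypothetical sample sizes; only n = 20 comes from the example.
for n in (20, 50, 100, 500):
    print(f"n = {n:4d}  se = {standard_error(sd, n):.2f}")
```

Quadrupling the sample size halves the standard error, because n enters through its square root.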
1. Point Estimate
A single numerical value used to estimate the corresponding
population parameter.
Sample Statistics are Estimators of Population Parameters
Sample statistic                     Population parameter
Sample mean, x̄                       µ
Sample variance, S²                  σ²
Sample proportion, p                 P or π
Sample odds ratio, OŔ                OR
Sample relative risk, RŔ             RR
Sample correlation coefficient, r    ρ
2. Interval Estimation
Interval estimation specifies a range of reasonable values for
the population parameter based on a point estimate.
A confidence interval is a particular type of interval estimator.
Confidence Intervals
Give a plausible range of values of the estimate likely to include
the “true” (population) value with a given confidence level.
An interval estimate provides more information about a
population characteristic than does a point estimate.
CIs also give information about the precision of an estimate.
A confidence interval has 3 components:
population parameter
A percentage (less than 100%)
Example: 95%
Also written (1 - α) = .95
Definition of 95% CI
1. Probabilistic interpretation:
If all possible random samples of a given sample size were obtained
and if each were used to obtain its own CI, then 95% of all such CIs
would contain the unknown population parameter; the remaining 5%
would not.
2. Practical interpretation
When sampling is from a normally distributed population with known
standard deviation, we are 100(1 - α)% (e.g., 95%) confident that the
single computed interval contains the unknown population
parameter.
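The probabilistic interpretation can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the lecture: the population values (mean 78.5, SD 10.3), the sample size, the number of replications and the function name `coverage` are all hypothetical choices. It draws many samples from a normal population with known σ, builds a 95% CI from each, and reports the fraction that contain the true mean.

```python
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, reps, conf=0.95, seed=0):
    """Fraction of simulated CIs (known sigma) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

# Hypothetical population values, used only for illustration.
print(coverage(mu=78.5, sigma=10.3, n=20, reps=2000))
```

The printed fraction should be close to 0.95, but not exactly 0.95, because only a finite number of samples is simulated.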
Estimation for Single Population
1. CI for a Population Mean (normally distributed)
Assumptions
Population standard deviation (σ) is known
Finding the Critical Value
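As a rough sketch, the two-sided critical value for a given confidence level can be obtained from the standard normal distribution in Python's standard library. The function name `z_critical` is a hypothetical helper introduced here for illustration.

```python
from statistics import NormalDist

def z_critical(conf):
    """Two-sided critical value z_{1 - alpha/2} for confidence level conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: z = {z_critical(conf):.3f}")
```

These are the familiar table values 1.645, 1.960 and 2.576.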
Margin of Error
(Precision of the estimate)
Factors Affecting Margin of Error
c. What effect does a larger sample size have on the CI?
a. $1.52 \pm 1.96\sqrt{\dfrac{2.25}{20}} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$
b. $1.52 \pm 1.96\sqrt{\dfrac{2.25}{32}} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99,\ 2.05)$
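The computations in parts a and b can be reproduced with a short Python sketch, which also speaks to part c. It assumes that 2.25 is the population variance (so σ = 1.5), since √(2.25/20) ≈ 0.33 matches the arithmetic shown; the function name `z_interval` is a hypothetical helper.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """CI for a mean with known sigma: xbar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

sigma = math.sqrt(2.25)  # assumption: 2.25 is the population variance
for n in (20, 32):
    lo, hi = z_interval(1.52, sigma, n)
    print(f"n = {n}: ({lo:.2f}, {hi:.2f}), width = {hi - lo:.2f}")
```

The limits can differ from the hand calculation in the second decimal place because the slides round the standard error before multiplying. Either way, the interval for n = 32 is narrower than the one for n = 20: a larger sample gives a more precise estimate.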
When constructing CIs, it has been assumed that the standard
deviation of the underlying population, σ, is known.
What if σ is not known?
Example: It was found that a sample of 35 patients were 17.2
minutes late for appointments, on the average, with SD of 8
minutes. What is the 90% CI for µ? Ans: (14.98, 19.42).
Since the sample size is fairly large (>30) and the
population SD is unknown, we assume the sample mean to be
approximately normally distributed, based on the CLT, and
use the sample SD in place of the population σ.
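The stated answer can be checked with a few lines of Python. This is a minimal sketch using the large-sample normal approximation described above; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, xbar, s = 35, 17.2, 8.0      # values from the example
z = NormalDist().inv_cdf(0.95)  # 90% two-sided CI, about 1.645
margin = z * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```

Rounded to two decimals, this reproduces the interval (14.98, 19.42).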
B. Unknown variance
(small sample size, n ≤ 30)
What if σ for the underlying population is unknown and
the sample size is small?
Student’s t Distribution
The t distribution is a family of continuous probability distributions
Bell Shaped
Student’s t Table
t distribution values
With comparison to the Z value
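A comparison of this kind can be sketched as follows. The t values are the two-sided 95% critical values from a standard t table, entered by hand; the choice of the 95% level and of these particular degrees of freedom is an assumption made for illustration and may not match the original table.

```python
from statistics import NormalDist

# Two-sided 95% critical values t_{0.975, df}, copied from a standard t table.
T_975 = {1: 12.706, 5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042, 120: 1.980}

z = NormalDist().inv_cdf(0.975)  # about 1.960
for df, t in T_975.items():
    print(f"df = {df:3d}  t = {t:6.3f}  t - z = {t - z:.3f}")
```

The t value is always larger than z and approaches it as the degrees of freedom increase, which is why t-based intervals are wider for small samples.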
Example
Standard error =
2. CIs for population proportion, p
Confidence coefficient
Lower limit = Point Estimate - (Critical Value) x (Standard
Error of Estimate)
Upper limit = Point Estimate + (Critical Value) x (Standard
Error of Estimate)
Hence,
Example 1
A random sample of 100 people shows that 25 are left-
handed. Form a 95% CI for the true proportion of left-
handers.
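One way to work this example is with the normal-approximation interval for a proportion, p̂ ± z·√(p̂(1 − p̂)/n). The sketch below assumes that formula and z = 1.96; the function name `proportion_ci` is a hypothetical helper.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lower, upper = proportion_ci(25, 100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval runs from about 0.165 to about 0.335.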
Interpretation
Example
It was found that 28.1% of 153 cervical-cancer cases had never
had a Pap smear prior to the time of the case's diagnosis. Calculate
a 95% CI for the percentage of cervical-cancer cases who never
had a Pap test.
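A quick numerical check of this example, again assuming the normal-approximation interval with z = 1.96 and reporting the limits as percentages:

```python
import math

n, p_hat = 153, 0.281   # values from the example
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI: {lower:.1%} to {upper:.1%}")
```

This gives an interval of roughly 21.0% to 35.2%.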
Sample size Determination
Too small a sample size:
May fail to detect an important effect
Steps to determine sample size:
1. Specify tolerable error (i.e., desired precision and confidence
level via d and α)
4. Solve for n
Examples
1. A survey is being planned to determine what proportion of
families in a certain area are medically indigent. It is found
that the proportion is 0.35 from previous studies. A 95%
confidence interval is desired with d = 5%. What size sample of
families should be selected?
2. Suppose that you are interested in the proportion of
infants who are breastfed beyond 18 months of age in a rural area.
Suppose that in a similar area, the proportion (p) of breastfed
infants was found to be 0.20. What sample size is required to
estimate the true proportion within ±3 percentage points with 95%
confidence? Let p = 0.20, d = 0.03, α = 5%.
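Both examples can be worked with the usual sample size formula for a proportion, n = z²p(1 − p)/d². The sketch below assumes that formula, z = 1.96, and rounding up to the next whole unit; the function name `sample_size_proportion` is a hypothetical helper.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to the next whole unit."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

print(sample_size_proportion(0.35, 0.05))  # example 1
print(sample_size_proportion(0.20, 0.03))  # example 2
```

Under these assumptions, example 1 needs 350 families and example 2 needs 683 infants.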
Example
3. Suppose that for a certain group of cancer patients, we are
interested in estimating the mean age at diagnosis. We would like
a 95% CI with a margin of error of 2 units.
$n = \dfrac{z^2\sigma^2}{d^2} = \dfrac{1.96 \times 1.96 \times 124}{2 \times 2} \approx 119$
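The arithmetic can be checked with a short sketch. It assumes that 124 is the population variance of age at diagnosis, as the computation suggests; the function name `sample_size_mean` is a hypothetical helper.

```python
def sample_size_mean(variance, d, z=1.96):
    """n = z^2 * sigma^2 / d^2 (unrounded)."""
    return z ** 2 * variance / d ** 2

n = sample_size_mean(variance=124, d=2)
print(f"n = {n:.2f}")
```

The unrounded value is about 119.09, which is reported above as 119; rounding up to the next whole patient would give 120.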
Suppose there is no prior information about the proportion
(p) who breastfeed
An estimate of p is not always available.
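A common convention when no estimate of p is available is to use p = 0.5, because p(1 − p) is largest there and so the resulting sample size is the most conservative. The sketch below illustrates this under assumptions: the formula n = z²p(1 − p)/d², z = 1.96, and d = 0.03 are illustrative choices, and `sample_size_proportion` is a hypothetical helper.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to the next whole unit."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# p * (1 - p) is largest at p = 0.5, so that choice gives the largest n.
for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"p = {p:.1f}  n = {sample_size_proportion(p, d=0.03)}")
```

With d = 0.03, the required sample size rises from 683 at p = 0.2 to 1068 at p = 0.5.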
Using design effect
The loss of effectiveness by the use of cluster sampling,
instead of simple random sampling, is the design effect.
The design effect is basically the ratio of the actual variance,
under the sampling method actually used, to the variance
computed under the assumption of simple random
sampling
Using design effect cont.…
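When simple or systematic random sampling
techniques are used, the design effect is taken to be one.
When cluster sampling is used, the design effect is
commonly taken to be two (a rule of thumb; the actual value
depends on the cluster size and on how similar units within a cluster are).
When multistage sampling is used, the design effect is
sometimes taken to be equal to the number of stages, as a rough rule of thumb.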
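In practice the design effect is usually applied by multiplying the sample size computed for simple random sampling by it. The sketch below assumes that convention; the starting sample size of 350 and the function name `adjust_for_design` are hypothetical and used only for illustration.

```python
import math

def adjust_for_design(n_srs, deff):
    """Inflate a simple-random-sampling sample size by the design effect."""
    return math.ceil(n_srs * deff)

n_srs = 350  # hypothetical sample size under simple random sampling
for design, deff in (("simple random", 1), ("cluster", 2)):
    print(f"{design}: n = {adjust_for_design(n_srs, deff)}")
```

With a design effect of two, the required sample size doubles from 350 to 700.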