
Biostat Lecture seven

The document discusses statistical estimation techniques, focusing on the concepts of estimation, estimators, and estimates, as well as the methods of point and interval estimation. It explains the importance of sampling error and standard error in estimating population parameters and provides formulas for constructing confidence intervals. Additionally, it covers the implications of sample size on the precision of estimates and the use of Student's t distribution for small sample sizes.

Uploaded by

Hasen umer
Copyright
© © All Rights Reserved

Statistical Estimation Techniques

TESHOME DEMIS (BSc, MPH)

02/03/25 [email protected] 1
Introduction
In the real world, the values of population parameters are fixed
and usually not known.
Instead, we must try to say something about the way in which
a variable is distributed using the information contained in a
sample of observations.
The process of drawing conclusions about an entire population
based on the data in a sample is known as statistical inference.
Two broad categories: Estimation and Hypothesis testing.

Estimation
Is concerned with estimating the values of specific population
parameters based on sample statistics.
Is about using information in a sample to make estimates of the
characteristics (parameters) of the source population.

Examples: A sample survey revealed:

• Proportion of smokers among a certain population group aged 15 to 24.
• Mean SBP among the sampled population.

The next question is what we can predict about the characteristics of
the population from which the sample was drawn.

Estimation, Estimator & Estimate
♣ Estimation is the computation of a statistic from sample data,
often yielding a value that is an approximation (guess) of its
target, an unknown true population parameter value.

♣ The statistic itself is called an estimator and can be of two
types - point or interval.

♣ The value or values that the estimator assumes are called
estimates.

Two methods of estimation are commonly used:
point estimation and interval estimation

Point estimation involves the calculation of a single number to
estimate the population parameter.
Interval estimation specifies a range of reasonable values for
the parameter.

Point versus Interval Estimators
♣ An estimator that represents a "single best guess" is called a
point estimator.

♣ When the estimate is of the form of a "range of plausible
values", it is called an interval estimator.

Thus,
• A point estimate is of the form: [ value ],
• whereas an interval estimate is of the form: [ lower limit,
upper limit ].

Sample mean (x̄) is an unbiased estimator of the population mean (µ).

Estimating the Sampling Error

• Any estimate derived from a sample is subject to sampling error.
• This comes from the fact that only a part of the population
was observed, instead of the whole.
• Different samples could have come up with different results.
• The amount of variation that exists among the estimates from
the different possible samples is the sampling error.

• The standard deviation of this set of sample means is σ/√n and is
referred to as the standard error of the mean (sem) or the standard
error.
• The sem is estimated by s/√n if σ is unknown.

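The claim that the standard deviation of the sample means equals σ/√n can be checked with a small simulation. This is only a sketch: the population values (mean 100, sd 15), the sample size, and the helper name `sd_of_sample_means` are illustrative assumptions, not taken from the lecture.

```python
import random
import statistics
from math import sqrt

def sd_of_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

# Hypothetical normal population (illustrative values only).
mu, sigma, n = 100.0, 15.0, 25

empirical_se = sd_of_sample_means(mu, sigma, n, n_samples=2000)
theoretical_se = sigma / sqrt(n)  # 15 / 5 = 3.0

print(f"empirical SE   = {empirical_se:.2f}")
print(f"theoretical SE = {theoretical_se:.2f}")
```

The two numbers should agree closely, and the agreement tightens as the number of simulated samples grows.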
• The sampling error is dependent on sample size (n) and the
variability of individual sample points (σ).

• As n increases, the sample mean (x̄) and the sample variance
s² approach the values of the true population parameters, µ
and σ², respectively.

Example
• Suppose that the mean ± sd of DBP on 20 males is 78.5 ± 10.3 mm Hg.

1. What is our best estimate of µ?

2. What is the sem?

3. Compare the sem with the sd.

• Our best estimate of µ is 78.5 mm Hg.

• The sem of this estimate is 10.3/√20 = 2.3 mm Hg.

• The sem (2.3) is much smaller than the sd (10.3).

[Table: se of the mean of DBP for different sample sizes]

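A table of the standard error for different sample sizes can be regenerated from sem = sd/√n. In this sketch only n = 20 and sd = 10.3 come from the example; the other sample sizes are illustrative assumptions and may differ from those on the original slide.

```python
from math import sqrt

def sem(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / sqrt(n)

sd = 10.3  # sample sd of DBP from the example (mm Hg)

# n = 20 is the example's sample size; the other sizes are illustrative.
for n in (20, 50, 100, 500):
    print(f"n = {n:>3}   sem = {sem(sd, n):.2f} mm Hg")
```

The sd describes the spread of individual readings and stays near 10.3, while the sem shrinks with √n.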
1. Point Estimate
• A single numerical value used to estimate the corresponding
population parameter.
Sample statistics are estimators of population parameters:

Sample statistic (estimator)          Population parameter
Sample mean, x̄                        µ
Sample variance, s²                   σ²
Sample proportion, p                  P or π
Sample Odds Ratio, ÔR                 OR
Sample Relative Risk, R̂R              RR
Sample correlation coefficient, r     ρ

2. Interval Estimation
• Interval estimation specifies a range of reasonable values for
the population parameter based on a point estimate.
• A confidence interval is a particular type of interval estimator.

Confidence Intervals
• Give a plausible range of values that is likely to include
the “true” (population) value with a given confidence level.
• An interval estimate provides more information about a
population characteristic than does a point estimate.

• CIs also give information about the precision of an estimate.

• When sampling variability is high, the CI will be wide to reflect
the uncertainty of the observation.

• Wider CIs indicate less certainty.

• CIs can also answer the question of whether or not an
association exists (analogous to p-values…).

• Narrow CIs reflect a large sample size, low variability,
or both.

General Formula:
The general formula for all CIs is:

point estimate ± (measure of how confident we want to be)
× (standard error)

• Point estimate: the value of the statistic in the sample
(e.g., mean, proportion, etc.)
• Measure of confidence: taken from a Z table or a t table,
depending on the sampling distribution of the statistic.

A confidence interval has 3 components:

1) A point estimate (e.g., the sample mean)

2) The standard error of the point estimate (e.g., SEM = σ/√n)

3) A confidence coefficient (conf. coeff), also called the critical value

Lower limit = Point Estimate - (Critical Value) x (Standard Error)
Upper limit = Point Estimate + (Critical Value) x (Standard Error)

Confidence Level
• Confidence level:
• The confidence with which the interval will contain the unknown
population parameter
• A percentage (less than 100%)

Example: 95%
• Also written (1 - α) = 0.95

Definition of 95% CI
1. Probabilistic interpretation:
• If all possible random samples of a given sample size were obtained
and if each were used to obtain its own CI, then 95% of all such CIs
would contain the unknown population parameter; the remaining 5%
would not.

2. Practical interpretation:
• When sampling is from a normally distributed population with known
standard deviation, we are 100(1-α)% [e.g., 95%] confident that the
single computed interval contains the unknown population
parameter.

Estimation for Single Population

1. CI for a Population Mean (normally distributed)

A. Known variance (large sample size)

Consider the task of computing a CI estimate of μ for a
population distribution that is normal with σ known.
• Available are data from a random sample of size n.

Assumptions
• Population standard deviation (σ) is known
• Population is normally distributed
• If population is not normal, use large sample

A 100(1-α)% C.I. for µ is:

x̄ ± z_{α/2} × σ/√n

• α is to be chosen by the researcher; the most common values
of α are 0.1, 0.05 and 0.01.
3. Commonly used CLs are 90%, 95%, and 99%

Finding the Critical Value
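
The critical value z_{α/2} is normally read from a Z table, but it can also be computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(conf_level):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - conf_level
    # Leave alpha/2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%} confidence level -> z = {z_critical(cl):.3f}")
```

For the three commonly used confidence levels this gives about 1.645, 1.960 and 2.576.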

Margin of Error
(Precision of the estimate)

Factors Affecting Margin of Error

The length of the CI for the mean (and hence the margin of error) is
determined by n, s, and α.
As n increases, the length of the CI decreases.

As s increases, the length of the CI increases.

As the confidence level increases (α decreases), the length
of the CI increases.

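The three effects listed above can be seen numerically by changing one input at a time in margin of error = z_{α/2} × s/√n. This is a sketch with made-up baseline values (s = 10, n = 25); the function name `margin_of_error` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(s, n, conf_level=0.95):
    """Half-width of a z-based CI for the mean: z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * s / sqrt(n)

# Illustrative baseline: s = 10, n = 25, 95% confidence.
base = margin_of_error(s=10, n=25)
larger_n = margin_of_error(s=10, n=100)                 # n quadrupled
larger_s = margin_of_error(s=20, n=25)                  # s doubled
higher_cl = margin_of_error(s=10, n=25, conf_level=0.99)

print(f"baseline          : {base:.2f}")
print(f"n = 100           : {larger_n:.2f}")
print(f"s = 20            : {larger_s:.2f}")
print(f"99% confidence    : {higher_cl:.2f}")
```

Quadrupling n halves the margin of error, doubling s doubles it, and raising the confidence level widens it.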
Example:
1. Waiting times (in hours) at a particular hospital are believed to
be approximately normally distributed with a variance of
2.25 hr².

a. A sample of 20 outpatients revealed a mean waiting time of
1.52 hours. Construct the 95% CI for the estimate of the
population mean.

b. Suppose that the mean of 1.52 hours had resulted from a
sample of 32 patients. Find the 95% CI.

c. What effect does a larger sample size have on the CI?

a. 1.52 ± 1.96 × √(2.25/20) = 1.52 ± 1.96(0.33)
   = 1.52 ± 0.65 = (0.87, 2.17)

• We are 95% confident that the true mean waiting time is
between 0.87 and 2.17 hrs.
• 95% of the intervals formed in this manner will contain the true
mean.

b. 1.52 ± 1.96 × √(2.25/32) = 1.52 ± 1.96(0.27)
   = 1.52 ± 0.53 = (0.99, 2.05)

c. A larger sample size makes the CI narrower (more
precise).

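The waiting-time intervals can be reproduced in a few lines. This sketch assumes the known-variance z interval used in the example; the function name `z_interval` is hypothetical. Because the slide rounds the standard error to two decimals before multiplying, the unrounded limits differ from the slide's in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, variance, n, conf_level=0.95):
    """CI for a mean when the population variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * sqrt(variance / n)
    return mean - half_width, mean + half_width

ci_20 = z_interval(mean=1.52, variance=2.25, n=20)  # part a
ci_32 = z_interval(mean=1.52, variance=2.25, n=32)  # part b

print(f"n = 20: ({ci_20[0]:.2f}, {ci_20[1]:.2f})")
print(f"n = 32: ({ci_32[0]:.2f}, {ci_32[1]:.2f})")
print("narrower with n = 32:", (ci_32[1] - ci_32[0]) < (ci_20[1] - ci_20[0]))
```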
• When constructing CIs, it has been assumed that the standard
deviation of the underlying population, σ, is known.
• What if σ is not known?

• In this case, the population SD (σ) can be replaced by the
sample SD (s) if the sample size is large enough (n > 30).
With a large sample size, we assume a normal distribution.

• Example: It was found that a sample of 35 patients were 17.2
minutes late for appointments, on the average, with an SD of 8
minutes. What is the 90% CI for µ? Ans: (14.98, 19.42).
• Since the sample size is fairly large (>30) and the
population SD is unknown, we assume the distribution
of the sample mean to be normal based on
the CLT, and we use the sample SD in place of the population σ.

B. Unknown variance
(small sample size, n ≤ 30)
• What if the σ for the underlying population is unknown and
the sample size is small?

• As an alternative we use Student’s t distribution.

Student’s t Distribution
• The t is a family of continuous probability distributions
• Bell shaped
• Symmetric about zero (the mean)
• Flatter than the Normal(0,1). This means:
  - The variability of a t is greater than that of a Z that is
    Normal(0,1).
  - Thus, there is more area under the tails and less at the center.
  - Because variability is greater, resulting confidence intervals
    will be wider.
• Note: t approaches z as n increases

Student’s t Table

t distribution values
• With comparison to the Z value

Example

• Standard error =

• t-value at 90% CI at 19 df = 1.729

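The tabulated value t = 1.729 (90% CI, 19 df) can be checked without a t table. The standard library has no t quantile function, so this sketch integrates the t density numerically with Simpson's rule and inverts it by bisection. The function names are assumptions, and a statistics package would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(conf_level, df):
    """Two-sided critical value of t, found by bisection on the CDF."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t (90%, 19 df) = {t_critical(0.90, 19):.3f}")

# t approaches z as the degrees of freedom grow (95% level shown).
z = NormalDist().inv_cdf(0.975)
for df in (5, 19, 30, 1000):
    print(f"df = {df:>4}: t = {t_critical(0.95, df):.3f}   z = {z:.3f}")
```

For small df the t value is noticeably larger than z, which is why t-based intervals are wider.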
2. CIs for population proportion, p

Is based on three elements of a CI:

• Point estimate
• SE of point estimate
• Confidence coefficient

Lower limit = Point Estimate - (Critical Value) x (Standard
Error of Estimate)
Upper limit = Point Estimate + (Critical Value) x (Standard
Error of Estimate)

Hence,

p̂ ± 1.96 × √( p̂(1 - p̂)/n )

is an approximate 95% CI for the true proportion p, where p̂ is the
sample proportion.

Example 1
• A random sample of 100 people shows that 25 are left-handed.
Form a 95% CI for the true proportion of left-handers.

Interpretation

Example
• It was found that 28.1% of 153 cervical-cancer cases had never
had a Pap smear prior to the time of the case’s diagnosis. Calculate
a 95% CI for the percentage of cervical-cancer cases who never
had a Pap test.

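Both proportion examples (left-handers and Pap smear history) can be worked with the same normal-approximation interval, point estimate ± critical value × standard error. This is a sketch; the function name `proportion_ci` is an assumption, and the approximation is only reasonable when the sample is large.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, conf_level=0.95):
    """Normal-approximation CI for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

left = proportion_ci(25 / 100, 100)  # 25 left-handers out of 100
pap = proportion_ci(0.281, 153)      # 28.1% of 153 cases

print(f"left-handers: ({left[0]:.3f}, {left[1]:.3f})")
print(f"no Pap smear: ({pap[0]:.3f}, {pap[1]:.3f})")
```

One way to read the first result: we are 95% confident that the true proportion of left-handers lies between roughly 16.5% and 33.5%.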
Sample size Determination
Too small a sample size:
• May fail to detect an important effect
• Estimates of effect may be too imprecise (wide CIs)

Too large a sample size:
• May result in wastage of resources.

To make generalizations about the entire population, we need
a total sample size of 200-400.

Confidence interval approach
• Given the confidence interval
mean (proportion) ± z_{α/2} × s.e

• Hence the absolute precision (margin of error), denoted by d, is given as
d = z_{α/2} × s.e

• Where s.e is the standard error of the estimator of the
parameter of interest.

Steps to determine sample size:
1. Specify tolerable error (i.e., desired precision and confidence
level via d and α)

2. Identify appropriate equation relating tolerable error (d, α) to
sample size (n)

3. Estimate unknown quantities in equation

4. Solve for n

5. Evaluate (and return to first step)

The sample size calculation should relate to the study’s outcome
variable.

Estimating a single population mean/proportion

For a mean:        n = (z_{α/2})² σ² / d²
For a proportion:  n = (z_{α/2})² p(1 - p) / d²

Examples
1. A survey is being planned to determine what proportion of
families in a certain area are medically indigent. Previous
studies found the proportion to be 0.35. A 95%
confidence interval is desired with d = 5%. What size sample of
families should be selected?

2. Suppose that you are interested to know the proportion of
infants who breastfed >18 months of age in a rural area.
Suppose that in a similar area, the proportion (p) of breastfed
infants was found to be 0.20. What sample size is required to
estimate the true proportion within ±3 percentage points with 95%
confidence? Let p = 0.20, d = 0.03, α = 5%.

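Both questions can be answered by solving d = z × √(p(1 - p)/n) for n. The sketch below assumes z = 1.96 for 95% confidence and rounds up to the next whole subject, which is the conservative convention; the function name `sample_size_proportion` is hypothetical.

```python
from math import ceil

def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole subject."""
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

n_indigent = sample_size_proportion(p=0.35, d=0.05)  # example 1
n_breastfed = sample_size_proportion(p=0.20, d=0.03)  # example 2

print("families needed:", n_indigent)
print("infants needed :", n_breastfed)
```

Under these assumptions the first survey needs about 350 families and the second about 683 infants.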
Example
3. Suppose that for a certain group of cancer patients, we are
interested in estimating the mean age at diagnosis. We would like
a 95% CI and want a margin of error of 2 years.

If the population variance is 124 years², how large should our sample
be?

n = (1.96 × 1.96 × 124)/(2 × 2) ≈ 119

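The arithmetic in this example follows n = z²σ²/d² with 124 entering unsquared, so the sketch below treats 124 as the variance σ². The function name `sample_size_mean` is an assumption. The slide rounds to the nearest integer; rounding up is the more conservative choice.

```python
from math import ceil

def sample_size_mean(variance, d, z=1.96):
    """Unrounded n for estimating a mean: z^2 * sigma^2 / d^2."""
    return z ** 2 * variance / d ** 2

n_raw = sample_size_mean(variance=124, d=2)

print(f"unrounded n     = {n_raw:.2f}")
print(f"nearest integer = {round(n_raw)}")  # the slide's answer
print(f"rounded up      = {ceil(n_raw)}")   # conservative choice

# For contrast: if 124 were the SD, the variance would be 124**2.
print(f"if SD were 124  = {ceil(sample_size_mean(124 ** 2, 2))}")
```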
Suppose there is no prior information about the proportion
(p) who breastfeed.

For a fixed absolute precision (d), the required sample
size increases as p increases from 0 to 0.5, and then
decreases in the same way as the prevalence
approaches 1.

• An estimate of p is not always available.

• However, the formula may also be used for sample size
calculation based on various assumptions for the values of
p.
p = 0.1 → n = (1.96)²(0.1)(0.9)/(0.05)² = 138
p = 0.2 → n = (1.96)²(0.2)(0.8)/(0.05)² = 246
p = 0.3 → n = (1.96)²(0.3)(0.7)/(0.05)² = 323
p = 0.5 → n = (1.96)²(0.5)(0.5)/(0.05)² = 384
p = 0.7 → n = (1.96)²(0.7)(0.3)/(0.05)² = 323
p = 0.8 → n = (1.96)²(0.8)(0.2)/(0.05)² = 246

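The values above can be regenerated by sweeping p at fixed d = 0.05. This sketch rounds to the nearest integer to match the slide; the helper name `n_for` is an assumption.

```python
def n_for(p, d=0.05, z=1.96):
    """Sample size for a proportion, rounded to the nearest integer."""
    return round(z ** 2 * p * (1 - p) / d ** 2)

table = {p: n_for(p) for p in (0.1, 0.2, 0.3, 0.5, 0.7, 0.8)}

for p, n in table.items():
    print(f"p = {p:.1f} -> n = {n}")

# p(1 - p) is largest at p = 0.5, so that row gives the largest n.
print("largest n at p =", max(table, key=table.get))
```

Because p = 0.5 gives the largest n, it is the safe choice when no prior estimate of p is available.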
Some Considerations

Using design effect
• The loss of effectiveness by the use of cluster sampling,
instead of simple random sampling, is the design effect.
• The design effect is basically the ratio of the actual variance,
under the sampling method actually used, to the variance
computed under the assumption of simple random
sampling.

Using design effect cont.…
• When simple or systematic random sampling
techniques are used, the design effect is taken as one.
• When a cluster sampling technique is used, the design
effect is commonly taken as two.
• When a multi-stage sampling technique is used, the design
effect is taken as equal to the number of stages.

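The slides do not spell out how the design effect enters the calculation. A common approach, assumed here, is to multiply the simple-random-sampling sample size by the design effect. The function name `adjust_for_design` and the starting value of 384 (the p = 0.5, d = 0.05 case) are illustrative.

```python
from math import ceil

def adjust_for_design(n_srs, deff):
    """Inflate a simple-random-sampling sample size by the design effect."""
    return ceil(n_srs * deff)

n_srs = 384  # p = 0.5, d = 0.05, 95% confidence

print("simple random sampling :", adjust_for_design(n_srs, 1))
print("cluster sampling       :", adjust_for_design(n_srs, 2))
print("three-stage sampling   :", adjust_for_design(n_srs, 3))
```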
