L8 Statistical Estimation 1

This document discusses statistical estimation and sampling distributions. It provides the following key points: 1. Sampling distributions describe the distribution of sample statistics like the mean if multiple samples are taken from the same population. The central limit theorem states that the sampling distribution of the mean will follow a normal distribution, even if the population is not normal, as long as the sample size is large enough. 2. Point estimation uses a sample statistic like the mean or proportion to estimate the corresponding population parameter. Interval estimation provides a range of values that the population parameter is likely to fall within, using confidence intervals. 3. Factors that affect the width of a confidence interval include the sample size, standard deviation, and confidence

Uploaded by

ASHENAFI LEMESA
Copyright
© © All Rights Reserved

Statistical Estimation

By: Nigussie Yohanes (BSc, MPH/Epid & Biost, Assistant Professor)

February 12, 2023
Sampling Distribution
A sampling distribution is the distribution of all possible values of a statistic computed from samples of the same size randomly selected from the same population.

Because of random variation, different samples from the same population will have different sample means.

If we repeatedly take samples of the same size n from a population, the means of those samples form the sampling distribution of the mean for samples of size n.

The sampling distribution serves to answer probability questions about sample statistics.


A. Sampling distribution of the sample mean

• Suppose we have a population of size N = 4, consisting of the ages of four outpatients.

x, Age (years): 18, 20, 22, 24

$$\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$$

Now consider all possible samples of size n = 2 (with replacement).

• 16 possible samples (rows: 1st observation, columns: 2nd observation)

1st Obs   18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

• 16 sample means (rows: 1st observation, columns: 2nd observation)

1st Obs   18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

Sample mean (x̄)   Freq   P(x̄)
18                1      0.0625
19                2      0.1250
20                3      0.1875
21                4      0.2500
22                3      0.1875
23                2      0.1250
24                1      0.0625

Sampling distribution of all sample means

[Figure: bar chart of the sample means distribution, P(x̄) on the vertical axis (0 to .3) against the sample mean x̄ = 18, 19, ..., 24 on the horizontal axis]

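Summary measures of this sampling distribution:
Add the 16 sample means and divide by 16.
Also calculate the SD of the sample means.

$$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + 20 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$

Here N = 16 is the number of possible samples.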
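
The enumeration above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original slides.

```python
from collections import Counter
from itertools import product
from math import sqrt

ages = [18, 20, 22, 24]   # population from the example (N = 4)
n = 2                     # sample size

# Population mean and population SD (divide by N, not N - 1).
mu = sum(ages) / len(ages)
sigma = sqrt(sum((x - mu) ** 2 for x in ages) / len(ages))

# All ordered samples of size n drawn with replacement: 4 ** 2 = 16 samples.
sample_means = [sum(s) / n for s in product(ages, repeat=n)]

# Frequency and probability of each distinct sample mean.
freq = Counter(sample_means)
for m in sorted(freq):
    print(f"{m:4.0f}  {freq[m]}  {freq[m] / len(sample_means):.4f}")

# Summary measures of the sampling distribution.
mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)

print(mean_of_means)                                     # equals mu
print(round(sd_of_means, 2), round(sigma / sqrt(n), 2))  # both 1.58
```

The last line compares the SD of the 16 sample means with σ/√n computed from the population, which is the relationship the next slide states as a property.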
Properties

1. The mean of the sampling distribution of means is the same as the population mean, μ.

2. The SD of the sampling distribution of means is σ/√n (in the example, 2.236/√2 = 1.58).

3. The shape of the sampling distribution of means is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (central limit theorem).

• In practice, the approximation is workable if n is 30 or more.



Sampling distribution of the proportion

The sampling distribution of the sample proportion p possesses the following properties.

• The sample proportion p is an estimate of the population proportion P.

• The standard deviation of p is √(p(1 − p)/n), called the standard error of the proportion.

• Provided n is large enough, the shape of the sampling distribution of p is approximately normal.

Main types of sampling distributions
A. Distribution of the sample mean

B. Distribution of the difference between two means

C. Distribution of the sample proportion

D. Distribution of the difference between two proportions

Definitions:

Parameter: a numerical value of some characteristic of a population.

Statistic: a numerical value of some characteristic of a sample.


Sample statistic             Population parameter
x̄ (mean)                    μ
S (standard deviation)       σ
p (proportion)               π
Statistical Estimation

Estimation is the process of determining a likely value for a population parameter based on information collected from a sample; that is, the use of sample statistics to estimate population parameters.

E.g.
• An estimate of the proportion of smokers among all people aged 15 to 24 in the population.
• An estimate of the mean level of a certain enzyme among healthy men.
Point Estimation

A single numerical value is used to estimate the corresponding population parameter:
• x̄ is an estimator of the population mean μ
• S is an estimator of the population standard deviation σ
• p is an estimator of the population proportion π



Point estimation…
• From a single sample we can calculate a sample statistic to estimate a single parameter (a point estimate).
• The point estimate for the population mean μ is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

• The point estimate for the population proportion is

$$p = \frac{x}{n}$$

where x is the total number of successes (events).

Interval estimation
• Interval estimation is a statement that a population parameter has a value lying between two specified limits.
• The value of the sample statistic varies from sample to sample, so a single-value estimate of the parameter is generally not sufficient on its own.
• We need to take into account the sample-to-sample variation of the statistic.
• A confidence interval defines an interval within which the true population parameter is likely to fall (an interval estimate).
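Confidence interval ……

A (1 − α)100% confidence interval for an unknown population mean and population proportion is given as follows:

$$\left[\,\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\;\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\,\right] \quad \text{for estimating a mean}$$

If σ is unknown, it can be estimated by the sample standard deviation s, giving the estimated standard error s/√n.

$$\left[\,p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\;\; p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\,\right] \quad \text{for estimating a proportion}$$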

• A 95% confidence interval is interpreted as follows: under the conditions assumed for the underlying distribution, we are 95% confident that the interval contains the true parameter. That is, about 95% of intervals constructed this way from repeated samples would contain it.

• A 90% CI is narrower than a 95% CI, since we are only 90% certain that the interval includes the population parameter.

• A 99% CI is wider than a 95% CI; the extra width means that we can be more certain that the interval contains the population parameter.
• To obtain higher confidence from the same sample, we must be willing to accept a larger margin of error (a wider interval).
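
The trade-off between confidence level and width can be illustrated numerically. The sketch below uses Python's standard-library `statistics.NormalDist` to obtain the critical value; the values of `sigma` and `n` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def margin_of_error(confidence, sigma, n):
    """Half-width of the z-based confidence interval for a mean."""
    return z_critical(confidence) * sigma / sqrt(n)


# Hypothetical values, not taken from the slides.
sigma, n = 10.0, 25

for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    print(f"{level:.0%}  z = {z:.3f}  margin = {margin_of_error(level, sigma, n):.2f}")
```

With the same sample, the margin of error grows as the confidence level goes from 90% to 95% to 99%, which is the widening described above.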

• For a given confidence level (e.g. 90%, 95%, 99%), the width of the confidence interval depends on the standard error of the estimate, which in turn depends on:
1. Sample size: The larger the sample size, the narrower the confidence interval and the more precise our estimate.

• Lack of precision means that in repeated sampling the values of the sample statistic are spread out or scattered, so the result of sampling is not repeatable.
• You can make the precision as high as you want by taking a large enough sample.
• The margin of error decreases as √n increases.
2. Standard deviation: The more variation among the individual values, the wider the confidence interval and the less precise the estimate.

• As sample size increases, the standard error (SD/√n) decreases; the SD of the individual values itself does not shrink with n.


1. C.I. for a population mean
(normally distributed)

a) Known variance (large sample size)

• A 100(1 − α)% C.I. for μ is

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

• α is chosen by the researcher; the most common values of α are 0.05, 0.01, 0.001 and 0.1.
Example
• A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular muscle in a certain group of individuals.
• He assumes that strength scores are approximately normally distributed with a variance of 144.
• A sample of 15 subjects who participated in the experiment yielded a mean of 84.3.
Solution:

With σ = √144 = 12, n = 15 and z for 99% confidence equal to 2.58:

$$84.3 \pm 2.58 \cdot \frac{12}{\sqrt{15}} = 84.3 \pm 8.0$$

⇒ We are 99% confident that the population mean is between 76.3 and 92.3.
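
The arithmetic of this example can be reproduced as follows. This is a sketch only; the function name `z_interval_mean` is illustrative, and the critical value comes from the standard library rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, sigma, n, confidence):
    """z-based confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Values from the physical therapist example: variance 144, n = 15, mean 84.3.
low, high = z_interval_mean(xbar=84.3, sigma=sqrt(144), n=15, confidence=0.99)
print(round(low, 1), round(high, 1))  # 76.3 92.3
```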
The Z-test is applied when:
• The distribution is normal and the population standard deviation σ is known, or
• The sample size n is large (n ≥ 30) and σ is unknown (taking S as an estimator of σ).


But what happens when n < 30 and σ is unknown?
• We use a t-distribution, which depends on the number of degrees of freedom (df).

• The t-distribution is a theoretical probability distribution (i.e. its total area is 100 percent) and is defined by a mathematical function.

• The distribution is symmetrical, bell-shaped and similar to the normal, but more spread out.

• As the df decrease, the t-distribution becomes increasingly spread out compared with the normal.

• The sample standard deviation is used as an estimate of σ (the standard deviation of the population, which is unknown) and is a logical substitute.

• For large sample sizes (n ≥ 30), the t and Z curves are so close together that it does not much matter which you use.
Degrees of Freedom
• Defined as the number of values that are free to vary after a certain restriction has been imposed on the data.

Example: If 3 scores have a mean of 10, how many of the scores can be freely chosen?

Solution: The first and second scores can be chosen freely (e.g., 8 and 12, 9 and 5, 7 and 15).

The third score is then fixed, because the three scores must sum to 30 (respectively 10, 16, 8).

• Hence, there are two degrees of freedom (in general, n − 1 for a sample of size n).
b) Unknown variance (small sample size, n < 30)

A 100(1 − α)% C.I. for μ is

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

• The t-distribution density curve is bell-shaped and symmetrical about zero.
• There is a different curve for each df (i.e. each sample size); for very large df the curve is very close to Z.
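
To see how the t critical values depend on df and approach the Z value for large df, they can be computed numerically. The sketch below is one possible approach using only the standard library: it integrates the t density with Simpson's rule and inverts the result by bisection. The function names are illustrative; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson's rule on [0, |t|] and symmetry about zero."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value t such that P(T <= t) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two-sided 95% critical values for increasing df; compare with z = 1.96.
for df in (5, 12, 30, 1000):
    print(df, round(t_quantile(0.975, df), 3))
```

The printed values shrink toward 1.96 as df grows, which is the sense in which the t curve becomes "very close to Z".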
Table of t-distributions
• The table of the t-distribution shows values of t for selected areas under the t curve.

• Different values of df appear in the first column.

• The table is adapted for efficient use for either one- or two-tailed tests.
E.g. 1. If df = 8, 5% of t scores are above what value?

Solution:
• Look along the row labeled "one tail" to the value .05.

• The intersection of the .05 column and the row with 8 in the df column gives the value t = 1.86.

E.g. 2. Find t₀ if n = 13 and 95% of t scores are between −t₀ and +t₀.

Solution:

df = 13 − 1 = 12. If 95% of t scores are between −t₀ and +t₀, then 5% are in the two tails.
• Look along the row labeled "two tail" to the value .05.

• The intersection of this .05 column and the row with 12 in the df column gives t₀ = 2.179.

E.g. 3. If df = 5, what is the probability that a t score is above 2.015 or below −2.015?

Solution:

Two tails are implied. Look along the "df = 5" row to find the entry 2.015 (2.02 when rounded); it lies in the "two tail" .10 column.
• The probability is .10.
2. C.I. for the difference between population means (normally distributed)

i) Known variance (2 independent samples)

A 100(1 − α)% C.I. for μ1 − μ2 is

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Characteristics of the sampling distribution of differences of means:
1) The mean of the sampling distribution of differences of means equals the difference of the population means, μ1 − μ2.

2) The standard deviation of the sampling distribution of differences of means, also called the standard error of differences of means, is

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

3) The sampling distribution is normal if both populations are normal, and is approximately normal if the samples are large enough (even if the populations aren't normal).

E.g. 1. If a random sample of 50 non-smokers has a mean lifetime of 76 years with a standard deviation of 8 years, and a random sample of 65 smokers has a mean lifetime of 68 years with a standard deviation of 9 years:

A) What is the point estimate for the difference of the population means?

B) Find a 95% C.I. for the difference of mean lifetime of non-smokers and smokers.

Solution:

A) The point estimate for the difference of population means (μ1 − μ2) is 76 − 68 = 8 years.

B) The standard error is √(8²/50 + 9²/65) = 1.59, so at a 95% confidence level the interval is 8 ± 1.96 (1.59)

= 8 ± 3.12 = (4.88 to 11.12 years)
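
The same calculation can be sketched in Python. The function name `z_interval_diff_means` is illustrative; the sample standard deviations are used in place of the population values, as in the worked solution.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Point estimate, standard error and z-based CI for mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff, se, (diff - z * se, diff + z * se)


# Non-smokers: n = 50, mean 76, SD 8. Smokers: n = 65, mean 68, SD 9.
diff, se, (low, high) = z_interval_diff_means(76, 8, 50, 68, 9, 65)
print(diff, round(se, 2), round(low, 2), round(high, 2))  # 8 1.59 4.88 11.12
```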


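ii) Unknown variances and small sample size

a) Equal variances (2 independent samples)

A 100(1 − α)% C.I. for μ1 − μ2 is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Necessary Assumptions
• The groups must be independent.

• The theoretical distribution of sample means for each group must be normally distributed (for small samples this in practice requires approximately normal populations; the central limit theorem helps only as the samples grow).

• We need the assumption of equal variance in the two groups (homogeneity of variance).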
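Example
A study of gonadal dysfunction in diabetic men.
Sample 1: Men with primary organic impotence
n1 = 11
x̄1 = 524.0, mean total testosterone value
S1 = 135.8
Sample 2: Men with primarily psychogenic impotence
n2 = 7
x̄2 = 701.1, mean total testosterone value
S2 = 154.4
The data are normally distributed. Calculate a 99% C.I. for μ1 − μ2.
Solution: We assume that the population variances are equal, so the pooled variance is

$$s_p^2 = \frac{10(135.8)^2 + 6(154.4)^2}{16} = 20465.8, \qquad s_p = 143.1$$

With df = 11 + 7 − 2 = 16, the table value for 99% confidence is t = 2.921, so

$$(524.0 - 701.1) \pm 2.921 \times 143.1\sqrt{\frac{1}{11} + \frac{1}{7}} = -177.1 \pm 202.1$$

We are 99% sure that μ1 − μ2 is between −379.2 and 25.0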
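
A short sketch of the pooled-variance calculation for this example follows. The function name is illustrative, and the critical value 2.921 (two-sided 99%, df = 16) is entered by hand from a t table because the Python standard library has no t quantile function.

```python
from math import sqrt


def pooled_t_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


T_CRIT_99_DF16 = 2.921  # table value, assumed: two-sided 99%, df = 16

low, high = pooled_t_interval(11, 524.0, 135.8, 7, 701.1, 154.4, T_CRIT_99_DF16)
print(round(low, 1), round(high, 1))
```

The result agrees with the interval quoted above to within about 0.1; the small difference comes from rounding intermediate values in the hand calculation.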


3. C.I. for a population proportion (large sample size)
• A 100(1 − α)% C.I. for π is

$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

Example:

A study on dental health practice. Of 300 adults interviewed, 123 said that they regularly had a dental check-up twice a year.
What is the 95% C.I. for π?
• p = 123/300 = 0.41 is a point estimate of π.

• α = 0.05 ⇒ Z0.025 = 1.96

• 0.41 ± 1.96 √(0.41 × 0.59 / 300) = 0.41 ± 0.056, i.e. approximately (0.354, 0.466).


4. C.I. for the difference between two population proportions (large sample size)
A 100(1 − α)% C.I. for π1 − π2 is

$$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

Example
• Two hundred patients suffering from a certain disease were randomly divided into two equal groups. Of the first group, who received the standard treatment, 78 recovered within three days. Out of the other 100, who were treated by a new method, 90 recovered within three days. The physician wished to estimate the true difference in the proportions who would recover within three days.
Solution:
The estimate of the difference in the population proportions is

p1 − p2 = 0.78 − 0.90 = −0.12

• The 95% C.I. is −0.12 ± 1.96 √(0.78 × 0.22 / 100 + 0.90 × 0.10 / 100) = −0.12 ± 0.10

• We are 95% sure that the difference is between −0.22 and −0.02.

• Note that the negative signs merely reflect the fact that better results were obtained by using the new treatment.
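
Both proportion intervals (the single proportion from the dental check-up example and the difference of proportions from the treatment example) can be computed with the following sketch. The function names are illustrative and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def z_crit(confidence):
    """Two-sided critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_interval(x, n, confidence=0.95):
    """Large-sample CI for a single population proportion."""
    p = x / n
    margin = z_crit(confidence) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def diff_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for the difference of two population proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_crit(confidence) * se
    return (p1 - p2) - margin, (p1 - p2) + margin


# Dental check-up example: 123 of 300 adults.
lo1, hi1 = proportion_interval(123, 300)
print(round(lo1, 3), round(hi1, 3))  # 0.354 0.466

# Treatment example: 78 of 100 (standard) versus 90 of 100 (new method).
lo2, hi2 = diff_proportion_interval(78, 100, 90, 100)
print(round(lo2, 2), round(hi2, 2))  # -0.22 -0.02
```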
Group Assignment
• Form Group • Probability Questions1
• Ashenaf Lamessa &1-2
• Mustafa Aman
• Yalew Yemanebrihan
• Tolossa Gizaw
