C-7 Sampling Distribu
01/23/2023 1
Learning objectives
Sampling distribution
• We consider sample statistics as random variables
Example:
• Repeat & repeat & repeat & …
Sampling Distribution of Sample Mean
• Distribution of values taken by statistic in all possible samples
of size n from the same population
• Model assumption: our observations xi are sampled from a
population with mean μ and variance σ²
[Diagram: from a population with an unknown parameter μ, repeated
samples of size n (Sample 1, Sample 2, ..., Sample 8, ...) each give
a sample mean x̄. What is the distribution of these values?]
Sample Vs Population
Sample statistics and their corresponding population parameters
Sample statistic            Population parameter
• x̄ (sample mean)          • μ (population mean)
• One may generate the sampling distribution of means as
follows:
1. Obtain a sample of n observations selected completely
at random from a large population
– Determine their mean and then replace the observations in
the population.
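The repeated-sampling procedure can be sketched as a small simulation. This is only an illustration under stated assumptions: the population values, the sample size, the number of repetitions and the function name `simulate_sample_means` are all hypothetical, and drawing with replacement stands in for sampling from a large population.

```python
import random
import statistics


def simulate_sample_means(population, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n and return their means.

    Sampling is done with replacement (an assumption), so every draw
    comes from the same unchanged population.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(repetitions)
    ]


# Hypothetical population, chosen only for illustration.
population = [10, 20, 30, 40]
means = simulate_sample_means(population, n=2, repetitions=20000)

# The mean of the simulated sample means should be close to the
# population mean, and their spread is the standard error of the mean.
print("mean of sample means:", round(statistics.fmean(means), 2))
print("SD of sample means:  ", round(statistics.pstdev(means), 2))
```

With more repetitions, the collection of simulated means approaches the sampling distribution of the mean for that sample size.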
How large is large enough?
For most distributions, n > 30 will give a sampling
distribution that is nearly normal
For fairly symmetric distributions, n > 15
of the distribution.
• The standard deviation of the sampling distribution of
means is called the standard error of the mean.
E.g. Suppose you have a population of four
members with values 10, 20, 30 and 40. If you take all
conceivable samples of size 2 with replacement:
a. What is the frequency distribution of the sample
means i.e. Sampling distribution?
b. Find the mean and standard deviation of the
distribution (standard error of the mean).
Possible samples            x̄i (sample mean)
(10, 20) or (20, 10)        15
(10, 30) or (30, 10)        20
(10, 40) or (40, 10)        25
(20, 30) or (30, 20)        25
(20, 40) or (40, 20)        30
(30, 40) or (40, 30)        35
(10, 10)                    10
(20, 20)                    20
(30, 30)                    30
(40, 40)                    40
A) Frequency distribution of sample means
x̄i (sample mean)    fi (frequency)
10                   1
15                   2
20                   3
25                   4
30                   3
35                   2
40                   1
Total                16
i) The mean of the sampling distribution
= Σ x̄i fi / Σ fi
= 400 / 16 = 25
ii) The standard deviation of the sampling distribution
(standard error of the mean)
σx̄ = √[ Σ fi (x̄i − μ)² / Σ fi ]
= √[ {1(10 − 25)² + 2(15 − 25)² + … + 1(40 − 25)²} / 16 ]
= √(1000 / 16) = √62.5 = 7.9
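Eg2. For the population given above (10, 20, 30 and 40):
a) Find the population mean and show that the population
mean (μ) equals the mean of the sampling distribution.
b) Find the population standard deviation and show that the
standard error of the mean is σx̄ = σ/√n.
Solution:
a) μ = Σ xi / N = (10 + 20 + 30 + 40) / 4 = 25, which equals
the mean of the sampling distribution (25).
b) σ² = Σ(xi − μ)² / N = (225 + 25 + 25 + 225) / 4 = 125,
so σ = √125 = 11.18
• σx̄ (standard error) = σ / √n = 11.18 / √2 = 7.9, which equals
the standard deviation of the sampling distribution.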
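The four-member example can be checked by enumerating every sample of size 2 drawn with replacement. The sketch below is only a verification aid; the variable names are illustrative and not part of the original slides.

```python
from collections import Counter
from itertools import product
from math import isclose, sqrt

population = [10, 20, 30, 40]
n = 2  # sample size

# All ordered samples of size n with replacement: 4 ** 2 = 16 samples.
sample_means = [sum(sample) / n for sample in product(population, repeat=n)]
frequency = Counter(sample_means)

# Mean and standard deviation of the sampling distribution.
mean_of_means = sum(sample_means) / len(sample_means)
standard_error = sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)
)

# Population mean and population standard deviation (divide by N).
mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

for value in sorted(frequency):
    print(value, frequency[value])
print("mean of sample means:", mean_of_means, "population mean:", mu)
print("standard error:", round(standard_error, 2),
      "sigma / sqrt(n):", round(sigma / sqrt(n), 2))
```

Both routes give the same standard error, which is the point of the exercise: the spread of the sample means equals σ/√n when sampling with replacement.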
The mean
The mean of all possible estimates obtained from
samples of identical size is equal to the true population
mean.
Although estimates obtained from individual samples
will vary, their overall mean (the mean of the sampling
distribution) will always be equal to the population
value.
The standard deviation
The standard deviation of the sampling distribution
decreases as the sample size increases.
The standard deviation of a sampling distribution takes a
special name, standard error, often indicated by the letters
SE.
The standard deviation represents the variability in the
individual data
The standard error represents the variability in the
sample estimates
• The standard deviation of the sampling distribution
measures how much the sample statistic varies from
sample to sample
• It is smaller than the standard deviation of the
population by a factor of √n
– Averages are less variable than individual observations
The standard deviation
– The standard error depends inversely on the square root of the
sample size used.
For example, the standard error of the sample mean is:
SE(x̄) = σ/√n
The central limit theorem and the shape
sample sizes.
A random sample of n observations was obtained from a
population with mean μ and standard deviation σ.
If n is sufficiently large, the sampling distribution of the sample
mean is approximately normal with mean μ and
standard deviation (i.e. standard error) σ/√n.
The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.
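A rough way to see the central limit theorem at work is to sample from a clearly non-normal population. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a choice made only for illustration) and checks how often the sample mean falls within μ ± 1.96 σ/√n; the function name `simulate_clt` is hypothetical.

```python
import random
import statistics
from math import sqrt


def simulate_clt(n, repetitions=5000, seed=1):
    """Simulate sample means of size n from an exponential(1) population.

    Returns the mean of the sample means, their standard deviation, and
    the fraction lying within 1.96 standard errors of the true mean.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and SD of the exponential(1) population
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]
    half_width = 1.96 * sigma / sqrt(n)
    within = sum(abs(m - mu) <= half_width for m in means) / repetitions
    return statistics.fmean(means), statistics.pstdev(means), within


centre, spread, coverage = simulate_clt(n=40)
print("mean of sample means:", round(centre, 3))
print("SD of sample means:  ", round(spread, 3), "vs sigma/sqrt(n):",
      round(1 / sqrt(40), 3))
print("fraction within 1.96 SE:", round(coverage, 3))
```

If the normal approximation is reasonable, the last figure should be close to 0.95 even though the population itself is strongly skewed.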
Estimation
• In statistics, the observations in the sample are used
to learn about the population.
• For the sample to tell us something about the
population, it needs to be drawn randomly
• This procedure, in which information from the
sample is used to draw conclusions about the
population is called statistical inference or estimation
• Inference is the process of making interpretations or
conclusions from sample data for the totality of the population
• Inferential statistics uses the sample results to make decisions
and draw conclusions about the population from which the
sample is drawn
• In statistics there are two ways through which inference can be
made
– Statistical estimation
– Statistical hypothesis testing
Statistical Estimation:
• This is one way of making inference about the
population parameter where the investigator does not
have any prior notion about values or characteristics of
the population parameter
• Estimation is the procedure by which values are
assigned to a population parameter based on the
information collected from the sample
The estimation procedure involves the following steps:
1. Select a sample
2. Collect the required information from the members of
the sample
3. Calculate the value of the sample statistic
4. Assign value(s) to the corresponding population
parameter
Point estimate
• A single numerical value used to estimate the
corresponding population parameter
– is a single number based on sample data that can be
used as an estimator of the population parameter
– e.g. x̄ estimates µ and p̂ estimates p
Properties of best estimator
To explain these properties let θ̂ be an estimator of θ
• Unbiased Estimator: An estimator whose expected value
is the value of the parameter being estimated
Interval Estimation (large samples)
Interval estimation
Provides range of Values
Based on observations from one sample
Gives information about closeness to unknown
population parameter
Stated in terms of probability
Level of confidence is denoted as (1-α)100%
Never 100% sure
• A confidence interval is a specific interval estimate of a
parameter determined by using data obtained from a
sample and the specific confidence level of the estimate
• The confidence level is the probability that the interval
estimate will contain the parameter, assuming the sampling
and estimation process is repeated many times
Factors Affecting Interval Width
• Level of Confidence (1 ‐ α)
CI for single mean
100(1-α)% CI for μ when σ is known (sampling from normal
population or large sample)
Interpretation:
a. Probabilistic: in repeated sampling, 100(1-α)% of all
intervals constructed this way will include μ
b. Practical: we are 100(1-α)% confident that a single
computed interval contains μ
If the population has a normal distribution and σ is
known, then a (1-α)100% confidence interval for μ is
given by:
x̄ ± Zα/2 (σ/√n)
E.g. 2) A random sample of 100 drug-treated patients has a
mean survival time of 46.9 months. If the SD of the
population is 43.3 months, find a 95% confidence interval
for the population mean.
– (The population consists of survival times of cancer patients who
have been treated with a new drug)
– 46.9 ± (1.96) (43.3 /√100) = 46.9 ± 8.49 = (38.41, 55.39) months
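The survival-time interval can be reproduced with a few lines of code. This is a minimal sketch that assumes σ is known and the normal (Z) interval applies; the function name `z_confidence_interval` is illustrative, and the critical value comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population SD is known."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # z times the standard error
    return sample_mean - margin, sample_mean + margin


# Values from the example: n = 100, mean 46.9 months, sigma 43.3 months.
low, high = z_confidence_interval(46.9, 43.3, 100)
print(f"95% CI: ({low:.2f}, {high:.2f}) months")
```

Raising `confidence` to 0.99 widens the interval, while a larger `n` narrows it, which matches the factors affecting interval width.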
If the population variance σ² is not known, n is
small (n < 30) and the population is normal, then
t = (x̄ − μ) / (s/√n)
is t(n-1), i.e. it follows a t distribution with n − 1 degrees of freedom
Different curves for different df (i.e. sample sizes)
and for very large df t will be very close to Z
Example
1. In a study of preeclampsia, Kaminski and Rechberger found
the mean systolic blood pressure of 10 healthy,
non-pregnant women to be 119 with a standard deviation of 2.1.
A. What is the estimated standard error of the mean?
B. Construct the 99% confidence interval for the mean of the
population from which the 10 subjects may be presumed to
be a random sample.
C. What is the precision of the estimate?
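A. Estimated standard error = s/√n = 2.1/√10 = 0.664
B. 99% CI = 119 ± 3.250 × 0.664 = 119 ± 2.16 = (116.84, 121.16),
using t with n − 1 = 9 degrees of freedom
C. Precision = 3.250 × 0.664 = 2.16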
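The blood pressure example can be worked through in code. Python's standard library has no t distribution, so this sketch takes the critical value 3.250 (t with 9 degrees of freedom, 99% confidence) from the example itself rather than computing it; the function name `t_confidence_interval` is an illustrative assumption.

```python
from math import sqrt


def t_confidence_interval(sample_mean, s, n, t_critical):
    """Small-sample confidence interval for a mean with unknown sigma.

    `t_critical` must be looked up for n - 1 degrees of freedom and the
    chosen confidence level; it is passed in, not computed here.
    """
    standard_error = s / sqrt(n)           # estimated SE of the mean
    precision = t_critical * standard_error  # margin of error
    interval = (sample_mean - precision, sample_mean + precision)
    return standard_error, precision, interval


# n = 10 women, mean 119, SD 2.1; t(9 df, 99% two-sided) = 3.250.
se, precision, (low, high) = t_confidence_interval(119, 2.1, 10, 3.250)
print(f"standard error: {se:.3f}")
print(f"precision:      {precision:.2f}")
print(f"99% CI:         ({low:.2f}, {high:.2f})")
```

Passing the critical value as an argument keeps the sketch dependency-free; a statistics package could supply it instead of a printed table.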
2. Sample of 20 houses studied to estimate the mean sprayable area
• t(n-1, α/2) = t(19, 0.025) = 2.09
= 22.9 ± 2.8
= (22.9 − 2.8, 22.9 + 2.8)
= (20.1, 25.7)
• We are 95% confident that the mean sprayable area of
a house is between 20.1 and 25.7 m²
Confidence interval for the difference of means
Consider two different populations.
The first population (X) has mean μx and standard deviation σx,
and the second population (Y) has mean μy and standard deviation σy.
Then determine x̄ − ȳ (x mean − y mean).
• The characteristics of the sampling distribution of
differences of means are:
( Mean = μx - μy )
Eg1. If a random sample of 50 non-smokers has a mean
life of 76 years with a standard deviation of 8 years,
and a random sample of 65 smokers has a mean life of 68 years with
a standard deviation of 9 years,
A) What is the point estimate for the difference of the
population means?
B) Find a 95% C.I. for the difference of mean lifetime of
non-smokers and smokers.
Solutions
Population x (non-smokers): nx = 50, x̄ = 76, sx = 8
Population y (smokers): ny = 65, ȳ = 68, sy = 9
A) A point estimate for the difference of population means
(μx − μy)
= x̄ − ȳ = 76 − 68 = 8 years
B) At a 95% confidence level,
Z = ± 1.96, σ(x̄ − ȳ)
= √(σx²/nx + σy²/ny)
= √(8²/50 + 9²/65)
= 1.59
Hence, 95% C.I. for μx- μy
= 8 ± 1.96 (1.59)
= 8 ± 3.12 = (4.88 to 11.12 years)
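The smokers and non-smokers calculation can be sketched in code as follows. It assumes two independent large samples, so the sample standard deviations stand in for σx and σy and a Z critical value is used; the function name `difference_of_means_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def difference_of_means_ci(mean_x, sd_x, n_x, mean_y, sd_y, n_y,
                           confidence=0.95):
    """Large-sample confidence interval for mu_x - mu_y."""
    difference = mean_x - mean_y  # point estimate
    # Standard error of the difference between two independent means.
    standard_error = sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return difference, standard_error, (difference - margin,
                                        difference + margin)


# Non-smokers: n = 50, mean 76, SD 8. Smokers: n = 65, mean 68, SD 9.
diff, se, (low, high) = difference_of_means_ci(76, 8, 50, 68, 9, 65)
print(f"point estimate: {diff} years")
print(f"standard error: {se:.2f}")
print(f"95% CI:         ({low:.2f}, {high:.2f}) years")
```

Because this interval does not contain zero, the data are consistent with a real difference in mean lifetime between the two groups at the 95% level.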
• The confidence intervals for the difference in means
provide a range of likely values for (μ1-μ2)
• Zero is the null value of the parameter (in this case the
difference in means)
• If a 95% confidence interval includes the null value,
then there is no statistically meaningful or statistically
significant difference between the groups
• If the confidence interval does not include the null
value, then we conclude that there is a statistically
significant difference between the groups
Exercise
An anthropologist who wanted to study the heights of adult
men and women took a random sample of 128 adult men and
100 adult women and found the following summary results.
Adult men
Mean height= 170 cm
Standard deviation= 8 cm
Adult women
Mean height= 164 cm
Standard deviation= 6 cm
• The probability distribution of a sample proportion
is called its sampling distribution
• It gives the different values that p̂ can assume and their
probabilities
• The shape of the sampling distribution of p̂ is inferred from
the central limit theorem
• According to the central limit theorem, the sampling
distribution of p̂ is approximately normal for a sufficiently
large sample size
• In case of proportion, the sample size is considered to be
sufficiently large if np and nq are both greater than 5.
Assumptions
– Two Categorical Outcomes
– Population Follows Binomial Distribution
– Normal approximation can be used
– n × p ≥5 & n × (1 - p) ≥ 5
Therefore for large sample
• The sampling distribution of the sample proportion p̂
is (approximately) normal
• The mean of the sampling distribution of p̂ is equal to
the population proportion P
• The standard error of p̂ is σp̂ = √(P(1 − P)/n)
• Example: An epidemiologist is worried about the ever
increasing trend of malaria in a certain locality and wants to
estimate the proportion of persons infected in the peak
malaria transmission period. If he takes a random sample of
150 persons in that locality during the peak transmission
period and finds that 60 of them are positive for malaria, find
• a) 95% b) 90% c) 99% confidence intervals for the proportion of
all people in that locality who are infected during the peak malaria
transmission period
• Sample proportion = 60 / 150 = 0.4
• The standard error of the proportion depends on the
population proportion P.
• However, the population proportion (P) is unknown.
In such situations, the sample proportion is used in its place:
SE = √(0.4 × 0.6 / 150) = 0.04
A. 95% C.I. for the population proportion (the proportion of
all people in that locality who are infected)
= 0.4 ± 1.96 (0.04) = (0.4 ± 0.078) = (0.322, 0.478)
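The three requested intervals for the malaria example can be computed the same way. The sketch below assumes the normal approximation and uses the sample proportion in the standard error; the function name `proportion_confidence_interval` is illustrative, and the size check follows the np and n(1 − p) ≥ 5 rule of thumb stated in the assumptions.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Rule of thumb: both n*p and n*(1 - p) should be at least 5.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * standard_error, p_hat + z * standard_error


# 60 of 150 sampled persons were positive for malaria.
intervals = {
    level: proportion_confidence_interval(60, 150, level)
    for level in (0.90, 0.95, 0.99)
}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: ({low:.3f}, {high:.3f})")
```

The intervals widen as the confidence level rises from 90% to 99%, since a larger critical value multiplies the same standard error.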
• A study on dental health practice. Of 300 adults
interviewed, 123 said that they regularly had a dental
check‐up twice a year.
What is the 95% C.I. for π?
• p = 123/300 = 0.41 is a point estimate of π.
• α = 0.05 ⇒ Z0.025 = 1.96
= 0.41 ± 1.96 √(0.41 × 0.59 / 300)
= 0.41 ± 0.056
= (0.35, 0.47).
C.I. for the difference between two population
proportions
• The sampling distribution of the difference between two
sample proportions is constructed in a manner similar to
the difference between two means.
• The distribution of the difference between two sample
proportions, px − py, is approximately normal, with standard error
σ(px − py) = √(Px(1 − Px)/nx + Py(1 − Py)/ny)
• Example: Each of two groups consists of 100 patients
who have leukemia. A new drug is given to the first group
but not to the second (the control group)
= (0.02 to 0.28)