
Sampling Distributions

The document discusses sampling distributions and their properties. It begins by explaining that a sample of observations from a population will have a joint probability distribution defined by the population's probability density function. It then introduces important sample statistics like the sample mean and variance that describe properties of the sample. The key point is that the distribution of a sample statistic, like the mean, is called its sampling distribution. Important sampling distributions include those of the sample mean and variance. The document explains the central limit theorem, which shows that the distribution of the sample mean will approach a normal distribution as sample size increases, regardless of the population distribution. It also introduces the t-distribution, which is used when the population variance is unknown.

Uploaded by

raachelong
Copyright © All Rights Reserved

Lecture 7

Sampling distributions

CH2010 Engineering Statistics PL AY2023 v6 1


Distribution of individual observations
• Each measurement/observation in the sample follows the distribution of the
population f(x).
• Each measurement/observation in the sample is a random variable Xi.
• In a sample of n observations, we have X1, X2, X3, …, Xn.
• The probability distribution of X1 evaluated at x1 is f(x1); that of X2 evaluated at x2 is f(x2), and so on. (For a continuous population, f is a density, so f(x1) is not itself a probability.)
• If X1, X2, …, Xn are independent and each follows f(x), they form a random sample, with joint probability distribution
f(x1, x2, …, xn) = f(x1)f(x2)…f(xn)
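As a minimal sketch of the product rule, assume a normal population f(x) = n(x; μ, σ); the population choice, the helper names and the three observed values below are hypothetical, for illustration only:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density f(x) of a normal population with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def joint_density(sample, mu, sigma):
    """Joint density f(x1, ..., xn) = f(x1) f(x2) ... f(xn) for independent observations."""
    return math.prod(normal_pdf(x, mu, sigma) for x in sample)

# Hypothetical observed values x1, x2, x3 from a population with mu = 0, sigma = 1
sample = [0.2, -1.1, 0.7]
joint = joint_density(sample, mu=0.0, sigma=1.0)
print(joint)
```

For large n the product of many small densities can underflow to 0.0, so in practice one usually sums the logarithms of the densities instead.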



Important statistics
• The probability distribution of a random variable observed from a population
can be described by parameters such as mean μ and variance σ2.
• The distribution of the observations/outcomes within a random sample can be
described by statistics.
(N.B. In this context, "statistics" is the plural of "statistic" (a quantity computed from a sample), which must be distinguished from the singular noun "statistics", the branch of mathematics.)
• Important statistics include measures of location (centre) and measures of
variability (spread) – see Lecture 1.



7.1 Sampling distribution
• The probability distribution of a sample statistic is a sampling distribution.

• The sampling distribution allows us to make inferences about the population, which we can study indirectly through random sampling.

• Key sampling distributions include the distributions of the sample means and
sample variances.



7.2 Sampling distribution of sample mean
• If a random sample of size n is taken from a normal population with mean μ and variance σ2, then the sample mean

$$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$$

has a normal distribution with mean

$$\mu_{\bar{X}} = \frac{1}{n}\left(\mu + \mu + \cdots + \mu\right) = \mu$$

and variance

$$\sigma_{\bar{X}}^2 = \frac{1}{n^2}\left(\sigma^2 + \sigma^2 + \cdots + \sigma^2\right) = \frac{\sigma^2}{n}.$$

• If the population distribution is unknown, the sampling distribution of X̄ can still be approximated by a normal distribution with mean μ and variance σ2/n, provided that the sample size is large enough.
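A quick Monte Carlo check of these two results is sketched below. The parameter values are borrowed from Example 7.1; the number of repetitions, the seed and the helper name are arbitrary choices for illustration:

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=1):
    """Draw `reps` samples of size n from n(x; mu, sigma) and return their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

# mu = 800 h, sigma = 40 h, n = 16, as in Example 7.1
means = simulate_sample_means(mu=800, sigma=40, n=16, reps=20_000)
print(statistics.fmean(means))      # should be close to mu = 800
print(statistics.pvariance(means))  # should be close to sigma**2 / n = 100
```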
7.3 The Central Limit Theorem
If X̄ is the mean of a random sample of size n taken from any population with mean μ and finite variance σ2 (the population need not be normally distributed), then the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

approaches the standard normal distribution n(z; 0, 1) as n → ∞.
The normal approximation for X̄ is generally good for n ≥ 30, provided that the population distribution is not severely skewed.
For n < 30, the approximation is good only if the population is not too different from a normal distribution.
If the population distribution is known to be normal, then the distribution of the sample mean is also normal, regardless of the sample size.
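The sketch below illustrates the caveat about skewed populations. It assumes a strongly right-skewed exponential population with rate 1 (so μ = σ = 1); this population, the helper name, the seed and the repetition count are illustrative choices. It estimates the lower-tail probability P(Z < −1.645), which equals 0.05 under n(z; 0, 1):

```python
import math
import random

def lower_tail_prob(n, z_crit=-1.645, reps=10_000, seed=2):
    """Estimate P(Z < z_crit) for Z = (Xbar - mu) / (sigma / sqrt(n)), where each sample of
    size n is drawn from an exponential population with rate 1 (mu = sigma = 1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = (xbar - mu) / (sigma / math.sqrt(n))
        if z < z_crit:
            hits += 1
    return hits / reps

# Under n(z; 0, 1) the target value is 0.05
for n in (2, 30, 100):
    print(n, lower_tail_prob(n))
```

For n = 2 the estimate is exactly 0, because a mean of two positive values can never fall that far below μ. As n grows the estimate moves toward 0.05, but at n = 30 it is still noticeably below 0.05 for this heavily skewed population.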



The central limit theorem
The sample size n = 30 is a rule-of-thumb guideline for applying the Central Limit Theorem.



Example 7.1
An electrical firm manufactures light bulbs whose length of life is normally distributed, with a mean of 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
Solution
Given that the population is normally distributed, the sampling distribution of X̄ is normal regardless of the sample size, with

$$\mu_{\bar{X}} = 800 \quad\text{and}\quad \sigma_{\bar{X}} = 40/\sqrt{16} = 10.$$



Example 7.1 (continued)
The desired probability is the area under the normal curve of X̄ to the left of x̄ = 775.
We find this area by first converting x̄ = 775 to a standard normal value:

$$z = \frac{775 - 800}{10} = -2.5$$

so P(X̄ < 775) = P(Z < −2.5) = 0.0062 (Table A.3).

Therefore, the probability that X̄ < 775 is very low.
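The same calculation can be reproduced without a table. This is a sketch that assumes only Python's standard library; the helper name `std_normal_cdf` is our own, and it evaluates Φ(z) through the error function:

```python
import math

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 800.0, 40.0, 16       # Example 7.1
sigma_xbar = sigma / math.sqrt(n)    # 40 / 4 = 10
z = (775.0 - mu) / sigma_xbar        # -2.5
p = std_normal_cdf(z)                # P(Xbar < 775), about 0.0062
print(z, p)
```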



Inference on the population mean
A very important application of the sample mean and its sampling
distribution is to estimate the population mean.

We also need to understand how good (how reliable) the estimate is.



Example 7.2
A process produces cylindrical component parts for cars.
It is important that the process produces parts of mean diameter 5.0 mm.
An engineer conjectures that the population mean is 5.0 mm.
100 parts produced by the process are sampled at random.
It is known that the population standard deviation is σ = 0.1 mm. The sample mean diameter is found to be x̄ = 5.027 mm.
Does this sample information appear to support or refute the engineer's conjecture?



Example 7.2 (continued)
Whether the data support or refute the conjecture can be judged from the probability of obtaining a sample mean of x̄ = 5.027 mm (in a sample of 100) when μ = 5.0 mm.
In other words, how likely is it to obtain x̄ ≥ 5.027 with n = 100 if the population mean is μ = 5.0?

Here, the probability we choose to compute is

P(X̄ − 5 ≥ 0.027)



Example 7.2 (continued)
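Given that the sample size is sufficiently large (n = 100), we can treat X̄ as approximately normally distributed, with σ_X̄ = 0.1/√100 = 0.01 mm. Then

$$P(\bar{X} - 5 \ge 0.027) = P\left(Z \ge \frac{0.027}{0.1/\sqrt{100}}\right) = P(Z \ge 2.7) = 1 - 0.9965 = 0.0035.$$

Such a small probability means the sample evidence strongly refutes the conjecture (viz. μ = 5.0)!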
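A sketch of the Example 7.2 computation, assuming only the standard library. The upper tail 1 − Φ(z) is evaluated as ½·erfc(z/√2), which avoids subtracting two nearly equal numbers:

```python
import math

mu0, sigma, n = 5.0, 0.1, 100        # conjectured mean (mm), known sigma (mm), sample size
xbar = 5.027                         # observed sample mean diameter (mm)

sigma_xbar = sigma / math.sqrt(n)    # 0.01 mm
z = (xbar - mu0) / sigma_xbar        # 2.7 (up to floating-point rounding)
# P(Xbar - 5 >= 0.027) = P(Z >= 2.7) = 1 - Phi(2.7)
p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))
print(z, p_upper)
```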



7.4 t-distribution
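The Central Limit Theorem allows us to make inferences about μ when σ is known.
When σ is unknown, we replace it with the sample standard deviation S and use the random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}},$$

which, for a random sample from a normal population, follows the t-distribution (also known as Student's t-distribution).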



7.4 t-distribution
The random variable T follows a t-distribution, with a density function given by:

$$h(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}}\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}, \quad -\infty < t < \infty$$

Not examinable

with v degrees of freedom, where v = n – 1.
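The density above translates directly into code. This is a sketch using only the standard library; the function names are our own. `math.lgamma` is used instead of `math.gamma` so that large v does not overflow. Printing h(0) and h(2) for increasing v also shows the properties on the next slide: the peak rises and the tails thin out toward n(0, 1).

```python
import math

def t_pdf(t, v):
    """Density h(t) of the t-distribution with v degrees of freedom."""
    log_coef = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(math.pi * v)
    return math.exp(log_coef) * (1 + t * t / v) ** (-(v + 1) / 2)

def std_normal_pdf(z):
    """Density of n(0, 1), for comparison."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

for v in (1, 4, 24, 1000):
    print(v, t_pdf(0.0, v), t_pdf(2.0, v))
print("n(0,1)", std_normal_pdf(0.0), std_normal_pdf(2.0))
```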



7.4 t-distribution

• Symmetric, bell-shaped distribution.
• Smaller sample size = fewer degrees of freedom = more “spread out”.
• As v → ∞, the distribution tends to n(0, 1).



7.4 t-distribution

Right-tail probabilities of the t-distribution are tabulated in Table A.6: for a given α, the table gives the value tα such that P(T > tα) = α.
The t-distribution is symmetric about zero, therefore
P(T > t) = P(T < –t)
⇒ F(t) = 1 – F(–t)
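To see where a Table A.6 entry comes from, the sketch below integrates the t density numerically. It uses composite Simpson's rule with an arbitrarily chosen 2000 steps; this is our choice, and statistical software uses more refined algorithms. The symmetry identity F(t) = 1 – F(–t) then supplies the left half. All function names are hypothetical.

```python
import math

def t_pdf(t, v):
    """Density h(t) of the t-distribution with v degrees of freedom."""
    log_coef = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(math.pi * v)
    return math.exp(log_coef) * (1 + t * t / v) ** (-(v + 1) / 2)

def t_right_tail(t, v, steps=2000):
    """P(T > t) = 0.5 - integral of h(u) from 0 to t, valid for t >= 0 (symmetry about 0).
    The integral uses composite Simpson's rule; `steps` must be even."""
    h = t / steps
    total = t_pdf(0.0, v) + t_pdf(t, v)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, v)
    return 0.5 - total * h / 3

def t_cdf(t, v):
    """F(t) = P(T <= t); negative t is handled with F(t) = 1 - F(-t)."""
    return 1.0 - t_right_tail(t, v) if t >= 0 else t_right_tail(-t, v)

# Table A.6 gives t_0.05 = 1.711 for 24 degrees of freedom
print(t_right_tail(1.711, 24))   # close to 0.05
```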



Example 7.3
A chemical engineer claims that the mean yield of a batch chemical process is 500 g/L. To verify this claim, he prepares and measures 25 batches each month. If the computed t-value falls between –t0.05 and t0.05, he is satisfied with his claim. What conclusion should he draw from a sample with mean x̄ = 518 g/L and sample standard deviation s = 40 g/L? Assume the yield is approximately normally distributed.
Solution
A sample size of 25 means 24 degrees of freedom.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{518 - 500}{40/\sqrt{25}} = 2.25$$



Example 7.3 (cont.)
From Table A.6, for 24 degrees of freedom, t0.05 = 1.711 and –t0.05 = –1.711.
Since t = 2.25 > t0.05, the sample mean x̄ suggests that the actual μ is larger than the claimed μ = 500 g/L.
In other words, the process likely has a higher mean yield than the chemical engineer originally claimed.
• From this example, we can see that the t-distribution is very useful for making inferences about the population mean when σ is unknown.
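The engineer's decision rule from Example 7.3 can be written out as a short sketch. The critical value is hard-coded from Table A.6 rather than computed, and the verdict strings are our own wording:

```python
import math

mu0, n = 500.0, 25          # claimed mean yield (g/L) and number of batches
xbar, s = 518.0, 40.0       # sample mean and sample standard deviation (g/L)
t_crit = 1.711              # t_0.05 for v = n - 1 = 24, read from Table A.6

t_value = (xbar - mu0) / (s / math.sqrt(n))   # 18 / 8 = 2.25
if -t_crit <= t_value <= t_crit:
    verdict = "consistent with the claimed mean"
elif t_value > t_crit:
    verdict = "suggests the true mean exceeds the claim"
else:
    verdict = "suggests the true mean is below the claim"
print(t_value, verdict)
```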



N.B.
[Figure: panels labelled μ1, μ2 and panels labelled X̄1, X̄2, each pair shown before and after increasing n1 and n2]



7.5 Sampling distribution of S2
It turns out (proof not shown here) that, for a random sample of size n from a normal population with variance σ2,
χ2 = (n – 1)S2/σ2
follows a chi-squared distribution with n – 1 degrees of freedom.
The probability that χ2 exceeds a specific value χα2 equals α, the area under the PDF curve to the right of χα2.
Tabulated values of typical α and the corresponding χ2 values for various degrees of freedom are given in Table A.5.



Example 7.4
A manufacturer of car batteries guarantees that the batteries will last, on average, 3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5 and 4.2 years, should the manufacturer be convinced that the battery life really has a standard deviation of 1 year? Assume that the battery lifetime follows a normal distribution.
Solution
Known: the population mean μ, the population standard deviation σ, and the sample values.
Unknown: how likely the observed sample variance S2 is if the population variance really is σ2 = 1.



Example 7.4 (cont.)
First, find the sample variance:

$$S^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right]$$

$$= \frac{1}{5(5-1)}\left[5\left(1.9^2 + 2.4^2 + 3.0^2 + 3.5^2 + 4.2^2\right) - \left(1.9 + 2.4 + 3.0 + 3.5 + 4.2\right)^2\right] = 0.815$$
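A sketch that checks the shortcut formula against the standard library's `statistics.variance`, which also uses the n − 1 divisor:

```python
import statistics

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]   # battery lifetimes (years), Example 7.4
n = len(lifetimes)

# Computational ("shortcut") formula from the slide
s2_shortcut = (n * sum(x * x for x in lifetimes) - sum(lifetimes) ** 2) / (n * (n - 1))
# Library routine: sample variance with divisor n - 1
s2_library = statistics.variance(lifetimes)
print(s2_shortcut, s2_library)
```

The shortcut formula is convenient by hand. For large or nearly equal values, however, it subtracts two big numbers and can lose precision, which is one reason to prefer a library routine in practice.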



Example 7.4 (cont.)
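Then, find the value of the associated chi-squared statistic:

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{(4)(0.815)}{1} = 3.26$$

with (n – 1) = 4 degrees of freedom.

According to Table A.5, χ2 = 3.26 lies in the region 0.1 < α < 0.9.
Alternatively, using Excel or a graphing calculator (GC), P(χ2 ≤ 3.26) = 0.485.
Therefore, S2 = 0.815 is not an unlikely outcome when σ2 = 1, and the sample gives the manufacturer no strong reason to doubt that σ = 1 year.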
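The probability quoted above can be checked without Excel, because the chi-squared CDF has a closed form when the number of degrees of freedom is even: P(χ2 ≤ x) = 1 − e^(−x/2) Σ_{j=0}^{v/2−1} (x/2)^j / j!. The sketch below relies on that restriction (v = 4 here). For odd v you would need a general routine, for example `scipy.stats.chi2.cdf`.

```python
import math

def chi2_cdf_even_df(x, v):
    """P(chi^2 <= x) for an EVEN number of degrees of freedom v (closed-form Poisson sum)."""
    if v % 2:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2.0
    series = sum(half ** j / math.factorial(j) for j in range(v // 2))
    return 1.0 - math.exp(-half) * series

n, s2, sigma2 = 5, 0.815, 1.0        # Example 7.4
chi2 = (n - 1) * s2 / sigma2         # 3.26
p_lower = chi2_cdf_even_df(chi2, n - 1)
print(chi2, p_lower)
```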



7.6 Sampling distribution of the difference between two means
Suppose we have two populations with means μ1 and μ2 and variances σ1² and σ2². An independent random sample is taken from each population, with sizes n1 and n2 and sample means X̄1 and X̄2.
The sampling distribution of the difference of means, X̄1 − X̄2, is approximately normally distributed with mean and variance given by

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad\text{and}\quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Hence,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}}$$

is approximately a standard normal variable.
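A Monte Carlo sketch of the mean and variance formulas for X̄1 − X̄2. The two normal populations and their sample sizes are hypothetical values chosen only for illustration (μ1 = 10, σ1 = 2, n1 = 8; μ2 = 7, σ2 = 3, n2 = 12), as are the seed and the number of repetitions:

```python
import random
import statistics

def simulate_diff_means(mu1, sigma1, n1, mu2, sigma2, n2, reps=20_000, seed=3):
    """Sample means from two independent normal populations; return the list of Xbar1 - Xbar2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs

diffs = simulate_diff_means(10, 2, 8, 7, 3, 12)
theory_mean = 10 - 7                  # mu1 - mu2 = 3
theory_var = 2**2 / 8 + 3**2 / 12     # sigma1^2/n1 + sigma2^2/n2 = 1.25
print(statistics.fmean(diffs), statistics.pvariance(diffs), theory_mean, theory_var)
```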
Example 7.5
Two independent experiments compare two different types of paint. 18 specimens are painted using type A, and the drying time, in hours, is recorded for each specimen. The same is done with type B paint. The population standard deviations are both known to be 1.0.
Assuming the mean drying time is equal for the two types of paint, find P(X̄A − X̄B > 1.0), where X̄A and X̄B are the average drying times for samples of size nA = nB = 18.



Example 7.5 (continued)
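From the sampling distribution of X̄A − X̄B, we know that the distribution is approximately normal with mean

$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0$$

and variance

$$\sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$

Therefore, the desired probability is the area under this normal curve to the right of X̄A − X̄B = 1.0.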



Example 7.5 (continued)
Corresponding to the value X̄A − X̄B = 1.0, we have

$$Z = \frac{(\bar{X}_A - \bar{X}_B) - (\mu_A - \mu_B)}{\sqrt{(\sigma_A^2/n_A) + (\sigma_B^2/n_B)}} = \frac{1 - 0}{\sqrt{1/9}} = 3.0$$

$$P(Z > 3.0) = 1 - P(Z < 3.0) = 1 - 0.9987 = 0.0013$$
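A standard-library sketch of the Example 7.5 numbers. The upper-tail probability is again computed as ½·erfc(z/√2):

```python
import math

sigma_a = sigma_b = 1.0        # known population standard deviations (h)
n_a = n_b = 18                 # specimens per paint type
mu_diff = 0.0                  # mu_A - mu_B under the equal-means assumption

se = math.sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)   # sqrt(1/9) = 1/3
z = (1.0 - mu_diff) / se                              # 3.0
p = 0.5 * math.erfc(z / math.sqrt(2.0))               # P(Z > 3.0)
print(z, p)
```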



7.7 Comparing two sample variances
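• A random sample of size n1 is selected from a normal population 1 with variance σ1². Then the random variable associated with S1,
$\chi_1^2 = (n_1 - 1)S_1^2/\sigma_1^2$,
is chi-squared distributed with n1 – 1 degrees of freedom.
• Another, independent sample of size n2 is selected from a normal population 2 with variance σ2². Then the random variable associated with S2,
$\chi_2^2 = (n_2 - 1)S_2^2/\sigma_2^2$,
is also chi-squared distributed, with n2 – 1 degrees of freedom.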



Comparing two sample variances
Let $\chi_1^2 = U$ and $\chi_2^2 = V$. By definition, the statistic

$$F = \frac{U/\nu_1}{V/\nu_2} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

has an F-distribution with ν1 = n1 – 1 and ν2 = n2 – 1 degrees of freedom.

In other words, the F-distribution describes the sampling distribution of the ratio of S1²/σ1² to S2²/σ2² (each a sample variance divided by its population variance) for two independent random samples.



F-distribution
The density function of the F-distribution is given by:

$$h(f) = \begin{cases} \dfrac{\Gamma[(v_1+v_2)/2]\,(v_1/v_2)^{v_1/2}}{\Gamma(v_1/2)\,\Gamma(v_2/2)} \cdot \dfrac{f^{(v_1/2)-1}}{(1 + v_1 f/v_2)^{(v_1+v_2)/2}}, & f > 0, \\[2ex] 0, & f \le 0, \end{cases}$$

with mean

$$\mu = \frac{v_2}{v_2 - 2}, \quad v_2 > 2,$$

and variance

$$\sigma^2 = \frac{2 v_2^2 (v_1 + v_2 - 2)}{v_1 (v_2 - 2)^2 (v_2 - 4)}, \quad v_2 > 4. \qquad \text{(Not examinable)}$$

with v1 and v2 degrees of freedom.
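As a numerical sanity check of the density and the mean formula, the sketch below integrates h(f) and f·h(f) with composite Simpson's rule for the illustrative choice v1 = 10, v2 = 20. Truncating the integral at f = 50 is our assumption; the right tail beyond that point is negligible for these degrees of freedom. The function names are hypothetical.

```python
import math

def f_pdf(f, v1, v2):
    """Density h(f) of the F-distribution with v1 and v2 degrees of freedom (zero for f <= 0)."""
    if f <= 0:
        return 0.0
    log_h = (math.lgamma((v1 + v2) / 2) - math.lgamma(v1 / 2) - math.lgamma(v2 / 2)
             + (v1 / 2) * math.log(v1 / v2) + (v1 / 2 - 1) * math.log(f)
             - ((v1 + v2) / 2) * math.log1p(v1 * f / v2))
    return math.exp(log_h)

def simpson(g, a, b, steps=20_000):
    """Composite Simpson's rule for the integral of g over [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

v1, v2 = 10, 20
area = simpson(lambda f: f_pdf(f, v1, v2), 0.0, 50.0)        # should be close to 1
mean = simpson(lambda f: f * f_pdf(f, v1, v2), 0.0, 50.0)    # compare with v2 / (v2 - 2)
print(area, mean, v2 / (v2 - 2))
```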



F-distribution
However, it is much more common (and convenient) to use the right-tail probabilities P(F > fα) = α tabulated in Table A.7, or to compute F probabilities with computers and calculators.



F-distribution
Writing fα(v1, v2) for fα with v1 and v2 degrees of freedom, we obtain a convenient identity for the F-distribution (proof skipped):

$$f_{1-\alpha}(v_1, v_2) = \frac{1}{f_\alpha(v_2, v_1)}$$

For example, since f0.05(10, 6) = 4.06,

$$f_{0.95}(6, 10) = \frac{1}{f_{0.05}(10, 6)} = \frac{1}{4.06} = 0.246$$



Example 7.6
A study was performed to determine whether chemical engineers and chemists differ in their repeatability in assembling pipelines. Samples of 25 chemical engineers and 21 chemists were selected, and each subject assembled pipelines.
The two sample standard deviations of assembly time were SCE = 0.914 min and SCh = 1.093 min. Is there evidence to support the claim that chemical engineers' assembly times are less variable (i.e., more repeatable) than chemists' for this assembly task?
Answer
Use the F-statistic to compare the two sample standard deviations: how likely is it to observe SCE = 0.914 and SCh = 1.093 when σCE = σCh?

Example 7.6 (continued)

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{0.914^2/1}{1.093^2/1} = 0.699$$

with 24 and 20 degrees of freedom (the population variances cancel because σCE = σCh, so each is written as 1).

P(F(24, 20) ≤ 0.699) = 0.200 (according to Excel)

Alternatively, from Table A.7,

f0.90(24, 20) = 1/f0.10(20, 24) = 1/1.73 = 0.578 < 0.699

so P(F ≤ 0.699) > 0.10. Therefore, observing SCE = 0.914 and SCh = 1.093 when σCE = σCh is not unlikely at the 10% level.
In other words, there is insufficient evidence to suggest that chemical engineers' assembly times are less variable than chemists' for this assembly task.
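The Table A.7 route in Example 7.6 can be sketched as follows. The value f0.10(20, 24) = 1.73 is copied from the table rather than computed, and the reciprocal identity f1−α(v1, v2) = 1/fα(v2, v1) converts it into the lower-tail critical value:

```python
s_ce, s_ch = 0.914, 1.093      # sample standard deviations (min)
n_ce, n_ch = 25, 21
v1, v2 = n_ce - 1, n_ch - 1    # 24 and 20 degrees of freedom

# sigma_CE = sigma_Ch, so the population variances cancel in F
f_obs = s_ce**2 / s_ch**2

f_010_20_24 = 1.73                      # f_0.10(20, 24), read from Table A.7
f_090_24_20 = 1 / f_010_20_24           # f_0.90(24, 20) via the reciprocal identity
significant_at_10pct = f_obs < f_090_24_20   # lower-tail comparison
print(round(f_obs, 3), round(f_090_24_20, 3), significant_at_10pct)
```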
Summary
• Sampling distribution of the sample mean
  • When σ is known, use the central limit theorem: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ ⇒ Table A.3
  • When σ is unknown: $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ with (n – 1) degrees of freedom ⇒ Table A.6
• Sampling distribution of the sample variance
  • Use the chi-squared distribution: $\chi^2 = (n-1)S^2/\sigma^2$ with (n – 1) degrees of freedom ⇒ Table A.5
• Sampling distribution of the difference between two sample means
  • When σ1 and σ2 are known: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
• Sampling distribution of the ratio of two sample variances
  • Use the F-distribution: $F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ with v1 = n1 – 1 and v2 = n2 – 1 degrees of freedom ⇒ Table A.7

