CH I - Sampling and Sampling Distributions (6)
SAMPLING DISTRIBUTION
NOTE: The normal probability distribution is used to determine probabilities for
normally distributed individual measurements, given the mean and the standard
deviation. Symbolically, the variable is the measurement X, with population
mean µ and population standard deviation σ. In contrast to such distributions of
individual measurements, a sampling distribution is a probability distribution for
the possible values of a sample statistic.
Population distribution: the distribution of the measured values of the members of a
population. It has mean μ, variance $\sigma^2$, and standard deviation σ. The population standard
deviation describes the variation among the values of the members of the population, whereas the
standard deviation of a sampling distribution measures the variability, due to sampling error,
among the values of a sample statistic such as the sample mean or the sample proportion.
NB: The sampling distribution of the mean is not the sample distribution, which is the
distribution of the measured values of X in one random sample. Rather, the sampling
distribution of the mean is the probability distribution for $\bar{X}$, the sample mean.
For any given sample size n taken from a population with mean µ and standard deviation
σ, the value of the sample mean would vary from sample to sample if several random
samples were obtained from the population. This variability serves as the basis for
the sampling distribution.
The sampling distribution of the mean is described by two parameters: the expected value
$E(\bar{X}) = \mu_{\bar{X}}$, or mean of the sampling distribution of the mean, and the standard deviation
of the mean $\sigma_{\bar{X}}$, the standard error of the mean.
1. The arithmetic mean $\mu_{\bar{X}}$ of the sampling distribution of mean values is equal to the
population mean μ regardless of the form of the population distribution, i.e. $\mu_{\bar{X}} = \mu$.
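2. The sampling distribution has a standard deviation (also called the standard error) equal
to the population standard deviation divided by the square root of the sample size, i.e.
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
This holds when n < 0.05N, that is, when N is very large relative to n. If N is
finite and n ≥ 0.05N, the finite population correction factor is applied:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$, n ≥ 0.05N
With the sample standard deviation s in place of σ, the corresponding forms are
$s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$ or $s_{\bar{X}} = \dfrac{s}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}$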
4. A sample size n ≥ 30 is generally considered a large sample for statistical
analysis, whereas a sample of size n < 30 is considered a small sample. The
sampling distribution of means is approximately normal for sufficiently large
sample sizes (n ≥ 30).
5. When the population standard deviation σ is not known, the sample standard deviation s,
which closely approximates σ, is used to compute the standard error,
i.e. $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$.
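The standard error rules listed above can be sketched in a few lines of Python. This is only an illustrative helper, not part of the original notes: the function name `standard_error` and its optional argument `N` are assumptions made here, and the 5% threshold is the rule of thumb quoted in property 2.

```python
import math


def standard_error(sd, n, N=None):
    """Standard error of the mean for a sample of size n.

    sd is the population standard deviation (or the sample value s when
    sigma is unknown). N is the population size (hypothetical optional
    argument); the finite population correction is applied only when N
    is given and n >= 0.05 * N.
    """
    se = sd / math.sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Very large population: plain sigma / sqrt(n)
print(standard_error(300, 16))
# Small finite population (n / N = 0.6): the correction shrinks the result
print(standard_error(300, 16, N=20) < standard_error(300, 16))
```

Keeping the correction inside one function makes it harder to forget when the sampling fraction is large.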
A population consists of the following ages: 10, 20, 30, 40, and 50. A random sample
of three is to be selected from this population and mean computed. Develop the
sampling distribution of the mean.
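Solution: The number of simple random samples of size n that can be drawn without
replacement from a population of size N is $^{N}C_{n} = \dfrac{N!}{n!\,(N-n)!}$. With N = 5 and n = 3, $^{5}C_{3} = 10$
samples can be drawn from the population.
The population mean and the mean of the 10 sample means coincide:
$\mu = \dfrac{\sum X}{N} = \dfrac{150}{5} = 30$ and $\mu_{\bar{X}} = \dfrac{\sum \bar{x}}{10} = \dfrac{300}{10} = 30$.
Regardless of the sample size, $\mu_{\bar{X}} = \mu$.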
x (observation)    x − μ    (x − μ)²
10                 −20      400
20                 −10      100
30                   0        0
40                  10      100
50                  20      400
Σ(x − μ)²                   1,000
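$\sigma = \sqrt{\dfrac{\sum (X_i - \mu)^2}{N}} = \sqrt{\dfrac{1000}{5}} = 14.142$
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} = \dfrac{14.142}{\sqrt{3}} \sqrt{\dfrac{5-3}{5-1}} = 5.774$
The same value follows directly from the 10 sample means:
$\sigma_{\bar{X}} = \sqrt{\dfrac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{10}} = \sqrt{\dfrac{333.4}{10}} = 5.774$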
Since averaging reduces variability, $\sigma_{\bar{X}} < \sigma$ except in the cases where σ = 0 or n = 1.
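The worked example can be reproduced by listing every possible sample. The sketch below uses only the Python standard library; the variable names are chosen here for illustration. It enumerates the 10 samples of size 3 from the ages 10, 20, 30, 40, 50, then compares the standard deviation of the sample means with the finite population formula.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

ages = [10, 20, 30, 40, 50]   # the population from the example
n = 3                         # sample size
N = len(ages)

# All simple random samples of size n drawn without replacement
samples = list(combinations(ages, n))
sample_means = [mean(s) for s in samples]

mu = mean(ages)        # population mean
sigma = pstdev(ages)   # population standard deviation (divides by N)

mean_of_means = mean(sample_means)
se_enumerated = pstdev(sample_means)   # spread of the sample means themselves
se_formula = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

for s, m in zip(samples, sample_means):
    print(s, round(m, 2))
print(len(samples), round(mean_of_means, 2),
      round(se_enumerated, 3), round(se_formula, 3))
```

The two standard errors agree, which is what the finite population correction is meant to guarantee for sampling without replacement.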
Central Limit Theorem and the Sampling Distribution of the Mean
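The Central Limit Theorem (CLT) describes the relationship between the shape of the
population distribution and the shape of the sampling distribution of the mean: as the
sample size n becomes large, the sampling distribution of the mean approaches a normal
distribution with mean μ and standard error $\sigma/\sqrt{n}$, whatever the shape of the
population distribution (provided its variance is finite).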
The significance of the Central Limit Theorem is that it permits us to use sample statistics
to make inference about population parameters without knowing anything about the
shape of the frequency distribution of that population other than what we can get from the
sample. It also permits us to use the normal distribution curve for analyzing distributions
whose shape is unknown. It creates the potential for applying the normal distribution to
many problems when the sample is sufficiently large.
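The effect of the theorem can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not material from the notes: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed shape chosen here), and the sample size of 50 and the 2000 repetitions are arbitrary choices.

```python
import random
from statistics import mean, pstdev

random.seed(0)        # fixed seed so the run is repeatable

n = 50                # sample size (assumed for illustration)
num_samples = 2000    # number of repeated samples (assumed)

# Each sample is n draws from a skewed exponential population (mean 1, sd 1)
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = mean(sample_means)       # should be close to the population mean 1
spread = pstdev(sample_means)     # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / n ** 0.5

# Under normality about 95% of sample means fall within 1.96 standard errors
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se
             for m in sample_means) / num_samples

print(round(center, 3), round(spread, 3), round(theoretical_se, 3),
      round(within, 3))
```

Even though individual exponential values are far from normal, the share of sample means within 1.96 standard errors should come out near 0.95.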
Given that the above properties hold, a value of the sample mean $\bar{X}$
is first converted into a value Z on the standard normal distribution, which measures how far
that value deviates from the mean of the sample mean values ($\mu_{\bar{X}}$), by using the formula
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, because $\mu_{\bar{X}} = \mu$
If the population is finite and samples of fixed size n are drawn without replacement, then
the standard error of the sampling distribution of the mean is modified to adjust for the continued
change in the size of the population N due to the several draws of samples of size n, as
follows:
$Z = \dfrac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}}$
Example 1. The mean life of a certain tool is 41.5 hours with a standard
deviation of 2.5 hours. What is the probability that a simple random sample
of size 50 drawn from this population will have a mean between 40.5 hours
and 42 hours?
$P(40.5 \le \bar{X} \le 42.0) = ?$
$\mu_{\bar{X}} = \mu = 41.5$, $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{2.5}{\sqrt{50}} = \dfrac{2.5}{7.0711} = 0.3536$
The population distribution is unknown, but sample size n=50 is large enough to apply the
central limit theorem. Hence the normal distribution can be used to find the required
probability.
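$P(40.5 \le \bar{X} \le 42.0) = P\left(\dfrac{\bar{X}_1 - \mu}{\sigma_{\bar{X}}} \le Z \le \dfrac{\bar{X}_2 - \mu}{\sigma_{\bar{X}}}\right)$
$= P\left(\dfrac{40.5 - 41.5}{0.3536} \le Z \le \dfrac{42 - 41.5}{0.3536}\right)$
$= P(-2.8281 \le Z \le 1.4140)$
$= P(-2.8281 \le Z \le 0) + P(0 \le Z \le 1.4140)$
$= 0.4977 + 0.4207 = 0.9184$
Thus 0.9184 is the probability that the sample mean life of the tool lies between 40.5 and 42 hours.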
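As a cross-check on the table lookup, the same probability can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for verification only; the variable names are chosen here. The result differs slightly from 0.9184 because the table method rounds Z to two decimals.

```python
import math
from statistics import NormalDist

mu, sigma, n = 41.5, 2.5, 50     # values from Example 1

se = sigma / math.sqrt(n)        # standard error of the mean
xbar = NormalDist(mu, se)        # approximate sampling distribution of the mean

# P(40.5 <= sample mean <= 42.0) as a difference of two cumulative areas
prob = xbar.cdf(42.0) - xbar.cdf(40.5)

print(round(se, 4), round(prob, 4))
```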
[Figure: normal curve with the areas 0.4977 and 0.4207 shaded on either side of the mean; σ = 2.5]
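Solution:
A. $P(\bar{x} \ge 900) = ?$
$\mu_{\bar{X}} = \mu = 800$ gms, σ = 300 gms, n = 16
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{300}{\sqrt{16}} = \dfrac{300}{4} = 75$
$P(\bar{x} \ge 900) = P\left(Z \ge \dfrac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) = P\left(Z \ge \dfrac{900 - 800}{75}\right)$
$= P(Z \ge 1.33)$
$= 0.5000 - 0.4082$
$= 0.0918$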
B. Since Z = 1.96 for the middle 95% of the area under the normal curve, solving the formula
for Z for the values of $\bar{x}$ in terms of the known values gives:
$\bar{x}_1 = \mu_{\bar{X}} - Z\sigma_{\bar{X}} = 800 - 1.96(75) = 653$ gms
$\bar{x}_2 = \mu_{\bar{X}} + Z\sigma_{\bar{X}} = 800 + 1.96(75) = 947$ gms
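Both parts of this solution can be checked numerically. The sketch below assumes only the figures given in the solution (μ = 800 gms, σ = 300 gms, n = 16) and uses `NormalDist.inv_cdf` to recover the 1.96 multiplier instead of reading it from a table; variable names are chosen here for illustration.

```python
from statistics import NormalDist

mu, sigma, n = 800, 300, 16      # values from the solution (gms)
se = sigma / n ** 0.5            # standard error of the mean

# Part A: upper-tail probability for a sample mean of 900 gms or more
p_ge_900 = 1 - NormalDist(mu, se).cdf(900)

# Part B: limits enclosing the middle 95% of sample means
z = NormalDist().inv_cdf(0.975)  # about 1.96
lower, upper = mu - z * se, mu + z * se

print(se, round(p_ge_900, 4), round(z, 2), round(lower), round(upper))
```

The exact tail area is slightly below 0.0918 because the table value uses Z rounded to 1.33.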
[Figure: normal curve with the middle area 0.95 shaded; σ = 300]
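The sample proportion is
$\bar{p} = \dfrac{\text{number of successes, } X}{\text{sample size, } n}$
By the same logic as for the sampling distribution of the mean, the sampling distribution of sample
proportions, with mean $\mu_{\bar{p}}$ and standard deviation (also called the standard error) $\sigma_{\bar{p}}$, is given by:
$\mu_{\bar{p}} = p$ and $\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{p(1-p)}{n}}$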
If both of the following conditions hold,
A. np ≥ 5
B. nq ≥ 5
then the sampling distribution of proportions is very closely normally distributed. Note
that the sampling distribution of the proportion actually follows the binomial
distribution, because the number of successes in a sample is binomially distributed.
For a finite population in which sampling is done without replacement, we have:
$\mu_{\bar{p}} = p$ and $\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}} \sqrt{\dfrac{N-n}{N-1}}$
Under the same guidelines as mentioned in the previous sections, for a large sample size n ≥
30, the sampling distribution of the proportion is closely approximated by a normal distribution
with the mean and standard deviation stated above. Hence, to standardize the sample
proportion $\bar{p}$, the standard normal variable is
$Z = \dfrac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}} = \dfrac{\bar{p} - p}{\sqrt{pq/n}}$
Example 3. A few years back, a policy was introduced to give loans to
unemployed engineers to start their own businesses. Out of 1,000,000
engineers, 600,000 accepted the policy and got the loan. A sample of 100
unemployed engineers is taken at the time of allotment of the loans. What
is the probability that the sample proportion would have exceeded 50%
acceptance?
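Solution:
$\mu_{\bar{p}} = p = 0.60$, N = 1,000,000, n = 100, $P(\bar{p} \ge 0.5) = ?$
$\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}} \sqrt{\dfrac{N-n}{N-1}} = \sqrt{\dfrac{(0.60)(0.40)}{100}} \sqrt{\dfrac{1{,}000{,}000 - 100}{1{,}000{,}000 - 1}} = 0.0489$
$P(\bar{p} \ge 0.5) = P\left(Z \ge \dfrac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}}\right) = P\left(Z \ge \dfrac{0.50 - 0.60}{0.0489}\right) = P(Z \ge -2.04) = 0.4793 + 0.5000 = 0.9793$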
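The calculation for Example 3 can be sketched as follows. The helper name `proportion_se` is an assumption made here for illustration; it applies the finite population correction whenever a population size is supplied, which mirrors the solution above even though the correction is almost exactly 1 for a population this large.

```python
import math
from statistics import NormalDist


def proportion_se(p, n, N=None):
    """Standard error of a sample proportion (hypothetical helper).

    When the population size N is given, the finite population
    correction sqrt((N - n) / (N - 1)) is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


p, n, N = 0.60, 100, 1_000_000    # values from Example 3

se = proportion_se(p, n, N)
prob = 1 - NormalDist(p, se).cdf(0.50)   # P(sample proportion >= 0.50)

print(round(se, 4), round(prob, 4))
```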
[Figure: normal curve centered at p = 0.60 with the areas 0.4793 (between 0.50 and 0.60) and 0.5000 (above 0.60) shaded]
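$\mu_{\bar{p}} = p = 0.40$, n = 200
$\sigma_{\bar{p}} = \sqrt{\dfrac{(0.4)(0.6)}{200}} = 0.0346$
$P(-0.03 \le \bar{p} - p \le 0.03) = 2P\left(0 \le Z \le \dfrac{0.03}{0.0346}\right)$
$= 2P(0 \le Z \le 0.87)$
$= 2 \times 0.3078$
$= 0.6156$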
[Figure: normal curve centered at p = 0.40 with equal areas shaded on each side of the mean]
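$\mu_{\bar{p}} = p = 0.03$, $p_1 = 0.02$, $p_2 = 0.035$, n = 300
$\sigma_{\bar{p}} = \sqrt{\dfrac{(0.03)(0.97)}{300}} = 0.0098$
$P(0.02 \le \bar{p} \le 0.035) = P\left(\dfrac{p_1 - p}{\sigma_{\bar{p}}} \le Z \le \dfrac{p_2 - p}{\sigma_{\bar{p}}}\right)$
$= P\left(\dfrac{0.02 - 0.03}{0.0098} \le Z \le \dfrac{0.035 - 0.03}{0.0098}\right)$
$= P(-1.02 \le Z \le 0.51)$
$= P(-1.02 \le Z \le 0) + P(0 \le Z \le 0.51)$
$= 0.3461 + 0.1950$
$= 0.5411$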
Hence the probability that the proportion of defective will lie between 0.02 and
0.035 is 0.5411
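A quick numerical check of this result, written as a sketch with names chosen here for illustration. It also tests the np ≥ 5 and nq ≥ 5 guideline before using the normal approximation. The computed probability comes out a little below 0.5411 because the table method rounds the standard error and both Z values.

```python
import math
from statistics import NormalDist

p, n = 0.03, 300                 # population proportion defective, sample size
low, high = 0.02, 0.035          # limits for the sample proportion

# Guideline from the text: normal approximation needs np >= 5 and nq >= 5
normal_ok = n * p >= 5 and n * (1 - p) >= 5

se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
dist = NormalDist(p, se)
prob = dist.cdf(high) - dist.cdf(low)

print(normal_ok, round(se, 4), round(prob, 4))
```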
[Figure: normal curve centered at p = 0.03 with the areas 0.3461 (from 0.02 to 0.03) and 0.1950 (from 0.03 to 0.035) shaded]
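Let $\bar{X}_1$ and $\bar{X}_2$ be the means of the sampling distributions of the mean for two populations,
respectively. Then the difference between their mean values $\mu_1$ and $\mu_2$ can be
estimated by generalizing the formula of the standard normal variable as follows:
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_{\bar{X}_1} - \mu_{\bar{X}_2})}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$
where
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
(standard error of the sampling distribution of the difference of two means)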
$n_1$ and $n_2$ are the sizes of the independent random samples drawn from the first and second
populations, respectively.
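$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{(200)^2}{125} + \dfrac{(100)^2}{125}} = \sqrt{320 + 80} = \sqrt{400} = 20$
$P(\bar{X}_1 - \bar{X}_2 \ge 160) = P\left(Z \ge \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}\right)$
$= P\left(Z \ge \dfrac{160 - 200}{20}\right)$
$= P(Z \ge -2)$
$= 0.5000 + 0.4772$
$= 0.9772$ (area under normal curve)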
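[Figure: normal curve centered at $\mu_{\bar{X}_1 - \bar{X}_2} = 200$ with the area 0.9772 to the right of $\bar{X}_1 - \bar{X}_2 = 160$ shaded]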
Hence, the probability is very high that the mean lifetime of the stereos of A is at least 160
hours more than that of B.
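$P(\bar{X}_1 - \bar{X}_2 \ge 250) = P\left(Z \ge \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}\right)$
$= P\left(Z \ge \dfrac{250 - 200}{20}\right)$
$= P(Z \ge 2.5)$
$= 0.5000 - 0.4938$
$= 0.0062$ (area under normal curve)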
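Both probabilities for the difference of two sample means can be verified with a short sketch. The helper name `se_diff_means` is an assumption introduced here; the inputs (standard deviations 200 and 100, samples of 125 each, and a difference in population means of 200) are the figures used in the solution.

```python
import math
from statistics import NormalDist


def se_diff_means(sd1, n1, sd2, n2):
    """Standard error of the difference of two independent sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


mean_diff = 200                          # mu1 - mu2 from the solution
se = se_diff_means(200, 125, 100, 125)   # sqrt(320 + 80)

diff = NormalDist(mean_diff, se)         # distribution of the difference
p_at_least_160 = 1 - diff.cdf(160)       # Z >= -2
p_at_least_250 = 1 - diff.cdf(250)       # Z >= 2.5

print(se, round(p_at_least_160, 4), round(p_at_least_250, 4))
```

A threshold below the mean difference gives a probability above one half, and a threshold above it gives a small tail area, which is a useful sanity check on the sign of Z.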
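Given:
$\mu_1 = 4{,}500$, $\mu_2 = 4{,}000$
$\sigma_1 = 200$, $\sigma_2 = 300$
$n_1 = 50$, $n_2 = 100$
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{(200)^2}{50} + \dfrac{(300)^2}{100}} = \sqrt{800 + 900} = \sqrt{1700} = 41.23$
$P(\bar{X}_1 - \bar{X}_2 \ge 600) = P\left(Z \ge \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}\right)$
$= P\left(Z \ge \dfrac{600 - 500}{41.23}\right)$
$= P(Z \ge 2.43)$
$= 0.5000 - 0.4925 = 0.0075$ (area under normal curve)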
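The arithmetic in this solution can be confirmed directly from the given values. This is a brief check only; the variable names are chosen here, and the small gap from 0.0075 comes from rounding Z to 2.43 in the table lookup.

```python
import math
from statistics import NormalDist

mu1, mu2 = 4500, 4000
sd1, sd2 = 200, 300
n1, n2 = 50, 100

# Standard error of the difference of the two sample means
se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

z = (600 - (mu1 - mu2)) / se      # standardized threshold
prob = 1 - NormalDist().cdf(z)    # upper-tail area beyond z

print(round(se, 2), round(z, 2), round(prob, 4))
```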
Suppose two populations of size $N_1$ and $N_2$ are given. For each sample of size $n_1$ from the
first population, compute the sample proportion $\bar{p}_1$ and standard deviation $\sigma_{\bar{p}_1}$. Similarly, for
each sample of size $n_2$ from the second population, compute the sample proportion $\bar{p}_2$ and
standard deviation $\sigma_{\bar{p}_2}$.
For all combinations of these samples from these populations, we can obtain a sampling
distribution of the difference $\bar{p}_1 - \bar{p}_2$ of sample proportions. Such a distribution is called the
sampling distribution of the difference of two proportions. The mean and standard
deviation of this distribution are given by:
$\mu_{\bar{p}_1 - \bar{p}_2} = p_1 - p_2$
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\sigma_{\bar{p}_1}^2 + \sigma_{\bar{p}_2}^2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$
If the sample sizes $n_1$ and $n_2$ are large, i.e. $n_1 \ge 30$ and $n_2 \ge 30$, then the sampling distribution of the difference of
proportions is closely approximated by a normal distribution.
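$n_1 = 250$, $n_2 = 300$
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\sigma_{\bar{p}_1}^2 + \sigma_{\bar{p}_2}^2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$
$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{0.00052} = 0.0228$
$P(\bar{p}_1 - \bar{p}_2 \le 0.02) = P\left(Z \le \dfrac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}\right)$
$= P\left(Z \le \dfrac{0.02 - 0.05}{0.0228}\right)$
$= P(Z \le -1.32)$
$= 0.5000 - 0.4066 = 0.0934$ (area under normal curve)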
Hence the desired probability for the difference in sample proportions is 0.0934
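The steps for the difference of two proportions can be sketched as below. Note the assumption: the population proportions p1 = 0.10 and p2 = 0.05 are illustrative values chosen here, not stated in the surviving text; they were picked because they are consistent with the difference of 0.05 and the standard error of 0.0228 used in the solution. The function name `se_diff_proportions` is likewise hypothetical.

```python
import math
from statistics import NormalDist


def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference of two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# ASSUMED population proportions (illustrative, see the note above)
p1, p2 = 0.10, 0.05
n1, n2 = 250, 300                 # sample sizes from the solution

se = se_diff_proportions(p1, n1, p2, n2)

# Lower-tail area: P(difference in sample proportions <= 0.02)
prob = NormalDist(p1 - p2, se).cdf(0.02)

print(round(se, 4), round(prob, 4))
```

With these assumed inputs the lower-tail area agrees with the 0.0934 obtained from the table to within rounding.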