Lecture 14 Simple Random Sampling 3
Population Total:
\[
T = \sum_{i=1}^{N} y_i = N\mu.
\]
Population Variance:
\[
\begin{aligned}
\operatorname{Var}(Y) = \sigma^2 &= E[Y - E(Y)]^2 \\
&= E(Y - \mu)^2 = E(Y^2) - \mu^2 = E(Y^2) - [E(Y)]^2 \\
&= \frac{1}{N}\sum_{i=1}^{N} y_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} y_i\right)^2 \\
&= \frac{1}{N}\left(\sum_{i=1}^{N} y_i^2 - N\mu^2\right) = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2
\end{aligned}
\]
That is,
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2 .
\]
\[
S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \mu)^2 .
\]
That is,
\[
S^2 = \frac{N}{N-1}\,\sigma^2 .
\]
• For a large population, the difference between \(\sigma^2\) and \(S^2\) is negligible.
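The relation between the two divisors can be checked numerically. The following is a minimal sketch, assuming the four values 2, 4, 6, 8 that appear in the illustrations later in this lecture; the function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def population_variance(values):
    """sigma^2: squared deviations from mu, divided by N."""
    size = len(values)
    mu = Fraction(sum(values), size)
    return sum((Fraction(y) - mu) ** 2 for y in values) / size


def population_mean_square(values):
    """S^2: squared deviations from mu, divided by N - 1."""
    size = len(values)
    mu = Fraction(sum(values), size)
    return sum((Fraction(y) - mu) ** 2 for y in values) / (size - 1)


population = [2, 4, 6, 8]  # values taken from the later illustrations
N = len(population)
sigma2 = population_variance(population)
S2 = population_mean_square(population)
print(sigma2, S2, S2 == Fraction(N, N - 1) * sigma2)

# Assumed larger population (1, 2, ..., 1000) to see the ratio approach 1.
large = list(range(1, 1001))
ratio = population_mean_square(large) / population_variance(large)
print(float(ratio))
```

Exact fractions are used so that the identity \(S^2 = \frac{N}{N-1}\sigma^2\) can be compared without rounding error.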
Sample Observations:
\[
Y_1, Y_2, \cdots, Y_j, \cdots, Y_n
\]
Sample Mean:
\[
\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j .
\]
Sample Variance:
\[
s^2 = \frac{1}{n-1}\sum_{j=1}^{n} \left(Y_j - \bar{Y}\right)^2 .
\]
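As a quick check of the two sample formulas, here is a small sketch that computes them by hand and with Python's standard `statistics` module. The sample values are hypothetical and serve only as an example.

```python
import statistics

sample = [3, 5, 10]  # hypothetical observed values y_1, ..., y_n
n = len(sample)

# Direct use of the formulas for the sample mean and sample variance.
y_bar = sum(sample) / n
s2 = sum((y - y_bar) ** 2 for y in sample) / (n - 1)

# statistics.variance also uses the divisor n - 1.
y_bar_lib = statistics.mean(sample)
s2_lib = statistics.variance(sample)

print(y_bar, s2, y_bar_lib, s2_lib)
```

Note that `statistics.pvariance` would use the divisor \(n\) instead, which corresponds to the \(\sigma^2\)-type formula rather than \(s^2\).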
Remark
The given sample observations are usually denoted by
\[
y_1, y_2, \cdots, y_j, \cdots, y_n .
\]
Note
1. Estimator: An estimator is a function of the sample observations that contains no
unknown quantity (i.e. a statistic) and is used to find an approximate value for the
unknown parameter. Note that an estimator is a random variable.
Suppose that the population mean of the random variable of interest \(Y\) is unknown.
That is, the population mean \(\mu\) is an unknown parameter.
• The sample mean value obtained from a given set of sample observations is an
estimate of the unknown population mean. That is, the sample mean value
approximates the unknown population mean.
• \(\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j\) is an estimator of the unknown population mean \(\mu\).
• \(\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j\) is an estimate of the unknown population mean \(\mu\).
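What is \(E(\bar{Y})\)?
The expected value of the sample mean, \(E(\bar{Y})\), is the arithmetic mean of the
values that \(\bar{Y}\) takes over all possible samples. It can be computed as follows.
\[
E(\bar{Y}) = \frac{1}{K}\sum_{t=1}^{K} \bar{y}_t ,
\]
where \(\bar{y}_t\) is the mean of the \(t\)-th possible sample and
\[
K = \begin{cases} {}^{N}C_{n} & \text{in SRSWOR} \\ N^{n} & \text{in SRSWR.} \end{cases}
\]
This expectation can also be computed by defining the probability distribution
function of \(\bar{Y}\) (known as the sampling distribution of \(\bar{Y}\)). Let there be \(k\) distinct values
for \(\bar{Y}\) and \(p_u\) be the probability that the sample mean value is \(\bar{y}_u\), \(u = 1, \cdots, k\).
Therefore,
\[
E(\bar{Y}) = \sum_{u=1}^{k} \bar{y}_u \, p_u .
\]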
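Property
The sample mean obtained in simple random sampling is an unbiased estimator of the
population mean, i.e. \(E(\bar{Y}) = \mu\).
Proof: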
Suppose that a simple random sample of size \(n\) is drawn from a population having
\(N\) sampling units. Suppose that the given population observations are
\[
y_1, \cdots, y_i, \cdots, y_N
\]
and sample observations are
\[
Y_1, \cdots, Y_j, \cdots, Y_n .
\]
Let \(\mu\) be the unknown population mean defined as
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} y_i
\]
and the sample mean be \(\bar{Y}\) defined as
\[
\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j .
\]
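Then
\[
E(\bar{Y}) = E\left(\frac{1}{n}\sum_{j=1}^{n} Y_j\right) = \frac{1}{n}\sum_{j=1}^{n} E(Y_j),
\]
where
\[
E(Y_j) = \sum_{i=1}^{N} y_i \, P(y_i) = \sum_{i=1}^{N} y_i \, \frac{1}{N} = \frac{1}{N}\sum_{i=1}^{N} y_i = \mu .
\]
Therefore,
\[
E(\bar{Y}) = \frac{1}{n}\sum_{j=1}^{n} \mu = \frac{n\mu}{n} = \mu .
\]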
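The proof relies on every population unit having probability \(1/N\) of being selected at any given draw. The sketch below checks this by enumeration for sampling without replacement, assuming the population 2, 4, 6, 8 and \(n = 2\) from the lecture's illustrations; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

population = [2, 4, 6, 8]  # values used in the lecture's illustrations
N, n = len(population), 2

# All equally likely ordered samples of unit indices under SRSWOR.
ordered_samples = list(permutations(range(N), n))

draw_probs = {}     # (draw j, unit i) -> P(unit i is selected at draw j)
expected_draw = []  # E(Y_j) for each draw j
for j in range(n):
    for i in range(N):
        hits = sum(1 for s in ordered_samples if s[j] == i)
        draw_probs[(j, i)] = Fraction(hits, len(ordered_samples))
    expected_draw.append(
        sum(population[i] * draw_probs[(j, i)] for i in range(N))
    )

# E(Y-bar) is the average of the per-draw expectations.
expected_mean = sum(expected_draw) / n
print(draw_probs, expected_draw, expected_mean)
```

Each draw has the same marginal distribution even though the draws are dependent, which is why the expectation argument does not need independence.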
Illustration
iv. Find the sampling distribution of the sample mean. Hence show that it
is an unbiased estimator of the population mean.
Solution
(iii) \(E(\bar{Y}) = \frac{1}{6}(3 + 4 + 5 + 5 + 6 + 7) = 5 \; (= \mu)\)
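(iv) The possible values of the sample mean are \(\bar{y} = 3, 4, 5, 6, 7\). The sampling
distribution of the sample mean is
\[
\begin{array}{c|ccccc}
\bar{y} & 3 & 4 & 5 & 6 & 7 \\ \hline
P(\bar{y}) & 1/6 & 1/6 & 2/6 & 1/6 & 1/6
\end{array}
\]
Hence,
\[
E(\bar{Y}) = \sum \bar{y}\, P(\bar{y})
= 3 \times \frac{1}{6} + 4 \times \frac{1}{6} + 5 \times \frac{2}{6} + 6 \times \frac{1}{6} + 7 \times \frac{1}{6} = \frac{30}{6} = 5 \; (= \mu)
\]
Therefore, the sample mean \(\bar{Y}\) is an unbiased estimator of the population mean.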
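The sampling distribution in part (iv) can be reproduced by enumeration. This is a minimal sketch, assuming the illustration uses the population 2, 4, 6, 8 with samples of size 2 drawn without replacement (the setting of the later variance illustration, which yields the same six sample means).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]  # assumed population for this illustration
n = 2

# All possible SRSWOR samples and their means.
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]
K = len(samples)

# Sampling distribution: distinct mean value -> probability.
distribution = {m: Fraction(c, K) for m, c in sorted(Counter(means).items())}

# Expectation from the sampling distribution.
expected = sum(m * p for m, p in distribution.items())

for m, p in distribution.items():
    print(m, p)
print("E(Y-bar) =", expected)
```

The value 5 occurs twice among the six samples, so it receives probability 2/6, which `Fraction` prints in lowest terms as 1/3.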
Exercise
Property
The population variance of the sample mean obtained in simple random sampling
without replacement is given by
\[
\operatorname{Var}(\bar{Y}) = (1 - f)\,\frac{S^2}{n},
\]
where \(f = n/N\) is the sampling fraction and
\[
S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \mu)^2 = \frac{N}{N-1}\,\sigma^2 .
\]
Proof:
Suppose that a simple random sample of size \(n\) is drawn from a population having
\(N\) sampling units without replacement. Suppose that the given population
observations are
\[
y_1, \cdots, y_i, \cdots, y_N
\]
and sample observations are
\[
Y_1, \cdots, Y_j, \cdots, Y_n .
\]
Let \(\mu\) be the unknown population mean defined as
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} y_i
\]
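and let the sample mean be
\[
\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j .
\]
Then
\[
\begin{aligned}
\operatorname{Var}(\bar{Y}) &= \operatorname{Var}\left(\frac{1}{n}\sum_{j=1}^{n} Y_j\right) = \frac{1}{n^2}\operatorname{Var}\left(\sum_{j=1}^{n} Y_j\right) \\
&= \frac{1}{n^2}\left[\sum_{j=1}^{n} \operatorname{Var}(Y_j) + \sum_{i \ne j}^{n} \operatorname{Cov}(Y_i, Y_j)\right] \\
&= \frac{1}{n^2}\left[\sum_{j=1}^{n} \sigma^2 + \sum_{i \ne j}^{n} \operatorname{Cov}(Y_i, Y_j)\right]
\end{aligned}
\]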
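Now, under sampling without replacement every ordered pair of distinct population
units is equally likely to be selected at draws \(i\) and \(j\) (\(i \ne j\)), so
\[
\operatorname{Cov}(Y_i, Y_j) = E[(Y_i - \mu)(Y_j - \mu)] = \frac{1}{N(N-1)}\sum_{r \ne s}^{N} (y_r - \mu)(y_s - \mu).
\]
Since \(\left(\sum_{i=1}^{k} x_i\right)^2 = \sum_{i=1}^{k} x_i^2 + \sum_{i \ne j}^{k} x_i x_j\),
\[
\left(\sum_{r=1}^{N} (y_r - \mu)\right)^2 = \sum_{r=1}^{N} (y_r - \mu)^2 + \sum_{r \ne s}^{N} (y_r - \mu)(y_s - \mu).
\]
It is known that the sum of deviations of observations from their mean is always zero.
Therefore,
\[
\sum_{r \ne s}^{N} (y_r - \mu)(y_s - \mu) = -\sum_{r=1}^{N} (y_r - \mu)^2 .
\]
Finally,
\[
\operatorname{Cov}(Y_i, Y_j) = -\frac{1}{N(N-1)}\sum_{r=1}^{N} (y_r - \mu)^2 = -\frac{\sigma^2}{N-1}.
\]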
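Therefore,
\[
\begin{aligned}
\operatorname{Var}(\bar{Y}) &= \frac{1}{n^2}\left[\sum_{j=1}^{n} \sigma^2 + \sum_{i \ne j}^{n} \left(-\frac{\sigma^2}{N-1}\right)\right] \\
&= \frac{n\sigma^2}{n^2} - \frac{n(n-1)}{n^2(N-1)}\,\sigma^2 = \frac{\sigma^2}{n}\left(1 - \frac{n-1}{N-1}\right) \\
&= \frac{\sigma^2}{n} \times \frac{N-n}{N-1} \\
&= \frac{(N-1)S^2}{N} \times \frac{1}{n} \times \frac{N-n}{N-1} \\
&= \frac{N-n}{N} \times \frac{S^2}{n} = (1 - f)\,\frac{S^2}{n}
\end{aligned}
\]
i.e.
\[
\operatorname{Var}(\bar{Y}) = (1 - f)\,\frac{S^2}{n}.
\]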
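The key step of the proof is the negative covariance between two different draws. The sketch below checks that step, and the resulting variance, by exact enumeration. It assumes the population 2, 4, 6, 8 and \(n = 2\) from the lecture's illustrations; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

population = [2, 4, 6, 8]  # values used in the lecture's illustrations
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((y - mu) ** 2 for y in population) / N
S2 = sum((y - mu) ** 2 for y in population) / (N - 1)

# Under SRSWOR each ordered pair of distinct units is equally likely
# to occupy two given draws, so the covariance is an average over pairs.
pairs = list(permutations(population, 2))
cov = sum((a - mu) * (b - mu) for a, b in pairs) / len(pairs)

# Variance of the sample mean assembled from n variances
# and n(n - 1) covariances, as in the proof.
var_mean = (n * sigma2 + n * (n - 1) * cov) / n**2

print(cov, -sigma2 / (N - 1))
print(var_mean, (1 - Fraction(n, N)) * S2 / n)
```

Both printed pairs should agree, which mirrors the two displayed identities for the covariance and for \(\operatorname{Var}(\bar{Y})\).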
Illustration
The population mean
\[
\mu = \frac{1}{4}(2 + 4 + 6 + 8) = 5.
\]
Then,
\[
S^2 = \frac{1}{4-1}\left[(2-5)^2 + (4-5)^2 + (6-5)^2 + (8-5)^2\right] = \frac{20}{3} \approx 6.67
\]
The sampling fraction
\[
f = \frac{n}{N} = \frac{2}{4} = 0.5
\]
Therefore,
\[
(1 - f)\,\frac{S^2}{n} = (1 - 0.5) \times \frac{6.67}{2} \approx 1.67
\]
The possible number of samples \(= {}^{4}C_{2} = \frac{4!}{2!\,2!} = 6\).
That is, the values of the sample mean are 3, 4, 5, 5, 6, 7. The population mean of
the sample mean is
\[
\frac{1}{6}(3 + 4 + 5 + 5 + 6 + 7) = 5.
\]
Then, the population variance of the sample mean is
\[
\operatorname{Var}(\bar{Y}) = \frac{1}{6}\left[(3-5)^2 + (4-5)^2 + (5-5)^2 + (5-5)^2 + (6-5)^2 + (7-5)^2\right] = \frac{10}{6} \approx 1.67
\]
Hence
\[
\operatorname{Var}(\bar{Y}) = (1 - f)\,\frac{S^2}{n}.
\]
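The illustration can be repeated for other sample sizes with a short script. This is a sketch only: the two function names are hypothetical, and exact fractions replace the rounded values 6.67 and 1.67 used above.

```python
from fractions import Fraction
from itertools import combinations


def enumerated_variance(population, n):
    """Variance of the sample mean over all SRSWOR samples of size n."""
    samples = list(combinations(population, n))
    means = [Fraction(sum(s), n) for s in samples]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


def formula_variance(population, n):
    """(1 - f) * S^2 / n with f = n / N."""
    N = len(population)
    mu = Fraction(sum(population), N)
    S2 = sum((y - mu) ** 2 for y in population) / (N - 1)
    return (1 - Fraction(n, N)) * S2 / n


population = [2, 4, 6, 8]  # population of the illustration
for size in (1, 2, 3, 4):
    print(size, enumerated_variance(population, size),
          formula_variance(population, size))
```

For a sample of size 4 the whole population is observed, so both columns are zero, consistent with \(1 - f = 0\).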
Property
The population variance of the sample mean obtained in simple random sampling
with replacement is given by
\[
\operatorname{Var}(\bar{Y}) = \frac{\sigma^2}{n},
\]
where
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2 = \frac{N-1}{N}\,S^2 .
\]
Proof:
Suppose that a simple random sample of size \(n\) is drawn from a population having
\(N\) sampling units with replacement. Suppose that the given population
observations are
\[
y_1, \cdots, y_i, \cdots, y_N
\]
and sample observations are
\[
Y_1, \cdots, Y_j, \cdots, Y_n .
\]
Let \(\mu\) be the unknown population mean defined as
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} y_i
\]
and the sample mean be \(\bar{Y}\) defined as
\[
\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j .
\]
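Then
\[
\begin{aligned}
\operatorname{Var}(\bar{Y}) &= \operatorname{Var}\left(\frac{1}{n}\sum_{j=1}^{n} Y_j\right) = \frac{1}{n^2}\operatorname{Var}\left(\sum_{j=1}^{n} Y_j\right) \\
&= \frac{1}{n^2}\left[\sum_{j=1}^{n} \operatorname{Var}(Y_j) + \sum_{i \ne j}^{n} \operatorname{Cov}(Y_i, Y_j)\right] \\
&= \frac{1}{n^2}\left[\sum_{j=1}^{n} \sigma^2 + \sum_{i \ne j}^{n} \operatorname{Cov}(Y_i, Y_j)\right]
\end{aligned}
\]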
In simple random sampling with replacement, the selections of sampling units are
independent of each other. Hence, the observations obtained from the sampling units are
mutually independent. Therefore,
\[
\operatorname{Cov}(Y_i, Y_j) = 0, \quad \forall \; i \ne j.
\]
Finally,
\[
\operatorname{Var}(\bar{Y}) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.
\]
In other words,
\[
\operatorname{Var}(\bar{Y}) = \frac{N-1}{N} \times \frac{S^2}{n}.
\]
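The with-replacement result can be checked by listing all \(N^n\) equally likely ordered samples. The sketch below uses a hypothetical population of three values (1, 3, 8), chosen only for illustration and not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

population = [1, 3, 8]  # hypothetical values, not from the lecture
n = 2
N = len(population)

mu = Fraction(sum(population), N)
sigma2 = sum((y - mu) ** 2 for y in population) / N
S2 = sum((y - mu) ** 2 for y in population) / (N - 1)

# With replacement, the same unit may appear more than once in a sample.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

expected_mean = sum(means) / len(means)
var_mean = sum((m - expected_mean) ** 2 for m in means) / len(means)

print(len(samples), expected_mean, var_mean)
print(sigma2 / n, Fraction(N - 1, N) * S2 / n)
```

The enumerated variance should equal both forms of the formula, \(\sigma^2/n\) and \(\frac{N-1}{N}\cdot\frac{S^2}{n}\).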
Exercise
Suppose that an SRS of size 2 is drawn from a population of size 4 with replacement.
Also, suppose that the values obtained from sampling units 1, 2, 3, and 4 are 2, 4, 6, and
8, respectively. Justify that the population variance of the sample mean \(\operatorname{Var}(\bar{Y})\) satisfies
\[
\operatorname{Var}(\bar{Y}) = \frac{\sigma^2}{n}.
\]
Efficient Estimator:
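Suppose that there are two estimators for the same unknown parameter \(\theta\). Let \(\hat{\theta}_1\)
and \(\hat{\theta}_2\) be the two competing unbiased estimators of \(\theta\). The estimator \(\hat{\theta}_1\) is said to
be a more efficient estimator of \(\theta\) than \(\hat{\theta}_2\) if the population variance of \(\hat{\theta}_1\) is smaller than the
population variance of \(\hat{\theta}_2\), i.e.
\[
\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2).
\]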
Theorem
SRSWOR provides a more efficient estimator of the population mean than SRSWR.
Proof:
Suppose that a simple random sample of size \(n\) is drawn from a population having
\(N\) sampling units. Suppose that the given population observations are
\[
y_1, \cdots, y_i, \cdots, y_N
\]
and sample observations are
\[
Y_1, \cdots, Y_j, \cdots, Y_n .
\]
Let \(\bar{Y}_{WOR}\) be the sample mean when SRS is done without replacement and \(\bar{Y}_{WR}\) be
the sample mean when SRS is done with replacement. Note that both \(\bar{Y}_{WOR}\) and \(\bar{Y}_{WR}\)
are unbiased estimators of the population mean \(\mu\). The population variance of \(\bar{Y}_{WOR}\) is
given by
\[
\operatorname{Var}(\bar{Y}_{WOR}) = (1 - f)\,\frac{S^2}{n}
\]
and the population variance of \(\bar{Y}_{WR}\) is given by
\[
\operatorname{Var}(\bar{Y}_{WR}) = \frac{N-1}{N}\,\frac{S^2}{n}.
\]
Now,
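\[
\begin{aligned}
\operatorname{Var}(\bar{Y}_{WR}) - \operatorname{Var}(\bar{Y}_{WOR}) &= \frac{N-1}{N}\,\frac{S^2}{n} - (1 - f)\,\frac{S^2}{n} \\
&= \frac{S^2}{n}\left(\frac{N-1}{N} - 1 + f\right) \\
&= \frac{S^2}{n}\left(\frac{N-1}{N} - 1 + \frac{n}{N}\right) = \frac{S^2}{n}\left(\frac{N - 1 - N + n}{N}\right) \\
&= \frac{S^2}{n} \cdot \frac{n-1}{N} = \frac{(n-1)S^2}{nN} > 0 \quad \text{for } n > 1, \text{ provided } S^2 > 0.
\end{aligned}
\]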
It implies that
\[
\operatorname{Var}(\bar{Y}_{WR}) > \operatorname{Var}(\bar{Y}_{WOR}).
\]
Since the variance of the sample mean obtained from SRSWOR is less than that of the sample
mean obtained from SRSWR, SRSWOR provides a more efficient estimator of the population
mean than SRSWR.
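The size of the gap between the two variances can be tabulated from the formulas in the proof. The following sketch uses a hypothetical population of five values (1, 3, 8, 12, 16), not taken from the lecture, and a hypothetical helper named `variances`.

```python
from fractions import Fraction


def variances(population, n):
    """Return S^2 and the variance of the sample mean under SRSWOR and SRSWR."""
    N = len(population)
    mu = Fraction(sum(population), N)
    S2 = sum((y - mu) ** 2 for y in population) / (N - 1)
    var_wor = (1 - Fraction(n, N)) * S2 / n
    var_wr = Fraction(N - 1, N) * S2 / n
    return S2, var_wor, var_wr


population = [1, 3, 8, 12, 16]  # hypothetical values, not from the lecture
N = len(population)
for size in range(1, N + 1):
    S2, var_wor, var_wr = variances(population, size)
    gap = var_wr - var_wor
    # The gap should match (n - 1) S^2 / (n N) from the proof.
    print(size, var_wor, var_wr, gap, gap == (size - 1) * S2 / (size * N))
```

For a sample of size 1 the two schemes coincide and the gap is zero, so the strict inequality needs at least two draws.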
Exercise
Suppose that an SRS of size 2 is drawn from a population of size 4. Also, suppose
that the values obtained from sampling units 1, 2, 3, and 4 are 2, 4, 6, and 8, respectively.
Using this information, show that SRSWOR provides a more efficient estimator of the
population mean than SRSWR.