Statistics for Economics: Notes on Sampling Distribution
CHAPTER FIVE: SAMPLING AND SAMPLING DISTRIBUTIONS
1. INTRODUCTION TO SAMPLING
In collecting information from the population, there are two alternative approaches: Census and Sampling.
Census refers to the complete enumeration of every element of the population.
Sampling refers to the process of taking only a part of the population and using the information
gained from the part to make judgment about the population.
Sampling is a process we all engage in, consciously or unconsciously, in our day-to-day activities. For instance, a housewife judges the quality of wheat, and decides whether to buy it for domestic use, by examining only a handful of grains taken from the lot offered for sale. Likewise, a production manager closely examines only a few items of raw material to be sure of its quality before placing a bulk purchase order. The ultimate purpose of any sampling process is to make a judgment about the totality, or whole, by examining a part. The totality about which the judgment is to be made is known as the population; the part used to make the judgment is the sample. Put simply, sampling consists of selecting a sample and using the sample data to gain knowledge about a population parameter. Hence, in collecting information from a population, investigators must choose between these two alternatives on their merits. Because it includes every element of the population, a census is, in principle, more reliable than sampling as far as the validity and accuracy of the data are concerned. In practice, however, investigators prefer sampling to enumerating the entire population on the following grounds:
I. Feasibility aspect:
In some circumstances, it may not be possible to include all elements of the population in the investigation. This holds true for infinite populations (e.g. products under a continuous production system) and for populations that are in constant movement or very large, e.g. populations of insects, fish, birds and the like. In such cases, instead of the unwieldy exercise of complete enumeration, the same objective can be achieved with samples.
II. Destructive aspect:
In some cases, the investigation may involve tests that damage the items examined. Using a census here would be not only costly but self-defeating. Suppose, for example, that the quality control manager of Kaliti is contemplating a heat test on the factory's products, in which products are exposed to increasing heat until they start melting. Testing every product would ultimately destroy the whole output. Hence, the manager's only option is sampling.
III. Efficiency aspect:
In some cases, information is needed within a prescribed time limit. Contacting the whole population through a census may be so time-consuming that the needed information is delayed. Hence, to deliver the required information in time, sampling is preferable.
IV. Cost aspect:
Because it covers every element of the population, a census is usually more costly than sampling. The cost of a census will often far exceed the budgetary limit, prohibiting complete enumeration, and in some cases the cost of studying the whole population may be greater than the value of the research itself.
V. Manageability:
All survey results, whether based on a sample or a census, are subject to some kind of error. These errors may arise from poor planning, inefficient execution, and lack of the desired control and coordination over the survey staff. Owing to the far larger number of elements involved, census results are more vulnerable to these errors than sample survey results.
2. SAMPLING TECHNIQUES
Sampling technique refers to the procedure followed in selecting the sample cases (the population elements to be included in the sample) from among all the population elements. The resulting sample should represent the population from which it is drawn. This is more likely when the sample size is large enough for the sample elements to accommodate most characteristics of the population: the larger the sample size, the better it represents the heterogeneity of the population. Beyond size, the method employed in selecting the sample elements strongly affects the precision, accuracy and efficiency with which the resulting sample represents the population.
Generally speaking, sampling techniques can be of two kinds: probability (random) and non-probability (non-
random) sampling technique.
a. Probability (Random) Sampling Technique provides every element of the population with a known, non-zero chance of being included in the sample. Such methods give the investigator no discretion in deciding which particular element of the population should be selected; rather, chance governs which elements are included. Thus, a selection method is a probability sampling method if it guarantees a known, non-zero chance of selection for each and every element of the population.
b. Non-Probability Sampling Technique: a selection method is non-probability if subjective considerations such as personal judgment, experience or convenience, rather than chance, determine which population elements are selected for the sample, e.g. quota sampling, convenience sampling. Because this method allows investigators to decide which population elements are included, it introduces bias. In addition, being subjective from individual to individual and from situation to situation, it does not permit the estimation of sampling errors. By contrast, probability sampling introduces no bias into the selection process, since selection is governed by chance. This in turn provides an objective method for determining how precise the results are (the estimation of sampling errors).
A sampling error is the difference between a sample statistic and its corresponding parameter.
Owing to the aforementioned aspects, statisticians prefer probability sampling to non-probability sampling. There are four kinds of probability sampling techniques: simple random, systematic, stratified and cluster sampling.
iv. Cluster sampling: a random sampling technique in which the population is divided into two or more mutually exclusive, non-overlapping groups (clusters); some of these clusters are selected at random, and sample elements are then drawn from the selected clusters by simple random or another random sampling technique.
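The two-stage selection just described can be sketched in Python. The cluster labels, population values and sample sizes below are purely hypothetical, chosen only to illustrate the mechanics:

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Two-stage cluster sampling: pick clusters at random, then
    simple-random-sample elements within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)            # stage 1: clusters
    sample = []
    for c in chosen:
        sample.extend(rng.sample(clusters[c], n_per_cluster))  # stage 2: elements
    return sample

# Hypothetical population grouped into 4 non-overlapping clusters
population = {
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, 10],
    "C": [11, 12, 13, 14, 15],
    "D": [16, 17, 18, 19, 20],
}
print(cluster_sample(population, n_clusters=2, n_per_cluster=3))
```

Each run with the same seed reproduces the same sample; in practice the seed would be omitted so that chance alone governs the selection.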
3. SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
Consider a population with mean µ and standard deviation σ. If we draw samples of size n from a population of N elements, the number of possible samples of size n is given by NCn = N!/(n!(N − n)!). Hence, the sample mean varies according to the elements that happen to comprise the sample. For example, consider drawing samples of two cases from the numbers 1, 2, 3, 4 and 5. The total number of possible samples is 5C2 = 10, so our sample will be one of the following:

Sample: (1,2) (1,3) (1,4) (1,5) (2,3) (2,4) (2,5) (3,4) (3,5) (4,5)
Mean:    1.5   2.0   2.5   3.0   2.5   3.0   3.5   3.5   4.0   4.5
We can observe the following points about the sampling distribution of the sample mean and the
distribution of the population:
a. The population mean (µ) is equal to the mean of the sample means (µx̄), i.e.,

µx̄ = (1.5 + 2.0 + 2.5 + 3.0 + 2.5 + 3.0 + 3.5 + 3.5 + 4.0 + 4.5) / 10 = 30/10 = 3

Thus, µx̄ = µ = (1 + 2 + 3 + 4 + 5)/5 = 3.
b. The spread in the distribution of the sample mean is less than the spread in the population
values.
The sample mean ranges from 1.5 to 4.5, while the population values vary from 1 to 5.
Notice that as we increase the size of the sample, the spread of the distribution of the
sample mean becomes smaller.
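Points (a) and (b) can be checked directly by enumerating all ten samples, using only Python's standard library:

```python
from itertools import combinations
from statistics import mean, pstdev

population = [1, 2, 3, 4, 5]
samples = list(combinations(population, 2))   # all 5C2 = 10 possible samples
sample_means = [mean(s) for s in samples]

print(len(samples))                  # 10 possible samples
print(mean(sample_means))            # 3.0 -> equals the population mean
print(min(sample_means), max(sample_means))       # 1.5 4.5 -> narrower than 1..5
print(pstdev(sample_means) < pstdev(population))  # True: less spread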
c. The shape of the sampling distribution of the sample mean and the shape of the frequency
distribution of the population values are different.
[Figure: side-by-side histograms of the population distribution and the distribution of the sample mean, with probability on the vertical axes.]
4. THE CENTRAL LIMIT THEOREM
The central limit theorem (CLT) states that, for a sufficiently large sample size, the sampling distribution of the sample mean is approximately normal, whatever the shape of the population distribution. A simple illustration is rolling a large number of identical and unbiased dice: the distribution of the sum (or average) of the rolled numbers is well approximated by a normal distribution, and the approximation improves as the sample size increases.
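The dice illustration can be simulated; the trial count and seed below are arbitrary choices made for reproducibility:

```python
import random
from statistics import mean, pstdev

def dice_averages(n_dice, n_trials=20000, seed=42):
    """Average of n_dice fair six-sided dice, repeated n_trials times."""
    rng = random.Random(seed)
    return [mean(rng.randint(1, 6) for _ in range(n_dice))
            for _ in range(n_trials)]

# As n_dice grows, the averages cluster ever more tightly around 3.5
for n in (1, 5, 30):
    avgs = dice_averages(n)
    print(n, round(mean(avgs), 2), round(pstdev(avgs), 3))
```

A histogram of `avgs` for n = 30 would look close to the bell shape, while for n = 1 it is flat (uniform), which is the CLT at work.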
If the population follows a normal probability distribution, then the sampling distribution of the sample mean is normal for any sample size. If the population distribution is symmetrical but not normal, normality emerges with samples as small as 10. If the population is skewed or has thick tails, samples of 30 or more may be required before normality is observed.
5. PARAMETRIC ESTIMATION
5.1 ESTIMATION
We know that sampling enables us to estimate population parameters from sample statistics. Our
estimates about a population parameter can be of two kinds:
I. Point Estimates
II. Confidence Intervals
5.1.1 Point Estimate
A Point Estimate is a single value (a point), computed from sample information, that is used to estimate a population parameter. Examples of point estimates include the sample mean, the sample standard deviation, the sample variance and the sample proportion.
EXAMPLE: The number of defective items produced by a machine was recorded for five randomly
selected hours during a 40-hour workweek. The observed numbers of defectives were 12, 4, 7, 14, and
10. So, the sample mean is 9.4. Thus, a point estimate for the hourly mean number of defectives is 9.4.
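In Python, this point estimate is simply the sample mean of the recorded counts:

```python
from statistics import mean

defectives = [12, 4, 7, 14, 10]    # defectives observed in 5 sampled hours
point_estimate = mean(defectives)  # sample mean estimates the hourly mean
print(point_estimate)              # 9.4
```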
A point estimate, however, tells only part of the story. While we expect the point estimate to be close to the population parameter, we would like to measure how close it really is. A confidence interval, or interval estimate, serves this purpose.
An Interval Estimate states the range within which a population parameter probably lies. The interval
within which a population parameter is expected to occur is called a confidence interval. To compute
confidence intervals, we will consider two situations:
a. When the population standard deviation, σ, is known, and
b. When the population standard deviation, σ, is unknown.
There are important distinctions in the assumptions between these two situations.
a. CI when σ is known
The confidence interval is computed from the sample mean and the known population standard deviation σ, which determines the width of the interval. From the CLT we know that the sampling distribution of the sample mean follows the normal probability distribution with mean µ and standard error σ/√n for n > 30. The results of the CLT allow us to make the following confidence interval statements using Z-statistics:
95% of the sample means selected from a population will be within 1.96 standard errors
of the population mean
99% of the sample means will lie within 2.58 standard errors of the population mean
The values 1.96 and 2.58 are the Z values corresponding to the middle 95% and 99% of the observations
respectively.
Always remember the Empirical (Normal) Rule: for a symmetrical, bell-shaped distribution:
Approximately 68% of the observations will lie within plus or minus 1 standard deviation of the mean
About 95% of the observations will lie within plus or minus 2 standard deviations of the mean
Practically all (99.7%) of the observations will lie within plus or minus 3 standard deviations of the mean
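These three percentages can be verified with the standard normal CDF available in Python's standard library:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1
for k in (1, 2, 3):
    within = Z.cdf(k) - Z.cdf(-k)   # P(-k < Z < k)
    print(k, round(within, 4))      # 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```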
The above confidence interval statements provide examples of levels of confidence, and are called 95%
confidence interval and 99% confidence interval. The 95% and the 99% are the levels of confidence and
refer to the percentage of similarly constructed intervals that would include the parameter being
estimated. The value 1.96 is obtained as follows: the middle 95% is split into two halves, each with probability 47.5% (0.475). Locate 0.475 in the body of the Z-table and read off the corresponding row and column values; the value is 1.96. Thus, the probability of finding a Z value between 0 and 1.96 is 0.475 and, likewise, the probability of finding a Z value between -1.96 and 0 is 0.475. Combining the two, the probability of being in the interval -1.96 to 1.96 is 0.95.
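Instead of reading the Z-table, the same critical values can be recovered from the inverse CDF in Python's standard library:

```python
from statistics import NormalDist

Z = NormalDist()
z95 = Z.inv_cdf(0.975)   # middle 95%: 2.5% in each tail
z99 = Z.inv_cdf(0.995)   # middle 99%: 0.5% in each tail
print(round(z95, 2), round(z99, 2))   # 1.96 2.58
```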
The width of the interval is determined by the level of confidence and by the size of the standard error of the mean, σ/√n. The standard error is affected by the population standard deviation (σ) and the sample size (n): the larger the population standard deviation, the larger the standard error; the larger the sample size, the smaller the standard error, indicating less variability in the sample means.
The 95% confidence interval when n > 30 and σ is known is determined as follows:

X̄ ± 1.96 (σ/√n)
The 99% confidence interval for µ is constructed as follows:

X̄ ± 2.58 (σ/√n)
The 95% and 99% CIs are the most widely used, but we are not restricted to them: we can select any confidence level between 0 and 100 and find the corresponding value of Z. In general, a confidence interval for the mean is computed by:

X̄ ± Z (σ/√n)

where Z is the Z-value for the desired confidence level.
Example: The Dean of the Business School wants to estimate the mean number of hours worked per
week by students. A sample of 49 students showed a mean of 24 hours with a standard deviation of 4
hours. Compute:
A point estimate for the mean number of hours worked by students of Business School, and
The 95% confidence interval for the average number of hours worked per week by the students.
Solution:
Point estimate: X̄ = 24 hours.
95% CI: X̄ ± 1.96 (s/√n) = 24 ± 1.96 (4/√49) = 24 ± 1.12
The endpoints of the confidence interval are the confidence limits. The lower confidence limit is 22.88
and the upper confidence limit is 25.12. We can interpret the 95% confidence interval as follows:
If we select 100 samples of 49 students from the population (all students of business school), calculated
the sample mean, and developed a CI based on each sample mean, we would expect to find the
population mean in about 95 of the 100 intervals.
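The arithmetic of this interval can be reproduced in a few lines, using the figures of the example above:

```python
from math import sqrt

n, xbar, s = 49, 24, 4            # sample size, sample mean, sample sd
z = 1.96                          # Z value for 95% confidence
se = s / sqrt(n)                  # standard error: 4/7
lower, upper = xbar - z * se, xbar + z * se
print(round(lower, 2), round(upper, 2))   # 22.88 25.12
```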
When the population standard deviation σ is known and n > 30, we can use the Z distribution to construct a CI for the mean. But when σ is unknown, or n < 30, we cannot use the Z-distribution to construct a CI for the population mean. Instead we use the sample standard deviation, s, to estimate σ, and replace the Z distribution with Student's t-distribution.
The t-distribution is a continuous probability distribution with many characteristics similar to the Z-distribution. It was first studied by William Gosset, an English employee of the Guinness brewing company. He published his work under the pseudonym 'Student', since the company did not allow its employees to publish under their own names, so the distribution is sometimes also known as Student's t-distribution.
He was particularly concerned with the exact behavior of the distribution of the following statistic:

t = (X̄ − µ) / (s/√n)

where s is an estimate of σ.
William Gosset was especially concerned about the discrepancy between s and σ when s was calculated from a very small sample. The t-distribution is in many ways similar to the standard normal: it is unimodal, symmetric, centered on zero, bell-shaped, and extends from minus infinity to plus infinity. However:
The t-distribution is more spread out (has a larger variance) than the standard normal distribution
The t-distribution has only one parameter, the degrees of freedom, while the normal distribution has two (µ, σ)
When using sample statistics, the degrees of freedom, denoted by the Greek letter ν (pronounced 'nu'), capture the number of values that are free to vary. In problems involving the estimation of a population mean, the degrees of freedom are given by the sample size minus one, i.e. ν = n − 1. Probabilities for t-values are obtained from t-distribution tables.
The confidence interval for the population mean when σ is unknown, or when n < 30, is constructed by the following equation:

X̄ ± t (s/√n)

where t has ν = n − 1 degrees of freedom.
In constructing the CI, we consider the following points:
When dealing with small samples (n < 30), the CLT does not guarantee that the sampling distribution of X̄ is normal or approximately normal unless the population itself is normal. As a result, we must assume that the population is normal, or approximately normal, if our CI estimates of µ are to be valid.
Unless we know σ, we cannot use the standard normal Z statistic to form our confidence interval for µ. Instead, we must use the t-distribution, with (n − 1) degrees of freedom.
Example: A sample of 15 light bulbs produced by a light bulb company showed an average lifetime of 1035.6 hours, with standard deviation 2.7 hours. Construct a 95% CI for the true (population) mean lifetime of the light bulbs.
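A sketch of the solution in Python: the standard library has no t-quantile function, so the critical value 2.145 (the 95% t-table entry for ν = 14) is entered by hand:

```python
from math import sqrt

n, xbar, s = 15, 1035.6, 2.7       # sample size, mean lifetime, sample sd
t_crit = 2.145                     # t-table value for 95%, df = n - 1 = 14
se = s / sqrt(n)                   # estimated standard error
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(round(lower, 1), round(upper, 1))   # 1034.1 1037.1
```

So the 95% CI for the mean lifetime is roughly 1034.1 to 1037.1 hours.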
A set of criteria is needed for discriminating between good and bad estimators. Consider, for example, three candidate estimators of the population mean: the sample mean, the smallest observation in the sample, and the first sample observation. Which of these three estimators is 'best'? Two important criteria by which to judge estimators are bias and precision.
a. Bias
It is impossible to know if a single estimate of a parameter, derived by applying a particular estimator to
the sample data, gives a correct estimate of the parameter or not. The estimate might be too low or too
high and, since the parameter is unknown, it is impossible to check this. What is possible, however, is to
say whether an estimator gives the correct answer on average. An estimator which gives the correct
answer on average is said to be unbiased. Another way of expressing this is to say that an unbiased
estimator does not systematically mislead the researcher away from the correct value of the parameter.
It is important to remember, though, that even using an unbiased estimator does not guarantee that a
single use of the estimator will yield a correct estimate of the parameter. Bias (or the lack of it) is a
theoretical property.
Formally, an estimator is unbiased if its expected value is equal to the parameter being estimated. Consider estimating the population mean using the three estimators suggested above. Taking the sample mean first, we have already learned that its expected value is µ, i.e.,

E(X̄) = µ
The second estimator (the smallest observation in the sample) can easily be shown to be biased, using the result derived above. Since the smallest sample observation must be less than the sample mean, its expected value must be less than µ. Denoting the smallest observation by Xs,

E(Xs) < µ
So, this estimator is biased downwards: it underestimates the population mean. The size of the bias is simply the difference between the expected value of the estimator and the value of the parameter, so the bias in this case is:

Bias = E(Xs) − µ < 0

For the sample mean X̄ the bias is obviously zero.
Turning to the third rule (the first sample observation) this can be shown to be another unbiased
estimator. Choosing the first observation from the sample is equivalent to taking a random sample of
size one from the population in the first place. Thus, the single observation may be considered as the
sample mean from a random sample of size one. Since it is a sample mean it is unbiased, as
demonstrated earlier.
b. Precision
Two of the estimators above were found to be unbiased, and, in fact, there are many unbiased
estimators (the sample median is another). Some way of choosing between the set of all unbiased
estimators is therefore required, which is where the criterion of precision helps. Unlike bias, precision is
a relative concept, comparing one estimator to another. Given two estimators A and B, A is more precise
than B if the estimates it yields (from all possible samples) are less spread out than those of estimator B.
A precise estimator will tend to give similar estimates for all possible samples.
Consider the two unbiased estimators found above: how do they compare on the criterion of precision? It turns out that the sample mean is the more precise of the two, and it is not difficult to understand why.
Taking just a single sample observation means that it is quite likely to be unrepresentative of the
population as a whole, and thus leads to a poor estimate of the population mean. The sample mean on
the other hand is based on all the sample observations and it is unlikely that all of them are
unrepresentative of the population. The sample mean is therefore a good estimator of the population
mean, being more precise than the single observation estimator.
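A small simulation makes both bias and precision concrete. The normal population with µ = 50 and σ = 10, the sample size of 10, and the seed are all arbitrary choices for illustration:

```python
import random
from statistics import mean, pvariance

rng = random.Random(1)
est_mean, est_min, est_first = [], [], []
for _ in range(20000):
    sample = [rng.gauss(50, 10) for _ in range(10)]  # population: mu=50, sigma=10
    est_mean.append(mean(sample))    # estimator 1: sample mean
    est_min.append(min(sample))      # estimator 2: smallest observation
    est_first.append(sample[0])      # estimator 3: first observation

print(round(mean(est_mean), 1))     # close to 50: unbiased
print(round(mean(est_min), 1))      # well below 50: biased downwards
print(round(mean(est_first), 1))    # close to 50: unbiased, but...
print(pvariance(est_mean) < pvariance(est_first))   # True: mean is more precise
```

The two unbiased estimators average out near µ, but the sample mean's estimates are far less spread out than those of the single-observation rule.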
Just as bias was related to the expected value of the estimator, so precision can be defined in terms of the variance: one estimator is more precise than another if it has a smaller variance. Recall that in large samples the probability distribution of the sample mean is X̄ ~ N(µ, σ²/n), so the variance of the sample mean is σ²/n, compared with a variance of σ² for the single-observation estimator.