Chap 8
This chapter is the beginning of our study of sampling. Sampling is necessary because we
want to make statements about a population but we do not want to (or cannot) examine
all the items in that population. Recall from Chapter 1 that a population refers to the
entire group of objects or persons of interest. The population of interest might be all the
persons in the city receiving welfare payments or all the computer chips produced during
the last hour. A sample is a portion, a part, or a subset of the population. Fifty welfare
recipients out of 4,000 receiving payments might constitute the sample, or 20 computer
chips might be sampled out of 1,500 produced last hour.
Why is it necessary to sample? Why can't we just inspect all the items? There are several
reasons.
1. To contact the whole population would often be very time consuming. To ask
every eligible voter if they plan to vote for the current senator in the forthcoming
election would take months. The election would probably be over before the
survey was completed.
2. The cost of studying all the items in the population is often prohibitive. Some
television program ratings are established by analyzing the viewing habits of
about 1,200 viewers. The cost of studying all the homes having television would
be exorbitant.
3. The adequacy of sample results. If the sample results of the viewing habits of
1,200 homes revealed that only 1.1 percent of the homes watched "60 Minutes,"
no doubt the program would be replaced by another show. Checking the viewing
habits of all the homes regarding "60 Minutes" probably would not change the
percentage significantly.
4. The destructive nature of certain tests. The manufacturer of fuses cannot test all
of them because in the testing the fuse is destroyed and none would be available
for sale.
5. The physical impossibility of checking all the items in the population. The South
Dakota Game Commission, for example, cannot check all the deer, grouse, and
other wild game because they are always moving.
Four types of probability sampling are commonly used: simple random sampling,
systematic random sampling, stratified random sampling, and cluster sampling. The
most widely used type is simple random sampling.
Simple random sample: A sample selected so that each item or person in the
population has the same chance of being included.
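As a minimal sketch of this definition, the following Python draws a simple random sample of 50 from the 4,000 welfare recipients mentioned earlier. The ID numbering and the seed are assumptions made only for illustration.

```python
import random

# Hypothetical population: 4,000 welfare recipients identified by ID 1..4000.
population = list(range(1, 4001))

# Fixed seed so the sketch is repeatable; the value itself is arbitrary.
rng = random.Random(8)

# random.sample draws without replacement, so every ID has the same
# chance of being included and no ID can appear twice.
sample = rng.sample(population, k=50)

print(sorted(sample)[:5])  # a few of the selected IDs
```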
In a systematic random sample, the items or individuals of the population are arranged in
some way: alphabetically, in a file drawer by date received, or by some other method.
A random starting point is selected, and then every kth member of the population is
selected for the sample. For example, you might number all the items in the population
1, 2, 3, .... Next, a random starting point is selected, say 39. Every kth item thereafter,
such as every 100th, is selected for the sample. This means that items 39, 139, 239, 339,
and so on would be part of the sample.
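The selection rule can be sketched in a few lines of Python. The function name and the population size of 1,000 are assumptions for illustration; the starting point 39 and k = 100 come from the example above. In practice the starting point would itself be chosen at random.

```python
def systematic_sample(population_size, start, k):
    """Return the item numbers start, start + k, start + 2k, ... up to population_size."""
    return list(range(start, population_size + 1, k))

# Hypothetical population of 1,000 numbered items.
selected = systematic_sample(population_size=1000, start=39, k=100)
print(selected[:4])
```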
In stratified random sampling, the population is first divided into subgroups, called
strata, and a sample is selected from each stratum.
For example, if our study involved Army personnel, we might decide to stratify the
population (all Army personnel) into generals, other officers, and enlisted personnel. The
number selected from each of the three strata could be proportional to the total number in
the population for the corresponding stratum. Each member of the population can belong to
only one of the strata. That is, a military person cannot be a general and a private at the
same time.
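A rough sketch of proportional allocation for the Army example follows. The stratum counts and the overall sample size of 200 are invented for illustration; the text does not give them.

```python
import random

rng = random.Random(8)  # arbitrary seed for repeatability

# Hypothetical stratum sizes (not from the text).
strata_sizes = {"generals": 100, "other officers": 1900, "enlisted": 8000}
sample_size = 200
total = sum(strata_sizes.values())

# Proportional allocation: each stratum's share of the sample matches its
# share of the population. These figures divide evenly; in general the
# allocations need rounding and may have to be adjusted to sum to sample_size.
allocation = {name: sample_size * size // total for name, size in strata_sizes.items()}

# Draw a simple random sample of member indices within each stratum.
stratified_sample = {
    name: rng.sample(range(strata_sizes[name]), k=k) for name, k in allocation.items()
}

print(allocation)
```

Because each person is sampled only within his or her own stratum, no one can be counted in two strata.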
Cluster sampling is often used to reduce the cost of sampling when the population is
scattered over a large geographic area. Suppose the objective is to study household waste
collection in a large city.
Step 1: The city is divided into smaller units, such as precincts.
Step 2: Several of these precincts are selected at random.
Step 3: Households within each of these precincts are randomly selected and interviewed.
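The two random stages of cluster sampling, choosing precincts and then choosing households within them, might be sketched as follows. The number of precincts, the households per precinct, and the ID format are all assumptions made for illustration.

```python
import random

rng = random.Random(8)  # arbitrary seed for repeatability

# Hypothetical city: 20 precincts, each listing 100 household IDs.
precincts = {
    p: [f"P{p:02d}-H{h:03d}" for h in range(1, 101)]
    for p in range(1, 21)
}

# Stage 1: randomly select a few precincts (the clusters).
chosen_precincts = rng.sample(sorted(precincts), k=4)

# Stage 2: randomly select households within each chosen precinct.
households = {p: rng.sample(precincts[p], k=10) for p in chosen_precincts}

for p, ids in households.items():
    print(p, ids[:3])
```

Only the chosen precincts need to be visited, which is where the cost saving comes from.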
Sampling "Error"
It is not logical to expect that the results obtained from a sample will coincide exactly
with those from a population. For example, it is unlikely that the mean welfare payment
for a sample of 50 recipients is exactly the same as the mean for all 4,000 welfare
recipients. We expect a difference between a sample statistic and its corresponding
population parameter. The difference is called sampling error.
Because these errors happen by chance, they are referred to as chance variations.
Suppose all possible samples of size n are selected from a specified population, and the
mean of each of these samples is computed. The distribution of these sample means is
called the sampling distribution of the sample mean.
The sampling distribution of the mean is a probability distribution and has the following
major characteristics:
1. The mean of all the sample means will be exactly equal to the population mean.
2. If the population from which the samples are drawn is normal, the distribution of
sample means is also normally distributed.
3. If the population from which the samples are drawn is not normal, the sampling
distribution is approximately normal, provided the samples are "sufficiently" large
(usually accepted to include at least 30 observations).
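The first characteristic can be checked directly on a very small population. The sketch below lists every possible sample of size 2 from a made-up population of five values and compares the mean of all the sample means with the population mean. The population values are hypothetical.

```python
from itertools import combinations
from statistics import fmean, pstdev

# Hypothetical population of five values (not from the text).
population = [4, 6, 8, 10, 12]
n = 2

# Every possible sample of size n, drawn without replacement.
sample_means = [fmean(s) for s in combinations(population, n)]

population_mean = fmean(population)
mean_of_sample_means = fmean(sample_means)

print(len(sample_means))                      # number of possible samples
print(population_mean, mean_of_sample_means)  # the two means agree

# The sample means are also less spread out than the population values.
# (Sampling without replacement from so small a population means the spread
# is not exactly the population standard deviation divided by sqrt(n).)
print(pstdev(population), pstdev(sample_means))
```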
The central limit theorem states that, for large random samples, the shape of the
sampling distribution of the sample means is close to a normal probability distribution.
The approximation is more accurate for large samples than for small samples. As a result,
we can make logical and reasonable statements about the distribution of the sample means
with little or no information about the shape of the original distribution from which we
took the sample.
Central limit theorem: If all samples of a specified size are selected from
any population, the sampling distribution of the sample means is
approximately a normal distribution. This approximation improves with
larger samples.
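A small simulation can illustrate the theorem. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (strongly skewed), 2,000 samples of size 30 are drawn, and the seed is arbitrary. It simulates many samples rather than enumerating all possible ones.

```python
import random
import statistics

rng = random.Random(8)  # arbitrary seed for repeatability
n = 30                  # observations per sample
num_samples = 2000      # number of simulated samples

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / n ** 0.5  # population sd divided by sqrt(n)

# For a normal distribution, roughly 68 percent of values fall within one
# standard deviation of the mean; the sample means should come close to that.
share_within_one_se = (
    sum(abs(m - 1.0) <= theoretical_se for m in sample_means) / num_samples
)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
print(round(share_within_one_se, 3))
```

Even though the individual observations are far from normal, the sample means cluster around the population mean in a roughly bell-shaped pattern.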
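The central limit theorem does not address the dispersion of the sampling distribution
of sample means, nor does it address how the mean of the sampling distribution compares
with the mean of the population. It can be shown that the mean of the
sampling distribution is the population mean, and if the standard deviation in the
population is $\sigma$, the standard deviation of the sample means is $\sigma/\sqrt{n}$,
where $n$ is the number of observations in each sample.
The standard error is a measure of the variability of the sampling distribution of the
means. It is computed using text formula [8-1]:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Where:
$\sigma_{\bar{X}}$ is the standard error of the mean.
$\sigma$ is the population standard deviation.
$n$ is the sample size.
In most situations we do not know the population standard deviation, so we replace it with
the sample standard deviation; that is, we replace $\sigma$ with $s$. Thus we have the
following formula:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$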
The size of the standard error is affected by the standard deviation. As the standard
deviation increases, so does the standard error. The standard error is also affected by the
sample size. As the sample size increases, the standard error decreases, which indicates
that there is less variability in the distribution of the sample means.
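The effect of sample size can be seen with a short calculation. In this sketch the standard deviation of 10 and the sample sizes are hypothetical values chosen so the arithmetic is easy to follow; the function name is also an assumption.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical standard deviation
errors = {n: standard_error(sd, n) for n in (25, 100, 400)}

for n, se in errors.items():
    print(n, se)
```

Each fourfold increase in the sample size cuts the standard error in half, because the sample size enters through its square root.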
To summarize the relationship between the distribution of sample means and the population:
1. The mean of the distribution of sample means will be exactly equal to the population
mean if we are able to select all possible samples of a particular size from a given
population. That is, $\mu_{\bar{X}} = \mu$. Even if we do not select all samples, we can
expect the mean of the distribution of the sample mean to be close to the population mean.
2. There will be less dispersion in the sampling distribution of the sample
mean than in the population. If the standard deviation of the population is $\sigma$, the
standard deviation of the distribution of sample means is $\sigma/\sqrt{n}$. Note that when
we increase the size of the sample, the standard error of the mean decreases.
The majority of statistical business decisions are made on the basis of sampling.
Generally we have a population and wish to know something about that population, such
as the mean. We take a sample from that population and wish to conclude whether the
sampling error, that is the difference between the population parameter and the sample
statistic, is due to chance.
We can compute the probability that a sample mean will fall within a certain range. The
sampling distribution of the sample mean will follow the normal probability distribution
under two conditions:
1. When the samples are taken from populations known to follow the normal
distribution. In this case the size of the sample is not a factor.
2. When the shape of the population distribution is not known or the shape is known
to be nonnormal, but the sample contains at least 30 observations.
Recall that we used the z-value found with Formula [7-1] to convert any normal
distribution to the standard normal distribution. We can use the standard normal table to
find the probability that an observation falls within a specified
range. The formula is:
$$z = \frac{X - \mu}{\sigma}$$
Since most business decisions are based on a sample, we are interested in the distribution
of the sample mean $\bar{X}$, not the value of $X$, the value of one observation.
Formula [7-1] is altered to reflect this need. We change $X$ to $\bar{X}$. Then we change
the population standard deviation $\sigma$ to the standard error of the mean,
$\sigma/\sqrt{n}$.
Thus we have Formula [8-2], which is used to find the z value for a normal population with
a known population mean and standard deviation:
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
If we do not know the value of the population standard deviation and the sample size is
at least 30, we estimate the population standard deviation with the sample standard
deviation $s$. Thus we use $s$ to replace $\sigma$, and the new formula is Formula [8-3]:
$$z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
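As a worked sketch of the z calculation for a sample mean, suppose a population has mean 50 and standard deviation 12, and a sample of 36 observations gives a sample mean of 53. All four numbers are hypothetical. The code finds the z value and the probability of a sample mean at least that large, using the standard library's NormalDist in place of a printed normal table.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values chosen for illustration.
mu = 50.0      # population mean
sigma = 12.0   # population standard deviation (or s when sigma is unknown and n >= 30)
n = 36         # sample size
x_bar = 53.0   # observed sample mean

standard_error = sigma / sqrt(n)   # 12 / 6 = 2.0
z = (x_bar - mu) / standard_error  # (53 - 50) / 2 = 1.5

# Area to the right of z under the standard normal curve.
prob_above = 1 - NormalDist().cdf(z)

print(z, round(prob_above, 4))
```

If the population standard deviation were unknown and the sample contained at least 30 observations, the same steps would apply with the sample standard deviation used in place of sigma.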