Chap 8

Uploaded by

sastf
Copyright
© All Rights Reserved

Introduction

This chapter is the beginning of our study of sampling. Sampling is necessary because we
want to make statements about a population but we do not want to (or cannot) examine
all the items in that population. Recall from Chapter 1 that a population refers to the
entire group of objects or persons of interest. The population of interest might be all the
persons in the city receiving welfare payments or all the computer chips produced during
the last hour. A sample is a portion, a part, or a subset of the population. Fifty welfare
recipients out of 4,000 receiving payments might constitute the sample, or 20 computer
chips might be sampled out of 1,500 produced last hour.

Reasons for Sampling

Why is it necessary to sample? Why can't we just inspect all the items? There are several
reasons.

1. To contact the whole population would often be very time consuming. To ask
every eligible voter if they plan to vote for the current senator in the forthcoming
election would take months. The election would probably be over before the
survey was completed.
2. The cost of studying all the items in the population is often prohibitive. Some
television program ratings are established by analyzing the viewing habits of
about 1,200 viewers. The cost of studying all the homes having television would
be exorbitant.
3. The adequacy of sample results. If the sample results of the viewing habits of
1,200 homes revealed that only 1.1 percent of the homes watched "60 Minutes,"
no doubt the program would be replaced by another show. Checking the viewing
habits of all the homes regarding "60 Minutes" probably would not change the
percent significantly.
4. The destructive nature of certain tests. The manufacturer of fuses cannot test all
of them because in the testing the fuse is destroyed and none would be available
for sale.
5. The physical impossibility of checking all the items in the population. The South
Dakota Game Commission, for example, cannot check all the deer, grouse, and
other wild game because they are always moving.

Probability Sampling Methods

Four types of probability sampling are commonly used: simple random sampling,
systematic random sampling, stratified random sampling, and cluster sampling. The
most widely used type of sampling is a simple random sample.

Simple random sample: A sample selected so that each item or person in the
population has the same chance of being included.

Several ways of selecting a simple random sample are:


1. The name or identifying number of each item in the population is recorded on a
slip of paper and placed in a box. The slips of paper are shuffled and the required
sample size is chosen from the box.
2. Each item is numbered and a table of random numbers, such as the one in
Appendix E, is used to select the members of the sample.
3. There are many software programs, such as MINITAB and Excel, which have
routines that will randomly select a given number of items from the population.
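
The third approach can be sketched in a few lines of Python. This is only an illustration, not the MINITAB or Excel routine: the function name `simple_random_sample`, the seed, and the chip numbering (1 to 1,500, echoing the computer-chip example in the Introduction) are assumptions made here.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of inclusion."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 1,500 chips identified by the numbers 1..1500.
chips = list(range(1, 1501))
sample = simple_random_sample(chips, 20, seed=8)
print(sorted(sample))
```

Sampling without replacement mirrors drawing slips of paper from a box: once a slip is drawn it cannot be drawn again.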

Another type of sampling is a systematic random sample.

Systematic random sample: A random starting point is selected and then
every kth member of the population is selected.

In a systematic random sample the items or individuals of the population are arranged in
some way: alphabetically, in a file drawer by date received, or by some other method.
A random starting point is selected, and then every kth member of the population is
selected for the sample. For example, you might number all the items in the population
1, 2, 3, .... Next, a random starting point is selected, let's say 39. Every kth item
thereafter, such as every 100th, is selected for the sample. This means that items
39, 139, 239, 339, and so on would be part of the sample.
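
A minimal Python sketch of this selection rule follows. The population size of 1,000 and the function name `systematic_sample` are assumptions for illustration; the starting point 39 and k = 100 come from the example above.

```python
import random

def systematic_sample(population_size, k, start=None, seed=None):
    """Return the item numbers start, start + k, start + 2k, ... up to population_size."""
    if start is None:
        # Assumed convention: pick the random start among the first k items.
        start = random.Random(seed).randint(1, k)
    return list(range(start, population_size + 1, k))

# Items numbered 1..1000 (hypothetical size), start at 39, take every 100th.
picked = systematic_sample(population_size=1000, k=100, start=39)
print(picked)  # [39, 139, 239, 339, 439, 539, 639, 739, 839, 939]
```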

Another type of probability sample is referred to as stratified random sampling.

Stratified random sample: A population is divided into subgroups, called
strata, and a sample is randomly selected from each stratum.

For example, if our study involved Army personnel, we might decide to stratify the
population (all Army personnel) into generals, other officers, and enlisted personnel. The
number selected from each of the three strata could be proportional to the total number in
the population for the corresponding stratum. Each member of the population can belong to
only one of the strata. That is, a military person cannot be a general and a private at the
same time.
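
Proportional allocation can be sketched as follows. The stratum sizes (10 generals, 190 other officers, 800 enlisted) and the function name are invented for illustration and are not figures from the text.

```python
import random

def proportional_stratified_sample(strata, total_n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the stratum sizes miss total_n by one or two
        # when the shares are not whole numbers.
        n_stratum = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample

# Hypothetical Army population of 1,000; each person is in exactly one stratum.
strata = {
    "generals": [f"G{i}" for i in range(10)],
    "other officers": [f"O{i}" for i in range(190)],
    "enlisted": [f"E{i}" for i in range(800)],
}
sample = proportional_stratified_sample(strata, total_n=100, seed=1)
print({name: len(members) for name, members in sample.items()})
# {'generals': 1, 'other officers': 19, 'enlisted': 80}
```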

Another common type of sampling is cluster sampling.

Cluster sampling: A population is divided into clusters using naturally
occurring geographic or other boundaries. Clusters are then randomly
selected and a sample is collected by randomly selecting from each cluster.

Cluster sampling is often used to reduce the cost of sampling when the population is
scattered over a large geographic area. Suppose the objective is to study household waste
collection in a large city.

Step 1: Divide the city into smaller units (perhaps precincts).

Step 2: Number the precincts and randomly select several of them.

Step 3: Randomly select households within each of these precincts and interview them.
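
The three steps can be sketched in Python as a two-stage draw. The city layout (12 precincts of 200 households each), the identifiers, and the function name `cluster_sample` are assumptions made only for this illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Step 1: a hypothetical city divided into 12 precincts of 200 households.
city = {
    f"precinct {p}": [f"P{p}-H{h}" for h in range(1, 201)]
    for p in range(1, 13)
}

# Steps 2 and 3: select 3 precincts, then 10 households in each.
interviews = cluster_sample(city, n_clusters=3, n_per_cluster=10, seed=4)
for precinct, households in interviews.items():
    print(precinct, households[:3], "...")
```

Only the selected precincts have to be visited, which is where the cost saving comes from.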

Sampling "Error"

It is not logical to expect that the results obtained from a sample will coincide exactly
with those from a population. For example, it is unlikely that the mean welfare payment
for a sample of 50 recipients is exactly the same as the mean for all 4,000 welfare
recipients. We expect a difference between a sample statistic and its corresponding
population parameter. The difference is called sampling error.

Sampling error: The difference between a sample statistic and its
corresponding population parameter.

Because these errors happen by chance, they are referred to as chance variations.
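
A small numeric illustration may help. The payment amounts below are invented, and the population is kept to eight recipients so the arithmetic is easy to follow.

```python
# Hypothetical population of eight welfare payments (dollars).
welfare_payments = [420, 380, 510, 450, 390, 470, 430, 550]

# One possible sample of three recipients.
sample = [380, 510, 430]

population_mean = sum(welfare_payments) / len(welfare_payments)  # parameter
sample_mean = sum(sample) / len(sample)                          # statistic

# Sampling error = sample statistic - population parameter.
sampling_error = sample_mean - population_mean
print(population_mean, sample_mean, sampling_error)  # 450.0 440.0 -10.0
```

A different sample of three would generally give a different sampling error, which is why these differences are treated as chance variation.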

Sampling Distribution of the Sample Mean

Suppose all possible samples of size n are selected from a specified population, and the
mean of each of these samples is computed. The distribution of these sample means is
called the sampling distribution of the sample mean.

Sampling distribution of the sample mean: A probability distribution of all
possible sample means of a given sample size.

The sampling distribution of the mean is a probability distribution and has the following
major characteristics:

1. The mean of all the sample means will be exactly equal to the population mean.
2. If the population from which the samples are drawn is normal, the distribution of
sample means is also normally distributed.
3. If the population from which the samples are drawn is not normal, the sampling
distribution is approximately normal, provided the samples are "sufficiently" large
(usually accepted to include at least 30 observations).
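
The first characteristic can be checked by brute force on a tiny population. In the sketch below, the seven population values and the sample size of 2 are assumptions for illustration. Exact fractions are used so the equality is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [7, 7, 8, 8, 7, 8, 9]   # hypothetical values
n = 2                                # sample size

population_mean = Fraction(sum(population), len(population))

# Every possible sample of size n (without replacement) and its mean.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

mean_of_sample_means = sum(sample_means) / len(sample_means)

print(len(sample_means))                        # 21 possible samples
print(population_mean, mean_of_sample_means)    # 54/7 54/7
```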

The Central Limit Theorem

The central limit theorem states that, for large random samples, the shape of the
sampling distribution of the sample means is close to a normal probability distribution.
The approximation is more accurate for large samples than for small samples. As a result,
we can make logical and reasonable statements about the distribution of the sample means
with little or no information about the shape of the original distribution from which we
took the sample.

Central limit theorem: If all samples of a specified size are selected from
any population, the sampling distribution of the sample means is
approximately a normal distribution. This approximation improves with
larger samples.
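
A rough simulation can make the theorem concrete. It is a sketch under stated assumptions, not a proof. The population is a fair six-sided die, which is flat rather than normal, and the sample size of 30, the 5,000 repetitions, and the seed are arbitrary choices. If the sample means are approximately normal, about 95 percent of them should fall within 1.96 standard deviations (of the sample mean) of the population mean.

```python
import math
import random

rng = random.Random(2024)
n, trials = 30, 5000

# Population: fair die, mean 3.5, standard deviation sqrt(35/12).
pop_mean = 3.5
pop_sd = math.sqrt(35 / 12)

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(rng.randint(1, 6) for _ in range(n)) / n
    for _ in range(trials)
]

spread = pop_sd / math.sqrt(n)   # standard deviation of the sample means
share_within = sum(abs(m - pop_mean) <= 1.96 * spread for m in sample_means) / trials

print(round(sum(sample_means) / trials, 3))  # close to 3.5
print(round(share_within, 3))                # close to 0.95
```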

Standard Error of the Mean

The Central Limit Theorem does not address the dispersion of the sampling distribution
of sample means, nor does it address the comparison of the sampling distribution of
sample means to the mean of the population. It can be shown that the mean of the
sampling distribution is the population mean, and if the standard deviation in the
population is $\sigma$, the standard deviation of the means is $\sigma/\sqrt{n}$, where n
is the number of observations in each sample. We refer to $\sigma/\sqrt{n}$ as the
standard error of the mean. It is actually the standard deviation of the sampling
distribution of the sample mean:

Standard Error of the Mean: The standard deviation of the sampling
distribution of the sample mean.

The standard error is a measure of the variability of the sampling distribution of the
means. It is computed using text formula [8-1]:

Standard Error of the Mean: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$   [8-1]

Where:

$\sigma_{\bar{X}}$ is the standard error of the mean

$\sigma$ is the population standard deviation

n is the sample size

In most situations we do not know the population standard deviation, so we replace it with
the sample standard deviation. We replace $\sigma$ with s. Thus we have the following formula:

$s_{\bar{X}} = \frac{s}{\sqrt{n}}$

The size of the standard error is affected by the standard deviation. As the standard
deviation increases, so does the standard error. The standard error is also affected by the
sample size. As the sample size increases, the standard error decreases, which indicates
that there is less variability in the distribution of the sample means.
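
The effect of sample size is easy to see numerically. In this sketch the standard deviation of 12 and the sample sizes are made-up values, and `standard_error` is a helper name introduced here.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical standard deviation of 12; quadrupling n halves the standard error.
for n in (4, 16, 64, 256):
    print(n, standard_error(12, n))
# 4 6.0
# 16 3.0
# 64 1.5
# 256 0.75
```

Because n sits under a square root, cutting the standard error in half requires four times as many observations.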

It is important to note the following:

1. The mean of the distribution of sample means will be exactly equal to the population
mean if we are able to select all possible samples of a particular size from a given
population. That is, $\mu_{\bar{X}} = \mu$. Even if we do not select all samples, we can
expect the mean of the distribution of the sample mean to be close to the population mean.

2. There will be less dispersion in the sampling distribution of the sample mean than in
the population. If the standard deviation of the population is $\sigma$, the standard
deviation of the distribution of sample means is $\sigma/\sqrt{n}$. Note that when we
increase the size of the sample the standard error of the mean decreases.

Using the Sampling Distribution of the Sample Mean

The majority of statistical business decisions are made on the basis of sampling.
Generally we have a population and wish to know something about that population, such
as the mean. We take a sample from that population and wish to conclude whether the
sampling error, that is the difference between the population parameter and the sample
statistic, is due to chance.

We can compute the probability that a sample mean will fall within a certain range. The
sampling distribution of the sample mean will follow the normal probability distribution
under two conditions:

1. When the samples are taken from populations known to follow the normal
distribution. In this case the size of the sample is not a factor.
2. When the shape of the population distribution is not known or the shape is known
to be nonnormal, but the sample contains at least 30 observations.

Recall that we used the z-value found with Formula [7-1] to convert any normal
distribution to the standard normal distribution. We can use the standard normal table to
find the probability of selecting a value of an observation that falls within a specified
range. The formula is:

Standard Normal Value: $z = \frac{X - \mu}{\sigma}$   [7-1]

In this formula X is the value of the random variable, $\mu$ is the population mean, and
$\sigma$ is the population standard deviation.

Since most business decisions are based on a sample, we are interested in the distribution
of the sample mean $\bar{X}$, not the value of X, the value of one observation. Formula
[7-1] is altered to reflect this need. We change X to $\bar{X}$. Then we change the
population standard deviation $\sigma$ to the standard error of the mean: $\sigma/\sqrt{n}$.

Thus we have Formula [8-2] that is used to find the z value for a normal population with
a known population mean and standard deviation:

Finding the z Value of $\bar{X}$ When the Population Standard
Deviation is Known: $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$   [8-2]

If we do not know the value of the population standard deviation $\sigma$ and the sample
size is at least 30, we estimate the population standard deviation with the sample standard
deviation s. Thus we use s to replace $\sigma$; the new formula is Formula [8-3]:

Finding the z Value of $\bar{X}$ When the Population Standard
Deviation is Unknown: $z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$   [8-3]
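
As a worked illustration of these z formulas, the sketch below computes z for a sample mean and the probability of a sample mean at least that large. All the numbers (a population mean of 31.2, a standard deviation of 0.4, a sample of 16 with mean 31.38) are hypothetical, and `z_for_sample_mean` is a name introduced here. The same function serves for Formula [8-3] by passing the sample standard deviation s as `sd`.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar, mu, sd, n):
    """z = (x_bar - mu) / (sd / sqrt(n)); sd is sigma if known, else s."""
    return (x_bar - mu) / (sd / math.sqrt(n))

# Hypothetical values: normal population, sigma known.
z = z_for_sample_mean(x_bar=31.38, mu=31.2, sd=0.4, n=16)

# Probability that a sample mean is at least this far above mu.
p_above = 1 - NormalDist().cdf(z)

print(round(z, 2))        # 1.8
print(round(p_above, 4))  # 0.0359
```

A probability this small suggests that a difference of this size is unlikely to be due to sampling error alone.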
