Part 2

Uploaded by

Fadia Puan
Copyright
© All Rights Reserved
Sampling Distributions & Sampling Techniques
Pop Quiz

– In a cake factory, the standard deviation of sugar per cup is 129 grams. What is the mean if no more than 62% of cups contain less than 440 grams of sugar?
– What if no more than 62% of cups contain more than 440 grams of sugar?
– In a vitamin factory, the standard deviation of vitamin C is 16 milligrams. What is the mean if more than 75% should contain 51 milligrams of vitamin C or more?
Discrete & Continuous Distributions
• A random variable is discrete if its set of possible values is finite or countably infinite.
Examples:
1. Randomly selecting 25 people who consume soft drinks and determining how many prefer diet soft drinks
2. Counting the number of people who arrive at a store during a five-minute period

• A random variable is continuous if it can take on values at every point over a given interval.
Examples:
1. Measuring the time between customer arrivals at a retail outlet
2. Measuring the weight of grain in a grain elevator at different points of time

• Discrete distributions (binomial, Poisson, hypergeometric) are constructed from discrete random
variables.
• Continuous distributions (uniform, normal, exponential, and others) are constructed from
continuous random variables.
I. Discrete Distribution
– A histogram is the most common graphical way of describing a discrete
distribution.
• An executive is considering out-of-town business travel for a given Friday. She recognizes
that at least one crisis could occur on the day that she is gone and she is concerned about
that possibility. Table 5.2 shows a discrete distribution that contains the number of crises
that could occur during the day that she is gone and the probability that each number will
occur.
5.2 Describing a Discrete Distribution

Mean, Variance, and Standard Deviation of Discrete Distributions

– The mean or expected value of a discrete distribution is the long-run average of occurrences:

$\mu = E(x) = \sum [x \cdot P(x)]$

where
E(x) = long-run average
x = an outcome
P(x) = probability of that outcome
• In the long run, the mean or expected number of crises on a given Friday for this executive is 1.15 crises.
• However, there will never be exactly 1.15 crises.
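
The expected-value calculation can be sketched in a few lines of Python. Table 5.2 is not reproduced in these slides, so the probabilities below are assumed values, chosen only because they sum to 1 and reproduce the stated mean of 1.15; the function names are illustrative as well.

```python
from math import sqrt

# Assumed probabilities for 0 to 5 crises (Table 5.2 is not reproduced here).
# They sum to 1 and are consistent with the stated mean of 1.15.
crises_distribution = {0: 0.37, 1: 0.31, 2: 0.18, 3: 0.09, 4: 0.04, 5: 0.01}


def discrete_mean(dist):
    """Expected value: the sum of x * P(x) over all outcomes."""
    return sum(x * p for x, p in dist.items())


def discrete_variance(dist):
    """Variance: the sum of (x - mu)^2 * P(x) over all outcomes."""
    mu = discrete_mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


mean = discrete_mean(crises_distribution)
variance = discrete_variance(crises_distribution)
std_dev = sqrt(variance)

print(f"mean = {mean:.2f}")
print(f"variance = {variance:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```

The variance and standard deviation follow the same weighting idea as the mean: each squared deviation from the mean is weighted by the probability of its outcome.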
II. Continuous Distributions
6.2 The Normal Distribution
Characteristics of the Normal Distribution

• It is a continuous distribution.
• It is a symmetrical distribution about its mean.
• It is asymptotic to the horizontal axis.
• It is unimodal.
• It is a family of curves.
• The total area under the curve is 1.
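Probability Density Function of the Normal Distribution

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$

– The density function defines the normal curve for a given mean μ and standard deviation σ; probabilities are areas under that curve.
– Because the formula is difficult to use directly, it is common to use a table or a computer.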
Standardized Normal Distribution

– The normal distribution is described by its mean and standard deviation.
– All normal distributions can be converted to a single distribution, the z distribution, using the formula:

$z = \frac{x - \mu}{\sigma}$

– A z score is the number of standard deviations that a value, x, is above or below the mean.
– The z distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
Solving for Probabilities Using the Normal Curve

– Example: According to the U.S. Environmental Protection Agency (EPA), on average there are 4.43 pounds of waste generated per person in the U.S. per day.
– Suppose waste generated per person per day in the U.S. is normally distributed with a standard deviation of 1.32 pounds.
– If a U.S. person is randomly selected, what is the probability that the person generates more than 6.00 pounds of waste per day?
– First, find the z value:

$z = \frac{x - \mu}{\sigma} = \frac{6.00 - 4.43}{1.32} = 1.19$

– Look up z = 1.19 in the z table, which gives an area of .3830.
– .3830 is the area between the mean and the z value of 1.19 (x value of 6.00).
– Subtract from .5 to get the area in the upper tail:

$P(x > 6.00) = .5000 - .3830 = .1170$

• There is an 11.7% chance that a randomly selected person will generate more than 6 pounds of waste per day.
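
The table lookup for the waste example can be cross-checked with a short Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative. Because z is not rounded to 1.19 before the lookup, the printed areas may differ from the table values in the fourth decimal place.

```python
from statistics import NormalDist

mean_waste = 4.43  # pounds per person per day (from the example)
std_waste = 1.32   # pounds (from the example)
x = 6.00           # threshold of interest

# Standardize x: number of standard deviations above the mean.
z = (x - mean_waste) / std_waste

# The z table reports the area between the mean (z = 0) and z.
standard_normal = NormalDist()  # mean 0, standard deviation 1
area_mean_to_z = standard_normal.cdf(z) - 0.5

# Upper-tail probability P(X > 6.00).
upper_tail = 0.5 - area_mean_to_z

print(f"z = {z:.2f}")
print(f"area between mean and z = {area_mean_to_z:.4f}")
print(f"P(X > 6.00) = {upper_tail:.4f}")
```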
Using the Computer to Solve for Normal Distribution Probabilities

• Both Excel and Minitab can be used.
• For the waste generation problem given earlier, if a U.S. person is randomly selected, what is the probability that the person generates between 5.30 and 6.50 pounds of waste per day?
• Both programs give the probability, 0.1965.
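
The same interval probability can be reproduced outside Excel or Minitab. The sketch below is one possible way to do it in Python (3.8 or later), using the mean of 4.43 and standard deviation of 1.32 from the waste example; it takes the difference of two cumulative probabilities.

```python
from statistics import NormalDist

# Normal model for daily waste per person, parameters from the example.
waste = NormalDist(mu=4.43, sigma=1.32)

# P(5.30 < X < 6.50) is the difference of two cumulative probabilities.
p_between = waste.cdf(6.50) - waste.cdf(5.30)

print(f"P(5.30 < X < 6.50) = {p_between:.4f}")
```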
III. Sampling Techniques

Reasons for Sampling


– The sample can save money.
– The sample can save time.
– For given resources, the sample can broaden the scope of the study.
– Because the research process is sometimes destructive, the sample can save product.
– If accessing the population is impossible, the sample is the only option.
Reasons for Taking a Census
• Eliminate the possibility that a randomly selected sample may not be
representative of the population.
• For the safety of the consumer.
• To benchmark data for future studies.
Frame
• List, map, or directory used in the sampling process to represent the
population.
• Also called the working population.
7.1 Sampling
Frame
– A frame is overregistered if it contains units that are not in the target population.
– A frame is underregistered if it does not include some units that are in the target population.
Types of Sampling Designs


Random Versus Nonrandom Sampling


– In random sampling, every unit of the population has the same chance of being selected.
– In nonrandom sampling, not every unit of the population has the same chance of being selected.
– Nonrandom sampling is generally NOT an appropriate technique for gathering data for statistical analysis.

Simple Random Sampling


– Each unit in the frame is numbered from 1 to N (the size of the population).
– A random number table or generator is used to select n items into the sample.
Example: From the population frame of companies in Table 7.3, select a simple random sample of six companies.
– First, the companies were numbered from 1 to 30.
– From the table of random numbers, two-digit numbers are selected, discarding any that are over 30.
– The first two digits in the table are 91, which is unusable.
– The second two digits are 56, also unusable, as are the next two digits, 74.
– The fourth set of two digits is 25, which corresponds to Occidental Petroleum.
– Continue moving across the rows until six usable two-digit numbers are selected.
– Sample will be:
– (25) Occidental Petroleum
– (27) Procter & Gamble
– (01) Alaska Airlines
– (04) Bank of America
– (02) Alcoa
– (29) Sears
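
The table-reading procedure can be sketched in Python. The random number table itself is not reproduced in these slides, so the digit row below is an assumed stand-in, chosen to be consistent with the digits quoted in the example (91, 56, 74, 25) and with the six numbers in the final sample. Skipping repeated numbers is also an assumption, reflecting sampling without replacement.

```python
def select_from_digits(digits, population_size, sample_size):
    """Read two-digit numbers left to right; keep those in 1..population_size."""
    selected = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        # Discard numbers outside the frame and any repeats (assumption).
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected


# Assumed row of random digits (hypothetical stand-in for the slide's table).
random_digits = "9156742595279583013404024863852988099730"

sample_ids = select_from_digits(random_digits, population_size=30, sample_size=6)
print(sample_ids)
```

With a random number generator instead of a table, `random.sample(range(1, 31), 6)` from the standard library plays the same role.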
Stratified Random Sampling
– The population is divided into nonoverlapping subpopulations (strata).
– The researcher selects a random sample from each stratum.
– Can reduce sampling error, because the sample will more closely match the population.
– More costly than a simple random sample.
– Strata are usually chosen based on available information about the population.
• Within each stratum, there should be homogeneity.
• Between strata, there should be heterogeneity.
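
A minimal sketch of stratified random sampling follows, assuming proportional allocation (each stratum contributes in proportion to its size), which the slides do not specify. The frame, the age-group strata, and the function name are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(units, stratum_of, sample_size, seed=None):
    """Draw a simple random sample within each stratum, sized proportionally."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Proportional allocation, rounded; the total can differ slightly
        # from sample_size when the proportions do not divide evenly.
        k = max(1, round(sample_size * len(members) / len(units)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Hypothetical frame: 100 people tagged by age group.
frame = (
    [("20-30", i) for i in range(50)]
    + [("30-40", i) for i in range(30)]
    + [("40-50", i) for i in range(20)]
)

sample = stratified_sample(frame, stratum_of=lambda u: u[0], sample_size=10, seed=1)
print(Counter(stratum for stratum, _ in sample))
```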
Systematic Sampling
– Every kth item is selected to produce a sample of size n from a population of size N, where k = N/n.

Example: A business researcher wanted to sample Texas manufacturers as part of a management study.
– The researcher wanted to sample 1,000 companies.
– Frame: the most recent edition of the Texas Manufacturers Register®, which listed 26,000 manufacturing companies in alphabetical order.
– The value of k was 26 (26,000/1,000).
– A random number table is used to choose the first element in the study; every 26th company after it is then selected.
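
The systematic procedure can be sketched as follows. The frame here is a list of integers standing in for the 26,000 listed companies, and a seeded random generator replaces the random number table; both are assumptions made for illustration.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Pick a random start among the first k units, then take every kth unit."""
    k = len(frame) // sample_size  # sampling interval, k = N / n
    rng = random.Random(seed)
    start = rng.randrange(k)       # 0-based index of the first selected unit
    return [frame[start + i * k] for i in range(sample_size)]


# Stand-in for the 26,000 companies, listed in order.
frame = list(range(1, 26001))

sample = systematic_sample(frame, sample_size=1000, seed=7)
print(sample[:5])
```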
Cluster (or Area) Sampling
– The population is divided into nonoverlapping areas (clusters), and a random sample of clusters is selected.
– Clusters should be internally heterogeneous.
– Example: states, cities
– If clusters are too large, a second set of clusters can be taken from each initial cluster (two-stage sampling).
– Advantages: convenience, cost
– Disadvantages: may be less efficient than simple random sampling if the elements within a cluster are similar
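
A one-stage cluster sample can be sketched as below: clusters are chosen at random, and every unit in a chosen cluster enters the sample. The city names, store labels, and function name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Choose clusters at random and keep every unit inside the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical frame: stores grouped by city (the clusters).
clusters = {
    "City A": ["A1", "A2", "A3"],
    "City B": ["B1", "B2"],
    "City C": ["C1", "C2", "C3", "C4"],
    "City D": ["D1", "D2"],
}

chosen, units = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen)
print(units)
```

For two-stage sampling, the second stage would draw a random sample within each chosen cluster instead of keeping every unit.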

Nonrandom Sampling
– Any method that does not involve a random selection process.

Convenience Sampling
– Selected for the convenience of the researcher.

Judgment Sampling
– Chosen by the judgment of the researcher.
– Because the probability of an element being selected cannot be determined, sampling error cannot be determined.
– Can be biased due to systematic errors in judgment.

Quota Sampling
– Population subclasses, such as age or gender, are used as strata.
– Can be useful if no frame is available for the population.
– Can be less costly.
– But it is nonrandom, so selection probabilities cannot be calculated.

Snowball Sampling
– Survey subjects are selected based on referrals from other survey respondents.

Sampling Error
– Occurs when the sample is not representative of the population.

Nonsampling Error
– All errors other than sampling error.
– Missing data
– Recording errors
– Measurement errors
– Input processing errors
– Analysis errors
– Response errors
– And many more!
7.2 Sampling Distribution of x̄
Suppose that a small, finite population contains only N = 8 numbers:
54 55 59 63 64 68 69 70

– Distribution of the population data:

– Suppose that all possible samples of size n = 2 are taken from this population.
– List all possible samples of size n = 2.
– Then take the means of all of the samples.
– Plot the distribution of the means of the samples.
– The distribution of the sample means looks different from the distribution of the original population.
– Similarly, the histogram of sample means drawn from a Poisson distribution differs from the histogram of the Poisson distribution itself.
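
The enumeration for the eight-number population can be sketched in Python. The slides do not say whether sampling is with or without replacement; the sketch assumes sampling with replacement, which gives 8 × 8 = 64 ordered samples. The 3-unit histogram bins are an arbitrary choice for display.

```python
from collections import Counter
from itertools import product

population = [54, 55, 59, 63, 64, 68, 69, 70]

# Assumption: sampling with replacement, so order matters and values may repeat.
samples = list(product(population, repeat=2))
sample_means = [(a + b) / 2 for a, b in samples]

# Text histogram of the sample means in 3-unit bins starting at 54.
bin_width = 3
histogram = Counter(int((m - 54) // bin_width) for m in sample_means)
for b in sorted(histogram):
    low = 54 + b * bin_width
    print(f"{low}-{low + bin_width}: {'#' * histogram[b]}")

mean_of_means = sum(sample_means) / len(sample_means)
print(f"number of samples = {len(samples)}")
print(f"mean of the sample means = {mean_of_means}")
```

The population values are spread fairly evenly, while the sample means pile up toward the middle, which is the contrast the slides describe.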
The Central Limit Theorem
– If random samples of size n are repeatedly drawn from a population that has a mean of μ and a standard deviation of σ, the sample means, x̄, are approximately normally distributed for sufficiently large sample sizes (n ≥ 30), regardless of the shape of the population distribution. If the population is normally distributed, the sample means are normally distributed for any sample size.

– It can be shown that the mean of the sample means is the population mean:

$\mu_{\bar{x}} = \mu$

– The standard deviation of the sample means (the standard error of the mean) is:

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
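
A small simulation can illustrate these two results. The sketch below assumes a strongly skewed population (exponential with rate 1, so μ = 1 and σ = 1), a sample size of n = 30, and 5,000 repeated samples; these choices and the fixed seed are illustrative, not part of the slides.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 30                   # size of each sample
num_samples = 5000       # number of repeated samples

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw repeated samples and record each sample mean.
sample_means = [
    fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
std_of_means = pstdev(sample_means)
standard_error = sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.4f} (population mean = {mu})")
print(f"std dev of sample means = {std_of_means:.4f} (sigma / sqrt(n) = {standard_error:.4f})")
```

The simulated values only approximate the theoretical ones, and the agreement tightens as the number of repeated samples grows.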