06 Sampling
Sampling Process
[Figure: the sampling process. Recoverable labels: sampling frame, sample, inference, define the target population, conduct fieldwork.]
Define the Target Population
The target population is the collection of elements or objects that possess the information sought by the
researcher and about which inferences are to be made. The target population should be defined in terms of
elements, sampling units, sampling frame, and time.
• An element is the object about which or from which the information is desired, e.g., the respondent.
• A sampling unit is an element, or a unit containing the element, that is available for selection at some
stage of the sampling process.
• Time is the time period under consideration.
• A sampling frame is a representation of the elements of the target population, e.g., a telephone book,
trade directory, or mailing list.
Different Sampling Techniques
Convenience Sampling
Convenience sampling attempts to obtain a sample of convenient elements.
Often, respondents are selected because they happen to be in the right place at the
right time. It is also called haphazard or accidental sampling. Examples:
• use of students, and members of social organizations
• mall intercept interviews without qualifying the respondents
• department stores using charge account lists
• “people on the street” interviews
Judgmental Sampling
Judgmental sampling, also called purposive sampling, is a form of convenience
sampling in which the population elements are selected based on the judgment of
the researcher. Examples:
• test markets
• purchase engineers selected in industrial marketing research
• expert witnesses used in court
Quota Sampling
Quota sampling may be viewed as two-stage restricted judgmental sampling.
• The first stage consists of developing control categories, or quotas, of population elements. The control
characteristics may include gender, age, income, etc.
• In the second stage, sample elements are selected based on convenience or judgment.
Example: population of 350,000; sample size of 1,000.

Control           Population composition   Sample composition
characteristic    Percentage               Percentage   Number
Sex
  Male            48                       48           480
  Female          52                       52           520
  Total           100                      100          1,000
Age
  18-30           27                       27           270
  31-45           39                       39           390
  46-60           34                       34           340
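
The quota counts in the example follow directly from the population percentages and the sample size. A minimal Python sketch of that arithmetic is shown here; the function name `quota_counts` is hypothetical and not part of the original material.

```python
def quota_counts(percentages, sample_size):
    """Convert population percentages for one control characteristic into quota counts."""
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Each quota is the category's share of the total sample size.
    return {category: round(sample_size * pct / 100)
            for category, pct in percentages.items()}


# Percentages and sample size taken from the example table.
sex_quota = quota_counts({"Male": 48, "Female": 52}, 1000)
age_quota = quota_counts({"18-30": 27, "31-45": 39, "46-60": 34}, 1000)
print(sex_quota)
print(age_quota)
```

Within each quota, respondents would still be chosen by convenience or judgment, as described in the second stage.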
Snowball Sampling
In snowball sampling, an initial group of respondents is selected, usually at
random.
• After being interviewed, these respondents are asked to identify others who
belong to the target population of interest.
• Subsequent respondents are selected based on the referrals.
Simple Random Sampling
• Each element in the population has a known and equal probability of selection.
• Each possible sample of a given size (n) has a known and equal probability of
being the sample actually selected.
• This implies that every element is selected independently of every other element.
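
As a rough illustration, simple random sampling from a sampling frame can be sketched with Python's standard `random.sample`, which draws without replacement so that every subset of size n is equally likely. The frame of 100 numbered elements and the helper name `simple_random_sample` are assumptions made for the example.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the frame, each subset of size n being equally likely."""
    rng = random.Random(seed)  # a seed is used only to make the illustration repeatable
    return rng.sample(frame, n)


# Hypothetical sampling frame: elements numbered 1 to 100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```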
Systematic Sampling
• The sample is chosen by selecting a random starting point and then picking every i-th element in
succession from the sampling frame.
• The sampling interval, i, is determined by dividing the population size N by the sample size n and
rounding to the nearest integer.
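
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the frame is an ordered list, the random start is drawn from the first interval, and the helper name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick every i-th element after a random start, with i = N / n rounded to an integer."""
    N = len(frame)
    # Sampling interval; Python's round() sends exact halves to the nearest even integer.
    i = max(1, round(N / n))
    rng = random.Random(seed)
    start = rng.randrange(i)  # random 0-based starting position within the first interval
    # Take every i-th element from the start; keep at most n of them.
    return frame[start::i][:n]


# Hypothetical frame of N = 100 elements and a sample of n = 10, so i = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

When N is not a multiple of n, rounding the interval can leave the sample slightly short of n; the sketch does not try to correct for that.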
Stratified Sampling
• A two-step process in which the population is partitioned into subpopulations, or strata.
• The strata should be mutually exclusive and collectively exhaustive in that every population element should
be assigned to one and only one stratum and no population elements should be omitted.
• Next, elements are selected from each stratum by a random procedure, usually SRS.
• The elements within a stratum should be as homogeneous as possible, but the elements in different strata
should be as heterogeneous as possible.
• The stratification variables should also be closely related to the characteristic of interest.
• In proportionate stratified sampling, the size of the sample drawn from each stratum is proportionate to the
relative size of that stratum in the total population.
• In disproportionate stratified sampling, the size of the sample from each stratum is proportionate to the
relative size of that stratum and to the standard deviation of the distribution of the characteristic of interest
among all the elements in that stratum.
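
The difference between the two allocation rules can be sketched in Python. The stratum sizes and standard deviations below are made-up illustration values, and the function names are hypothetical; the disproportionate rule weights each stratum by its size times its standard deviation, as described above.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate n in proportion to each stratum's share of the population."""
    total = sum(stratum_sizes.values())
    return {h: n * size / total for h, size in stratum_sizes.items()}


def disproportionate_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n in proportion to (stratum size x stratum standard deviation)."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


# Hypothetical strata: population sizes and standard deviations of the characteristic.
sizes = {"A": 5000, "B": 3000, "C": 2000}
sds = {"A": 10, "B": 20, "C": 40}

prop = proportionate_allocation(sizes, 500)
disp = disproportionate_allocation(sizes, sds, 500)
print(prop)
print({h: round(v, 1) for h, v in disp.items()})
```

In this made-up case the smallest stratum receives the largest sample under the disproportionate rule because it is the most variable. Allocations are left as fractions; in practice they would be rounded to whole respondents.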
Cluster Sampling
• The population is divided into clusters of homogeneous units, usually based on
geographical closeness.
• In the first stage, a sample of areas is chosen.
• In the second stage, a sample of respondents within those areas is selected.
Types of Cluster Sampling
[Figure: types of cluster sampling.]
QUESTION 1:
A manufacturer supplies the rear axles for U.S. Postal Service mail
trucks. These axles must be able to withstand 80,000 pounds per square
inch in stress tests, but an excessively strong axle raises production
costs significantly. Long experience indicates that the standard deviation
of the strength of its axles is 4,000 pounds per square inch. The
manufacturer selects a sample of 100 axles from production, tests them,
and finds that the mean stress capacity of the sample is 79,600 pounds
per square inch. If the axle manufacturer uses a significance level (α) of
0.05 in testing, will the axles meet the stress requirements?
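
One way to approach this question is a z-test on the sample mean. The sketch below assumes a two-tailed test (H0: µ = 80,000 against H1: µ ≠ 80,000), on the reasoning that both weaker and excessively strong axles are undesirable; that choice of hypotheses is an assumption, not something the question states.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 80_000    # hypothesized mean strength (psi)
sigma = 4_000   # known population standard deviation (psi)
n = 100         # sample size
xbar = 79_600   # sample mean (psi)
alpha = 0.05    # significance level

standard_error = sigma / sqrt(n)                # 4000 / 10 = 400
z = (xbar - mu0) / standard_error               # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value, about 1.96

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject_h0}")
```

Under these assumptions the test statistic is z = −1, which lies inside ±1.96, so the sample gives no evidence that the mean strength differs from 80,000 pounds per square inch.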
SAMPLE SIZE
What affects sample size
• Purpose of the study
• Population size
• Risk of selecting a bad sample
• Allowable sampling error (level of precision)
• Confidence level (or acceptable risk)
• Degree of variability in the attributes being measured
Ways to Determine Sample Size
• Generally, 10% of the population
• Using a census for small populations
• Sample size used in a similar study
• Rules of thumb
• Based on the number of variables
• Main group: n > 100
• Subgroups: 20 < n < 100
• Statistical precision
Steps to find SD of universe
• One requires the standard deviation of the universe, which is unknown. A rough estimate of this measure,
however, is sufficient for suggesting sample size.
• a) In many instances, the experience with similar problems will be a good guide for making this estimate of
the standard deviation.
• b) In other instances, an exploratory sample study on a small scale may be conducted in order to arrive at an
estimate of SD.
• c) To estimate the standard deviation of the universe, the range of the values in the universe may be estimated
and used as a guide. In a normal distribution, the range is about six times the standard deviation.
For practical purposes, an estimate of around one-fifth of the estimated range is often used.
• Suppose the range is roughly 300; that is, the difference between the lowest value in the universe and its
highest value is 300. One-fifth of this rough estimate is 60. Therefore, one may take 60 as a rough
approximation of SD.
Statistical Precision
Must know:
• Variability of the population and of individual strata
• Acceptable level of sampling error
(SE = SD/√n)
• Needed level of confidence
Sample Size for a Proportion
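$$ n = \frac{z^2 pq}{E^2} $$
Where:
n = sample size
z = standard normal value for the chosen confidence level
p = estimated proportion; q = 1 − p
E^2 = square of the maximum allowance for error between the true proportion and the sample proportion, or
$(z s_p)^2$.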
Problem
• Suppose we wish to evaluate a statewide extension program in which
farmers were encouraged to adopt a new practice. Assume there is a
large population but that we do not know the variability in the
proportion that will adopt the practice. Furthermore, suppose we desire
a 95% confidence level and 5% precision. What would the desired
sample size be?
Soln.
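• Since the variability is not known, assume p = 0.5 (maximum
variability), so q = 1 − p = 0.5.
• For a 95% confidence level, z = 1.96, and E = 0.05.
• $n = \frac{z^2 pq}{E^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16$
• Rounding up, n = 385 farmers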
Finite population correction for proportions
• For a population of N = 2000: $n_1 = \frac{N}{1 + N E^2} = \frac{2000}{1 + 2000(0.05)^2} = \frac{2000}{6}$
• $n_1 \approx 333$ farmers
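
The two calculations for the farmer example can be checked with a short Python sketch. The function names are hypothetical; the inputs (z = 1.96, p = 0.5, E = 0.05, N = 2000) are the values used in the example.

```python
from math import ceil


def sample_size_proportion(z, p, E):
    """n = z^2 * p * q / E^2 with q = 1 - p, rounded up to a whole respondent."""
    return ceil(z**2 * p * (1 - p) / E**2)


def finite_population_size(N, E):
    """n1 = N / (1 + N * E^2), the finite-population formula used in the example."""
    return N / (1 + N * E**2)


n = sample_size_proportion(z=1.96, p=0.5, E=0.05)
n1 = finite_population_size(N=2000, E=0.05)
print(n)             # 385
print(round(n1, 1))  # 333.3
```

Setting p = 0.5 maximizes p(1 − p), which is why it is the conservative choice when the variability is unknown.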
Sample Size Formula for means
$$ n = \left( \frac{zs}{E} \right)^2 $$
where:
n = sample size
z = confidence level in standard error units
s = estimate of the population standard deviation
E = acceptable magnitude of error
Solved Problem 1
Suppose a survey researcher studying expenditures on lipstick
wishes to have a 95% confidence level (z = 1.96) and a range of error (E) of
less than $2.00. The estimate of the standard deviation is $29.00.
n = [(1.96)(29) / 2]^2 = (28.42)^2 ≈ 808
Solved Problem 2
Suppose, in the same example as the one before, the range of error (E) is
acceptable at $4.00. By how much is the sample size reduced?
n = [(1.96)(29) / 4]^2 = (14.21)^2 ≈ 202, compared with about 808 when E is $2.00.
Calculating Sample Size of above problem with 99% confidence level
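For E = 2:
$$ n = \left[ \frac{(2.57)(29)}{2} \right]^2 = \left[ \frac{74.53}{2} \right]^2 = [37.265]^2 \approx 1389 $$
For E = 4:
$$ n = \left[ \frac{(2.57)(29)}{4} \right]^2 = \left[ \frac{74.53}{4} \right]^2 = [18.6325]^2 \approx 347 $$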
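
The lipstick calculations can be reproduced with a small Python sketch of the formula n = (zs/E)^2. The function name `sample_size_mean` is hypothetical; the z values 1.96 and 2.57 are the 95% and 99% values used in these problems. Results are left unrounded so that the rounding convention stays visible.

```python
def sample_size_mean(z, s, E):
    """n = (z * s / E)^2 for estimating a mean to within +/- E."""
    return (z * s / E) ** 2


s = 29.0  # estimated standard deviation in dollars
for z in (1.96, 2.57):
    for E in (2.0, 4.0):
        print(f"z = {z}, E = {E}: n = {sample_size_mean(z, s, E):.2f}")
```

Because E enters the formula squared, doubling the acceptable error cuts the required sample size to one quarter.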
Solved problem 3
• A sample of size n = 100 produced the sample mean x̄ = 16. Assuming the
population standard deviation σ = 3, compute a 95% confidence interval for the
population mean µ.
• $\bar{x} \pm z \frac{\sigma}{\sqrt{n}} = 16 \pm 1.96 \cdot \frac{3}{\sqrt{100}} = 16 \pm 0.588$
• The interval is (15.412, 16.588).
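
A confidence interval of this kind can be computed with Python's standard library. The sketch below assumes a known population standard deviation and takes the z value from `statistics.NormalDist`; the function name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, sigma, n, confidence):
    """Return (low, high) for xbar +/- z * sigma / sqrt(n), with sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


low, high = mean_confidence_interval(xbar=16, sigma=3, n=100, confidence=0.95)
print(round(low, 3), round(high, 3))  # 15.412 16.588
```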
Solved problem 4
• A market research company wishes to estimate the percentage of people in a
certain age bracket who read a current affairs magazine. The confidence level
required is 94%. What size sample should be taken to estimate the
percentage to within 4%?
• Since p and 1 − p are unknown, the best assumption to make is that they are both
equal to 0.5. This gives the widest confidence interval:
• $n \ge \frac{z^2 p(1-p)}{e^2} = \frac{(1.881)^2 (0.5)(0.5)}{(0.04)^2} \approx 552.8$
• n = 553 (z = 1.881)
Solved Problem 5
The operations manager of a large production plant would like to estimate the mean
amount of time a worker takes to assemble a new electronic component. Assume
that the standard deviation of this assembly time is 3.6 minutes.
a) After observing 120 workers assembling similar devices, the manager noticed
that their average time was 16.2 minutes. Construct a 92% confidence interval for
the mean assembly time.
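b) How many workers should be involved in this study in order to have the mean
assembly time estimated to within ±15 seconds with 92% confidence?
a) We want a 92% confidence interval for µ based on the following information: n = 120, x̄ = 16.2 min, α =
1 − 0.92 = 0.08, and σ = 3.6 min. With $z_{\alpha/2} = z_{0.04} \approx 1.75$, the interval is
$\bar{x} \pm z \frac{\sigma}{\sqrt{n}} = 16.2 \pm 1.75 \cdot \frac{3.6}{\sqrt{120}} \approx 16.2 \pm 0.575$, or about (15.625, 16.775) minutes.
b) With E = 15 seconds = 0.25 min:
$n = \frac{z^2 \sigma^2}{E^2} = \frac{(1.75)^2 (3.6)^2}{(0.25)^2} \approx 635.04$, rounded up to 636.
Thus, 636 workers are needed in the study to achieve the desired precision of inference.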
Solved problem 6
a) A consumer advocacy group would like to conduct a survey to find the
proportion p of consumers who bought the newest generation of an MP3 player
and were happy with their purchase. How large a sample n should they take to
estimate p with a 2% margin of error and 90% confidence?
b) The advocacy group took a random sample of 1000 consumers who recently
purchased this MP3 player and found that 400 were happy with their purchase. Find
a 90% confidence interval for p.
a) It is given that the margin of error is E = 0.02 and α = 0.10, so z = 1.645. Using p = 0.5 as the conservative
guess in the sample size formula gives
$n = \frac{p(1-p) z^2}{E^2} = \frac{(0.5)(0.5)(1.645)^2}{(0.02)^2} \approx 1691.3$, rounded up to 1692.
b) From the data, $\hat{p}$ = 400/1000 = 0.40. Since n = 1000 is large, the 90% confidence interval for p is:
$\hat{p} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.40 \pm 1.645 \sqrt{\frac{(0.40)(0.60)}{1000}} \approx 0.40 \pm 0.025$
= (0.375, 0.425).
Thus, between 37.5% and 42.5% of consumers are estimated to be happy with their purchase.
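
The interval in part b) can be reproduced with a short Python sketch of the large-sample (normal approximation) interval for a proportion. The function name `proportion_confidence_interval` is hypothetical, and the confidence level is set to 0.90 to match the z = 1.645 used in the solution.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Return (low, high) for p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_confidence_interval(successes=400, n=1000, confidence=0.90)
print(round(low, 3), round(high, 3))  # 0.375 0.425
```

Passing a different `confidence` value gives the corresponding interval for the same data.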
Class Problems
• A researcher has taken a sample of 300 households to estimate the monthly expenses for department store
shopping and found that the mean monthly expense for the sample is $182. Past studies indicate that the
population standard deviation can be assumed to be $455. Find an interval around the population mean that
will include 95% of the sample means, based on samples of 300 households.
• Suppose the researcher wants to estimate the monthly household expense for shopping more precisely, so that
the estimate will be within ±$5 of the true population value. What should the sample size be?