Sampling and Sampling Distribution
What is sampling?
Expensive
Disadvantages of sampling:
Terms in sampling
Sample: a small group of individuals selected for study
from a larger population.
Sampling: the process of selecting a portion of the
population to represent the entire population.
Population: the group of people or objects in which
we are interested.
Statistics: numerical summaries of variables
measured on the sample.
Parameters: numerical summaries of variables
for the population of interest.
Sample size: the number of individuals or observations in
the sample.
Target population: the population to which the results
will be applied.
Source population: the population from which the study
subjects are obtained.
Study population or sample population: the
population included in the sample.
Sampling unit: the unit that is considered for
selection.
Study unit: the individual member of a population whose
characteristics are to be measured.
Sampling frame: the list of potential subjects from which the
sample is drawn.
Sampling fraction: the ratio of the number of units in the
sample to the number of units in the reference population (n/N).
Its reciprocal (N/n) is the sampling interval.
Table of random numbers: a table of the digits 0-9 arranged
so that each digit is selected at random, every digit has an
equal chance of appearing, and the appearance of one digit is
independent of the others.
Selection method of sampling unit
samples of size n.
Selection method of sampling unit…
Generalization is possible
A. Simple Random Sampling (SRS)
Example on SRS
Suppose your school has 500 students and you need to conduct a
short survey on the quality of the food served in the cafeteria.
You decide that n=10 students should be sufficient for your
purposes.
Then you assign a number from 1 to 500 to each student in your
school.
To select the sample, you use a table of randomly generated
numbers.
Pick a starting point in the table (a row and column number) and
look at the random numbers that appear there.
In this case, since the student numbers run to three digits, the
random numbers need to contain three digits as well.
Ignore 000 and all random numbers greater than 500 because they do not
correspond to any of the students in the school.
Remember that the sampling is without replacement, so if a
number recurs, skip over it and use the next random number.
The first 10 different numbers between 001 and 500 make up
your sample.
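The table-based procedure above can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample` and the seed are assumptions, and a pseudo-random generator stands in for the printed table of random numbers.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Draw n distinct student numbers from 1..population_size.

    Mimics reading three-digit numbers from a random number table:
    out-of-range numbers and repeats are skipped.
    """
    rng = random.Random(seed)  # seed is only for reproducibility of the sketch
    chosen = []
    while len(chosen) < n:
        number = rng.randint(0, 999)  # one three-digit entry, 000 to 999
        if number < 1 or number > population_size:
            continue  # does not correspond to any student
        if number in chosen:
            continue  # sampling is without replacement
        chosen.append(number)
    return chosen

sample_ids = simple_random_sample(500, 10, seed=1)
print(sample_ids)
```

In practice `random.sample(range(1, 501), 10)` draws the same kind of sample in one call; the loop is written out only to mirror the steps of the worked example.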
SRS…
SRS has certain limitations:
Advantage
The representativeness of the sample is improved. That is,
adequate representation of minority subgroups of interest
can be ensured by stratification and by varying the
sampling fraction between strata as required.
Disadvantage
Sampling frame for the entire population has to be
prepared separately for each stratum.
D. Cluster sampling
Sometimes it is too expensive to carry out SRS
Population may be large and scattered.
Complete list of the study population unavailable
Population consists of many natural groups
(clusters)
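Travel costs can become expensive if interviewers
have to survey people from one end of the territory to the
other.
Clusters should be similar to one another, with each cluster
internally as varied as the population, unlike
stratified sampling, where the strata differ from one another
and each stratum is internally homogeneous.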
In this sampling scheme, selection of the required
sample is done on groups of study units (clusters)
instead of each study unit individually. The sampling
unit is a cluster, and the sampling frame is a list of
these clusters.
Procedure
The reference population (homogeneous) is
divided into clusters. These clusters are often
geographic units (e.g., districts or villages).
A sample of such clusters is selected
All the units in the selected clusters are studied
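The procedure can be sketched as follows. The village names and unit labels are hypothetical data invented for illustration, and `cluster_sample` is an assumed helper name, not part of any library.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and return every unit inside them.

    clusters: dict mapping a cluster name to the list of its study units.
    The sampling frame is the list of cluster names, not the list of units.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample of clusters
    units = [unit for name in chosen for unit in clusters[name]]  # all units studied
    return chosen, units

# Hypothetical reference population divided into geographic clusters
villages = {
    "village_A": ["A1", "A2", "A3"],
    "village_B": ["B1", "B2"],
    "village_C": ["C1", "C2", "C3", "C4"],
    "village_D": ["D1", "D2", "D3"],
}

chosen, units = cluster_sample(villages, n_clusters=2, seed=3)
print(chosen, units)
```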
It is preferable to select a large number of small clusters
rather than a small number of large clusters
Advantage
Cost and time reduction
It creates 'pockets' of sampled units instead of
spreading the sample over the whole territory
(administrative convenience)
Sometimes a list of all units in the population is not
available, while a list of all clusters is either available
or easy to create.
Disadvantage
Less efficient than SRS.
It is usually better to survey a large number of small
clusters than a small number of large clusters.
This is because neighboring units tend to be more alike,
so a few large clusters may not represent the whole
spectrum of opinions or situations present in the overall
population (design effect).
It assumes that the characteristic to be
studied is uniformly distributed throughout the reference population,
which may not always be the case. Hence, sampling error
is usually higher than for an SRS of the same size.
E. Multi-stage sampling
Appropriate when the reference population is large and
widely scattered
Selection is done in stages (at least two) until the final sampling
units (e.g., households or persons) are reached.
The primary sampling unit (PSU) is the sampling unit (usually
large) in the first sampling stage.
The secondary sampling unit (SSU) is the sampling unit in the
second sampling stage, and so on.
Similar to cluster sampling, except that it involves picking
a sample from within each chosen cluster, rather than
including all units in the cluster.
Example - The PSUs could be kebeles and the SSUs could be
households.
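A two-stage version of the kebele and household example might look like the sketch below. The kebele and household labels, the stage sizes, and the function name `two_stage_sample` are all assumptions made for illustration.

```python
import random

def two_stage_sample(psus, n_psu, n_ssu, seed=None):
    """Stage 1: sample PSUs. Stage 2: sample SSUs within each chosen PSU.

    psus: dict mapping a PSU name to the list of SSUs it contains.
    """
    rng = random.Random(seed)
    chosen_psus = rng.sample(sorted(psus), n_psu)  # first stage
    selected = {}
    for psu in chosen_psus:
        ssus = psus[psu]
        k = min(n_ssu, len(ssus))  # a PSU may hold fewer SSUs than requested
        selected[psu] = rng.sample(ssus, k)  # second stage
    return selected

# Hypothetical frame: 6 kebeles with 20 households each
kebeles = {
    f"kebele_{i}": [f"k{i}_hh{j}" for j in range(1, 21)]
    for i in range(1, 7)
}

selected = two_stage_sample(kebeles, n_psu=3, n_ssu=5, seed=7)
print(selected)
```

Only the households of the chosen kebeles need to be listed, which is where the saving over a full population frame comes from.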
Multi-stage sampling...
Commonly used with cluster sampling
Multi‐Stage Cluster Sampling
Advantage:
You do not need to have a list of all units in the
population.
Saves a great amount of time and effort by not having
to create a list of all the units in a population.
Disadvantage:
Sampling error is increased compared with a simple
random sample
Requires adjusting for the design effect (DF)
2. Non‐probability sampling
In non‐probability sampling, every item has an unknown
chance of being selected.
Unlike probability sampling, non‐probability sampling
rests on the assumption that characteristics are evenly
distributed within the population. This is what leads the
researcher to believe that any sample would be representative.
In probability sampling, randomness is a feature of the selection
process.
Non probability sampling…
Despite these drawbacks, non‐probability sampling methods
can be useful when descriptive comments about the
sample itself are desired.
Secondly, they are quick, inexpensive and convenient.
There are also circumstances in research when
it is infeasible or impractical to conduct probability
sampling.
The most common types of non‐probability
sampling
A. Convenience or haphazard sampling
When we take a sample, our results will not exactly equal
the results for the whole population. That is, our results will be
subject to error.
Two types of errors
Sampling error (random error)
Non-sampling error (bias)
Sampling Error
Because a sample is only a subset of a larger group, the value
of a characteristic measured in a sample differs from its value
in the total population.
This type of error, arising from the sampling process, is
called sampling error.
It cannot be avoided or totally eliminated unless the whole
population is studied.
It is minimized by increasing the size of the sample.
When n = N, sampling error = 0.
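A small simulation can illustrate how sampling error shrinks as n grows and vanishes when n = N. The population here is hypothetical (1000 simulated values), and the sample sizes and number of repeats are arbitrary choices for the sketch.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of N = 1000 values
population = [rng.gauss(50, 10) for _ in range(1000)]
mu = statistics.mean(population)

def mean_abs_error(n, repeats=200):
    """Average |sample mean - population mean| over repeated samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # SRS without replacement
        errors.append(abs(statistics.mean(sample) - mu))
    return statistics.mean(errors)

errors_by_n = {n: mean_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors_by_n.items():
    print(n, round(err, 4))
```

With n = 1000 every sample is the whole population, so the error is zero. Non-sampling errors are not affected by this and would remain.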
Non Sampling Error
Errors other than sampling errors, such as those arising from
non-response, incompleteness, and inaccuracy of response, are termed
non-sampling errors.
Sampling Distribution
A sampling distribution is the probability distribution of all
possible values of a sample statistic computed from samples of
the same size which are randomly selected from the same source
population.
[Figure: repeated samples drawn from a single population]
Sampling distribution…
Serves to answer probability questions about sample statistics.
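Example: a population of N = 4 values: 18, 20, 22, 24.
\[ \mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21 \]
\[ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236 \]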
Now consider all possible samples of size n = 2.

Samples (rows: 1st observation; columns: 2nd observation)
1st Obs   18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

Sample means (rows: 1st observation; columns: 2nd observation)
1st Obs   18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
16 possible samples (with replacement), giving 16 sample means:
Sample mean   Freq   P(x̄)
18            1      0.0625
19            2      0.1250
20            3      0.1875
21            4      0.2500
22            3      0.1875
23            2      0.1250
24            1      0.0625
Sampling distribution of all sample means
[Figure: bar chart of P(x̄) against x̄ for the 16 sample means, x̄ = 18 to 24]
Summary measures of this sampling distribution: Add
the 16 sample means & divide by 16. Also calculate the
SD of the sample means.
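Here N = 16 is the number of sample means.
\[ \mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21 \]
\[ \sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58 \]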
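The enumeration of all 16 samples and the two summary measures can be reproduced with a short script. This is a sketch using only the population values given in the example; the variable names are arbitrary.

```python
from collections import Counter
from itertools import product
from math import sqrt

population = [18, 20, 22, 24]
n = 2

# All ordered samples of size 2 drawn with replacement (4 * 4 = 16)
sample_means = [sum(s) / n for s in product(population, repeat=n)]

# Sampling distribution: relative frequency of each distinct sample mean
freq = Counter(sample_means)
distribution = {m: freq[m] / len(sample_means) for m in sorted(freq)}

# Mean and standard deviation of the 16 sample means
mu_xbar = sum(sample_means) / len(sample_means)
sigma_xbar = sqrt(sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means))

# Population mean and standard deviation, for comparison
mu = sum(population) / len(population)
sigma = sqrt(sum((x - mu) ** 2 for x in population) / len(population))

print(distribution)
print(mu_xbar, round(sigma_xbar, 3), round(sigma / sqrt(n), 3))
```

The last line shows that the standard deviation of the sample means coincides with σ/√n for this population.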
Comparing the population with its
sampling distribution
Population (N = 4): μ = 21, σ = 2.236
Distribution of sample means (n = 2): μ_x̄ = 21, σ_x̄ = 1.58
[Figure: bar charts of P(x) for the population values 18, 20, 22, 24 and of P(x̄) for the sample means 18 to 24]
Sampling distribution…
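We note that the mean of the sampling distribution of x̄
has the same value as the mean of the original
population, while its standard deviation (1.58) is smaller than
the population standard deviation (2.236) and equals σ/√n = 2.236/√2.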
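To standardize a sample mean:
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
[Figure: a sampling distribution (values .40 and .45 marked) standardized to the standard normal distribution; the area between 0 and z = 1.44 is .4251]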
Example 1
Given: μ = 50, σ = 16, n = 64
Find: P(x̄ > 53)
Solution
1. Write the given information, μ=50, σ=16, n=64
2. Sketch a normal curve
3. Convert x to a z score
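The three steps can be carried through numerically. The sketch below assumes the sample mean is approximately normal (reasonable for n = 64 by the central limit theorem) and uses `statistics.NormalDist` from the Python standard library in place of a z table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 16, 64

standard_error = sigma / sqrt(n)   # 16 / 8 = 2
z = (53 - mu) / standard_error     # z score of a sample mean of 53

# Area under the standard normal curve to the right of z
p_greater = 1 - NormalDist().cdf(z)

print(z, round(p_greater, 4))
```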
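…and
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5 \]
\[ P(7.8 \le \bar{x} \le 8.2) = P\left( \frac{7.8 - 8}{3/\sqrt{36}} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \frac{8.2 - 8}{3/\sqrt{36}} \right) = P(-0.4 \le z \le 0.4) = 0.3108 \]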
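As a check on this result, the probability can be computed directly from the sampling distribution of the mean without standardizing first. The values μ = 8, σ = 3, and n = 36 are those of the example; the normal shape of the sampling distribution is assumed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

# Sampling distribution of the mean: normal with mean mu and SD sigma / sqrt(n)
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# P(7.8 <= sample mean <= 8.2)
p_between = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)

print(round(p_between, 4))
```

Working with the distribution of the sample mean directly and working with z scores give the same area, since standardizing only rescales the axis.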
[Figure: population distribution (μ = 8), sampling distribution of x̄ (μ_x̄ = 8, with 7.8 and 8.2 marked), and standard normal distribution (−0.4 to 0.4, area .1554 + .1554)]
Example 3
The distribution of serum cholesterol levels for all 20-70
year-old males has mean µ = 211 mg/100 ml and SD = 46
mg/100 ml.
A. If a sample of size 25 is selected from this population,
what is the probability that the sample has a mean of 230 or
above?
Since x̄ is approximately normal with mean 211 and
standard error 46/√25 = 9.2, z = (230 − 211)/9.2 ≈ 2.07.
The area under the standard normal curve to the right of z
= 2.07 is about 0.019.
The other 90% of the samples have means that are greater than
199.2 mg/100 ml
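Both cholesterol figures can be checked numerically. This sketch assumes the sample mean is normally distributed and uses the standard library's `NormalDist`; the variable names are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

mu, sd, n = 211, 46, 25

standard_error = sd / sqrt(n)               # 46 / 5 = 9.2
xbar_dist = NormalDist(mu, standard_error)  # sampling distribution of the mean

# Probability that a sample of 25 has a mean of 230 or above
z = (230 - mu) / standard_error             # about 2.065 before rounding
p_230_or_above = 1 - xbar_dist.cdf(230)

# Sample mean that cuts off the lowest 10% of sample means
lower_10_cutoff = xbar_dist.inv_cdf(0.10)

print(round(z, 3), round(p_230_or_above, 4), round(lower_10_cutoff, 1))
```

The unrounded z gives a tail area near 0.0195; table lookups differ slightly in the third decimal depending on how z is rounded.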
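Distribution of the sample proportion
The sampling distribution of the sample proportion p̂ can be approximated by a normal distribution if:
\[ np \ge 5 \quad \text{and} \quad n(1 - p) \ge 5 \]
where
\[ \mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} \]
(where p = population proportion)
[Figure: sampling distribution P(p̂) against p̂ from 0 to 1]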
z-Value for Proportions
[Figure: sampling distribution of the sample proportion standardized to the standard normal distribution; shaded area .4251]
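A sample proportion is standardized in the same way as a sample mean, using the mean p and standard error √(p(1 − p)/n) of its sampling distribution. The sketch below is illustrative only: the function name `proportion_z` is made up, and the inputs p = 0.40, n = 200, and an observed proportion of 0.45 are assumed values, not given in the text.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_hat, p, n):
    """z value of a sample proportion p_hat for population proportion p and sample size n."""
    # The normal approximation is only used when np >= 5 and n(1 - p) >= 5
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("need np >= 5 and n(1 - p) >= 5 for the normal approximation")
    standard_error = sqrt(p * (1 - p) / n)
    return (p_hat - p) / standard_error

# Assumed illustrative values: p = 0.40, n = 200, observed proportion 0.45
z = proportion_z(0.45, p=0.40, n=200)

# Area under the standard normal curve between 0 and z
area_0_to_z = NormalDist().cdf(z) - 0.5

print(round(z, 2), round(area_0_to_z, 4))
```

With these assumed inputs z rounds to 1.44, and a table lookup at z = 1.44 gives an area of .4251.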