Lecture 4

Uploaded by Moybon Kalif
Copyright © All Rights Reserved

Sampling Methods and Sampling Distributions

Introduction
Researchers often use sample survey methodology to obtain information
about a larger population by selecting and measuring a sample from that
population.
Because the population is often too large to study in full, we rely on information collected from a
sample, mainly to minimize cost.
Inferences about the population are based on the information from the
sample drawn from that population.
Introduction…
A sample is a collection of individuals selected from a larger population.

For example, we may have a single sample composed of 50 individuals, representing a population of 1000 people.
Researchers are not interested in the sample itself, but in what can be
learned from the sample—and how this information can be applied to the
entire population.
Sampling
The process of selecting a portion of the population to represent the entire
population.
A main concern in sampling is to:
• Ensure that the sample represents the population, and
• Ensure that the findings can be generalized.
Advantages of sampling
Feasibility: Sampling may be the only feasible method of collecting information.
Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material.
Greater accuracy: Sampling may improve the accuracy of data collection, because a smaller operation can be staffed and supervised more carefully.
Greater speed: Data can be collected and summarized more quickly.

Disadvantages of sampling:
There is always a sampling error.
Sampling may create a feeling of discrimination within the population.
It can be difficult to select a truly representative sample.
There is a chance of bias.
Errors in sampling
1) Sampling error: the difference between a sample statistic and the corresponding population parameter that arises because only a sample, not the whole population, is observed.
It cannot be avoided or totally eliminated, but it can be reduced, for example by increasing the sample size.

2) Non-sampling errors:
• Observational error
• Respondent error
• Lack of precise definitions
• Errors in editing and tabulation of data


Population
Reference population (or target population): the population of interest to
whom the researchers would like to make generalizations.
Sampling population: the subset of the target population from which a
sample will be drawn.
Study population: the actual group in which the study is conducted
Example: Researchers are interested in the factors associated with ART use among HIV/AIDS patients attending certain hospitals in a given Region.
Target population = all ART patients in the Region
Sampling population = all ART patients in, e.g., 3 hospitals in the Region
Sample (study population) = the ART patients actually selected from these hospitals and studied
Sampling Methods
Two broad divisions:

A. Probability sampling methods

B. Non-probability sampling methods


A. Probability sampling
Involves random selection of a sample.
Every sampling unit has a known and non-zero probability of selection into the sample.
Involves the selection of a sample from a population, based on chance.
Probability sampling is:
• More complex,
• More time-consuming, and
• Usually more costly than non-probability sampling.
However, because study samples are randomly selected and their probability of inclusion can be calculated:
• reliable estimates can be produced, and
• inferences can be made about the population.
The method chosen depends on a number of factors, such as:
• the availability of a sampling frame,
• how spread out the population is, and
• how costly it is to survey members of the population.
Most common probability sampling methods
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multi-stage sampling
1. Simple random sampling

Basic and common sampling technique in quantitative research.
The required number of individuals is selected at random from the sampling frame, a list or database of all individuals in the population.
Each member of the population has an equal chance of being included in the sample.
To use an SRS method:
Make a numbered list of all the units in the population.
Each unit should be numbered from 1 to N (where N is the size of the population).
Select the required number of units at random.
The randomness of the sample is ensured by:
• Use of "lottery" methods (for small samples)
• Table of random numbers
• Computer programs
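As an illustration of the "computer programs" option, here is a minimal Python sketch. It assumes the sampling frame is simply the list of unit numbers 1 to N; the function name `simple_random_sample` and the values N = 1000 and n = 50 (echoing the earlier example) are hypothetical. It relies on the standard library's `random.sample`, which selects without replacement.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n, without replacement, from a sampling frame.

    Every unit in the frame has the same chance (n / N) of being selected.
    """
    if n > len(frame):
        raise ValueError("sample size n cannot exceed the population size N")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)


# Hypothetical frame: units numbered 1..N
N = 1000
frame = list(range(1, N + 1))
sample = simple_random_sample(frame, 50, seed=42)
print(sorted(sample))
```

Fixing the seed is only for reproducibility; in practice any unpredictable seed gives an equally valid SRS.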
SRS has certain limitations:
Requires a sampling frame.
Difficult if the reference population is dispersed.
Minority subgroups of interest may not be selected.
2. Systematic random sampling

Sometimes called interval sampling.
Selection of individuals from the sampling frame is done systematically.
Individuals are taken at regular intervals down the list.
The starting point is chosen at random.
Particularly useful when the reference population is arranged in some order, e.g.:
• Order of registration of patients
• House numbers in numerical order
• Students' registration books
Individuals are taken at fixed intervals (every Kth) based on the sampling fraction.
Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total population size).

2. Determine the sampling interval (K) by dividing the number of units in the
population by the desired sample size.

3. Select a number between one and K at random. This number is called the random
start and would be the first number included in your sample.

4. Select every Kth unit after that first number

Note: Systematic sampling should not be used when a cyclic repetition is inherent in the
sampling frame.
Example
To select a sample of 100 from a population of 400, you would need a sampling
interval of 400 ÷ 100 = 4. Therefore, K = 4.
You will need to select one unit out of every four units to end up with a total of 100
units in your sample.
Select a number between 1 and 4 from a table of random numbers.

If you choose 3, the third unit on your frame would be the first unit included in your
sample
The sample might consist of the following units to make up a sample of 100: 3 (the
random start), 7, 11, 15, 19...395, 399 (up to N, which is 400 in this case).
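The four steps can be sketched in Python as follows. This is a minimal sketch assuming the frame is numbered 1 to N and that N is an exact multiple of n (as in the example, 400 ÷ 100 = 4); the function name `systematic_sample` is hypothetical. Passing `start=3` reproduces the worked example, while omitting it draws the random start.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Systematic random sample of unit numbers from a frame numbered 1..N.

    Assumes N is an exact multiple of n, so the interval K = N / n is a whole number.
    """
    if N % n:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                                        # step 2: sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)     # step 3: random start in 1..K
    return list(range(start, N + 1, k))               # step 4: every Kth unit after it


sample = systematic_sample(400, 100, start=3)   # the worked example above
print(sample[:5], sample[-2:], len(sample))
# [3, 7, 11, 15, 19] [395, 399] 100
```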
3. Stratified random sampling

It is used when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification.
Using stratified sampling, the population is divided into homogeneous, mutually exclusive groups called strata.
A population can be stratified by any variable that is available for all units prior to
sampling (e.g., age, sex, province of residence, income, etc.)
A separate sample is taken independently from each stratum.

Any of the sampling methods mentioned in this section can be used to sample within
each stratum.
If you create strata within which units share similar characteristics (e.g.,
income) and are considerably different from units in other strata
(e.g., occupation, type of dwelling), then you would need only a small sample
from each stratum to get a precise estimate of total income for that stratum.
Then you could combine these estimates to get a precise estimate of total
income for the whole population.
If you use a SRS approach in the whole population without stratification, the
sample would need to be larger than the total of all stratum samples to get an
estimate of total income with the same level of precision.
Stratified sampling ensures an adequate sample size for sub-groups in the population of interest.
When a population is stratified, each stratum becomes an independent population, and you will need to decide the sample size for each stratum.
Allocation of sample size to strata

Equal allocation: allocate an equal sample size to each stratum.

Proportionate allocation:
$$n_j = \frac{n}{N}\, N_j$$
where
$n_j$ is the sample size of the jth stratum,
$N_j$ is the population size of the jth stratum,
$N = N_1 + N_2 + \dots + N_k$ is the total population size, and
$n = n_1 + n_2 + \dots + n_k$ is the total sample size.

Example: Proportionate Allocation

Village             A     B     C     D     Total
Households (HHs)    100   150   120   130   500
Sample size         ?     ?     ?     ?     60
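The proportionate allocation formula can be checked on this example with a short Python sketch (the function name `proportionate_allocation` is hypothetical). Here n/N = 60/500 = 0.12, so each village receives 12% of its households.

```python
def proportionate_allocation(stratum_sizes, n):
    """Proportionate allocation: n_j = (n / N) * N_j for every stratum j."""
    N = sum(stratum_sizes.values())          # total population size
    return {name: n * N_j / N for name, N_j in stratum_sizes.items()}


households = {"A": 100, "B": 150, "C": 120, "D": 130}   # HHs per village, N = 500
exact = proportionate_allocation(households, 60)
rounded = {name: round(size) for name, size in exact.items()}  # whole households
print(exact)    # {'A': 12.0, 'B': 18.0, 'C': 14.4, 'D': 15.6}
print(rounded)  # {'A': 12, 'B': 18, 'C': 14, 'D': 16}
```

Rounding to whole households gives 12, 18, 14 and 16, which happens to sum to 60 here; in general, simple rounding may miss n by one or two, and one stratum's size then has to be adjusted by hand.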
4. Cluster sampling
Sometimes it is too expensive to carry out simple random sampling:
The population may be large and scattered.
A complete list of the study population may be unavailable.
Travel costs can become expensive if interviewers have to survey people from one end of the country to the other.
Cluster sampling is the most widely used method for reducing this cost.
Ideally, the clusters should be similar to one another, each resembling the population as a whole, unlike stratified sampling, where the strata differ from one another.
Steps in cluster sampling
Cluster sampling divides the population into groups or clusters.

A number of clusters are selected randomly to represent the total population, and then all units within selected clusters are included in the sample.
No units from non-selected clusters are included in the sample—they are
represented by those from selected clusters.
This differs from stratified sampling, where some units are selected from
each group.
Example

In a school-based study, we assume students of the same school are homogeneous.
We can randomly select sections and include all students of the selected sections only.
The main advantage is cost reduction.
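The school example can be sketched as one-stage cluster sampling in Python. The school layout (8 sections of 40 students) and the function name `cluster_sample` are hypothetical; clusters are chosen by simple random sampling, and every student in a chosen section is kept.

```python
import random


def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling: randomly select m clusters and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)       # SRS of cluster labels
    return {c: list(clusters[c]) for c in chosen}  # all units of each chosen cluster


# Hypothetical school: 8 sections of 40 students each
sections = {f"Sec{s}": [f"Sec{s}-Stu{i}" for i in range(1, 41)] for s in range(1, 9)}
sample = cluster_sample(sections, 3, seed=11)
print(sorted(sample), sum(len(units) for units in sample.values()))
```

Only the selected sections need to be listed and visited, which is where the cost saving comes from.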
5. Multi-stage sampling
Similar to the cluster sampling, except that it involves picking a sample from
within each chosen cluster, rather than including all units in the cluster.
This type of sampling requires at least two stages.
The primary sampling unit (PSU) is the sampling unit in the first sampling stage.
The secondary sampling unit (SSU) is the sampling unit in the second sampling stage, etc.
Example:
Woreda: PSU
Kebele: SSU
Sub-Kebele: tertiary sampling unit (TSU)
Household (HH)
In the first stage, large groups or clusters are identified and selected.

These clusters contain more population units than are needed for the final
sample.
In the second stage, population units are picked from within the selected
clusters (using any of the possible probability sampling methods) for a final
sample.
If more than two stages are used, the process of choosing population units
within clusters continues until there is a final sample.
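A minimal two-stage sketch, assuming simple random sampling at both stages. The frame (10 clusters, e.g. kebeles, each listing 200 households), the function name `two_stage_sample` and the sizes are hypothetical.

```python
import random


def two_stage_sample(clusters, m, n_per_cluster, seed=None):
    """Stage 1: SRS of m primary sampling units (clusters).
    Stage 2: SRS of n_per_cluster secondary units within each selected cluster."""
    rng = random.Random(seed)
    psus = rng.sample(sorted(clusters), m)
    return {psu: rng.sample(clusters[psu], n_per_cluster) for psu in psus}


# Hypothetical frame: 10 clusters (e.g. kebeles), each listing 200 households
kebeles = {f"K{k}": [f"K{k}-HH{h}" for h in range(1, 201)] for k in range(1, 11)}
sample = two_stage_sample(kebeles, m=4, n_per_cluster=25, seed=3)
print({k: len(hh) for k, hh in sample.items()})
```

Unlike one-stage cluster sampling, only a subsample of households is taken from each selected cluster. With equal cluster sizes, as assumed here, every household has the same inclusion probability (4/10) × (25/200) = 0.05.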
B. Non-probability sampling
In non-probability sampling, every item has an unknown chance of being
selected.
In non-probability sampling, there is an assumption that the characteristic of interest is evenly distributed within the population.
This is what makes the researcher believe that any sample would be representative and, because of that, that the results will be accurate.
For probability sampling, randomness is a feature of the selection process, rather than an assumption about the structure of the population.
In non-probability sampling, since elements are chosen arbitrarily, there is no way to estimate the probability of any one element being included in the sample.
Also, there is no assurance that each item has a chance of being included.
Reliability cannot be measured, and there is no way to measure the precision of the resulting sample.
Despite these drawbacks, non-probability sampling methods can be useful when descriptive comments about the sample itself are desired.
Secondly, they are quick, inexpensive and convenient.
There are also other circumstances in research when it is infeasible or impractical to conduct probability sampling.
The most common types of non-probability sampling

1. Convenience or haphazard sampling

2. Volunteer sampling

3. Judgment sampling

4. Quota sampling

5. Snowball sampling technique


Sampling Distributions

Introduction
Parameter: a population characteristic, or descriptive measure taken from the population, e.g. μ, σ, P etc.
Sample statistic: any quantity computed from values in a sample, e.g. the sample mean x̄, the sample proportion p̂, etc.
The values of population parameters are fixed.
The value of a statistic varies from one sample to another.
Introduction…
A sampling distribution is the distribution of all possible values of a statistic computed from samples of the same size randomly selected from the same population.
It serves to answer probability questions about sample statistics.
When sampling a discrete, finite population, a sampling distribution can be constructed.
However, this construction is difficult with a large population and impossible with an infinite population.
Suppose we take a sample and calculate a statistic, e.g., the mean.
We take another sample (of the same size) and calculate its mean.
We repeat this again and again.
We do not expect all the sample means to be the same; they will vary.
Putting all these sample statistics together gives a distribution of sample statistics.
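The repeat-and-collect idea can be simulated with a short Python sketch. The population (10,000 hypothetical ages between 18 and 80), the sample size n = 25 and the 1,000 repetitions are arbitrary choices for illustration only.

```python
import random
import statistics

rng = random.Random(2024)
# Hypothetical population of 10,000 values (e.g. ages), purely for illustration
population = [rng.randint(18, 80) for _ in range(10_000)]

n = 25            # fixed sample size
repeats = 1_000   # number of repeated samples
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

print(min(sample_means), max(sample_means))   # the means vary from sample to sample
print(statistics.mean(sample_means), statistics.mean(population))
```

The collected `sample_means` form an (approximate) sampling distribution of the mean: the individual values differ, but they cluster around the population mean.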

A. Sampling distribution of sample mean
Properties of sampling distribution of mean

a. If a population is normal with mean μ and standard deviation σ, the sampling distribution of the mean is also normally distributed, with
$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
b. The mean, $\mu_{\bar{x}}$, of the distribution of the sample mean is equal to the mean μ of the population from which the samples were drawn.
c. The variance of the distribution of the sample mean is equal to the variance of the population divided by the sample size: $\sigma^2_{\bar{x}} = \sigma^2 / n$.
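For a tiny finite population, the sampling distribution can be constructed exactly and properties b and c checked directly. The sketch below assumes a hypothetical population {2, 4, 6, 8} and samples of size 2 drawn with replacement; property c holds exactly for sampling with replacement (without replacement from a finite population, a finite population correction would be needed).

```python
from itertools import product
from statistics import mean, pvariance

population = [2, 4, 6, 8]        # hypothetical small population, N = 4
n = 2                            # sample size

# All N**n ordered samples of size n drawn WITH replacement
all_means = [mean(s) for s in product(population, repeat=n)]

mu, sigma2 = mean(population), pvariance(population)
print(len(all_means))                      # 16 possible samples
print(mean(all_means), mu)                 # property b: both equal 5
print(pvariance(all_means), sigma2 / n)    # property c: both equal 2.5
```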
The square root of the variance of the sampling distribution of x̄ is called the standard error of the mean or, simply, the standard error:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
SE is determined by both the sample size and the degree of
variability among the individual observations
SD quantifies the amount of variability among individuals in a
population, while
SE quantifies the variability among means of repeated samples
drawn from that population
The SE is always smaller than the SD (except when n = 1)

Central Limit Theorem
• The central limit theorem states that if you have a population with mean μ and standard deviation σ, then the distribution of the sample means will be approximately normal, provided the sample size is sufficiently large (usually n > 30).
• If the population is normal, the sampling distribution of the mean is normal even for samples smaller than 30.
• For a population proportion, the sampling distribution of the sample proportion is approximately normal provided that np > 5 and n(1 − p) > 5, where n is the sample size and p is the probability of success in the population.
So we can use the normal probability model to quantify uncertainty
when making inferences about a population mean based on the
sample mean.
When the sampling is done from a non-normally distributed
population, the central limit theorem is used.
The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.
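To see the theorem at work, a small simulation can draw repeated samples of size n = 36 from a strongly right-skewed population. The sketch assumes an exponential population with mean 8 (so σ = 8 as well), an arbitrary choice for illustration. If the normal approximation holds, the sample means should center on μ, have a standard deviation close to σ/√n, and put roughly 95% of values within μ ± 1.96·SE.

```python
import math
import random
import statistics

rng = random.Random(1)
mu = 8.0                      # exponential population: mean = SD = 8 (hypothetical choice)
n, repeats = 36, 5_000

# Sample means of n draws from a strongly right-skewed (exponential) population
means = [statistics.fmean(rng.expovariate(1 / mu) for _ in range(n)) for _ in range(repeats)]

se = mu / math.sqrt(n)        # sigma / sqrt(n) = 8 / 6
print(statistics.fmean(means))    # close to mu = 8
print(statistics.stdev(means))    # close to se, about 1.33
share = sum(abs(m - mu) <= 1.96 * se for m in means) / repeats
print(share)                      # close to 0.95 if the normal approximation is good
```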

Applications of the sampling distributions of sample mean
Helps in computing the probability of obtaining a sample with a mean of some specified magnitude.
z-value for the sampling distribution of x̄:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where: x̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
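The z formula translates directly into a small helper. The function name `z_for_sample_mean` and the input values (x̄ = 52, μ = 50, σ = 10, n = 25) are hypothetical, chosen so the arithmetic is easy to follow: the standard error is 10/5 = 2, so z = (52 − 50)/2 = 1.

```python
from math import sqrt


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Standardize a sample mean: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


print(z_for_sample_mean(52, 50, 10, 25))   # 1.0
```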
Example
Suppose a population has mean μ = 8 and standard deviation σ = 3.
Suppose a random sample of size n = 36 is selected.
What is the probability that the sample mean is between 7.8 and 8.2?

Solution:
Even if the population is not normally distributed, the central limit theorem can be used (n > 30), so the sampling distribution of x̄ is approximately normal, with mean $\mu_{\bar{x}} = 8$ and
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
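$$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < z < 0.4) = 0.3108$$

[Figure: the sampling distribution of x̄ (centered at μ_x̄ = 8, with 7.8 and 8.2 marked) is standardized to the standard normal distribution (centered at 0, with −0.4 and 0.4 marked); the shaded area on each side of the center is 0.1554, for a total of 0.3108.]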
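The example can be reproduced with the standard library's `statistics.NormalDist`, assuming (as the solution does) that the CLT makes the sampling distribution of x̄ approximately normal with mean 8 and standard error 0.5.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / sqrt(n)                      # standard error = 0.5
sampling_dist = NormalDist(mu, se)        # approximate distribution of the sample mean (CLT)

prob = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)
print(round(se, 2), round(prob, 4))       # 0.5 0.3108
```

Working with `NormalDist(mu, se)` directly is equivalent to standardizing to z = ±0.4 and using the standard normal table.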
B. Distribution of the sample
proportion
The sample proportion is derived from counts or frequency data.
It is easier to work with, as it does not depend on a separate variance parameter.
Sample proportion = p̂
Population proportion = p or π
Population proportion (p) = the proportion of the population having some characteristic.
The sample proportion (p̂) provides an estimate of p:
$$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in the sample}}{\text{sample size}}$$
Properties of the sampling distribution of
sample proportion
Construction of the sampling distribution of the sample proportion is done
in a manner similar to that of the mean.
Applying the central limit theorem, the shape of the sampling distribution is
approximately normal provided that n is large enough.
The mean of the distribution, $\mu_{\hat{p}}$, is equal to the true population proportion p, and the variance of the distribution is $\sigma^2_{\hat{p}} = pq/n$, where q = 1 − p.
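A simulation sketch of these properties: draw many samples of n yes/no outcomes, compute p̂ for each, and compare the mean and variance of the p̂ values with p and pq/n. The values p = 0.3, n = 100 and 5,000 repetitions are hypothetical choices for illustration.

```python
import random
import statistics

rng = random.Random(5)
p, n, repeats = 0.3, 100, 5_000   # hypothetical population proportion and sample size

# Each sample: n yes/no outcomes; the sample proportion is successes / n
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(repeats)]

print(statistics.fmean(p_hats))          # close to p = 0.3
print(statistics.pvariance(p_hats))      # close to p * (1 - p) / n = 0.0021
```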
How large does n need to be?
Central limit theorem for proportions:

np 5
n(1 p) 5

51
z-Value for Proportions
Standardize p̂ to a z value with the formula:
$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$$
Example
According to a recent estimate, 19.4% of the adult male population was obese. What is the probability that, in a random sample of size 150 from this population, fewer than 15% will be obese?

Note: np = 150 × 0.194 = 29.1 > 5
nq = 150 × 0.806 = 120.9 > 5

n = 150, p = 0.194. Find P(p̂ < 0.15).
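Find the z score:
$$z = \frac{0.15 - 0.194}{\sqrt{0.194 \times 0.806 / 150}} = \frac{-0.044}{0.0323} \approx -1.36$$
A value of z = −1.36 gives an area of 0.0869, which is the probability P(z < −1.36) = 0.0869.
The probability that p̂ < 0.15 is 0.0869.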
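The same calculation in Python, using the standard normal CDF from `statistics.NormalDist` instead of a printed table. The sketch first checks the np > 5 and n(1 − p) > 5 conditions, then computes the standard error, z and the lower-tail probability.

```python
from math import sqrt
from statistics import NormalDist

p, n, cutoff = 0.194, 150, 0.15

# Normal approximation is reasonable only if np and n(1 - p) are both greater than 5
if not (n * p > 5 and n * (1 - p) > 5):
    raise ValueError("sample too small for the normal approximation")

se = sqrt(p * (1 - p) / n)       # standard error of the sample proportion, about 0.0323
z = (cutoff - p) / se            # about -1.36
prob = NormalDist().cdf(z)       # P(p-hat < 0.15)
print(round(z, 2), round(prob, 4))
```

Because the code keeps z unrounded (about −1.3628), its probability comes out slightly below the table value of 0.0869, at roughly 0.0865; both give the same practical conclusion.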

THANKS FOR YOUR ATTENTION!!!!
