Sampling and Sampling Distribution

Interesting Docu

Uploaded by

Tadesse Fenta
Copyright
© © All Rights Reserved

Sampling and Sampling Distribution
What is sampling?

 Sampling is the process of choosing a section of the population for observation and study.
 A census is a sample consisting of the entire population.
 A census has the following disadvantages:
 Expensive
 Takes a long time
 Cumbersome, and therefore prone to being done inaccurately (a carefully drawn sample can produce more accurate data than a census)
Advantages of sampling:
 Feasibility: Sampling may be the only feasible method of
collecting information.
 Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material.
 Greater accuracy: Sampling may lead to more accurate data collection.
 Greater speed: Data can be collected and summarized
more quickly
Sampling enables us to estimate the characteristic of a
population by directly observing a portion of the
population.

Disadvantages of sampling:

 There is always a sampling error.


 Sampling may create a feeling of discrimination within the
population.
 Sampling may be inadvisable where every unit in the
population is legally required to have a record.

Terms in sampling
 Sample: is a small group of individuals that is selected
for study from a larger population.
 Sampling: the process of selecting a portion of the
population to represent the entire population.
 Population is the group of people/objects in whom
we are interested.
 Statistics: summary numerical description of variables
about the sample.
 Parameters: summary numerical description of variables
about the population of interest.
 Sample size: the number of individuals or observations in the sample.
Terms in sampling…
 Target population: Is the population to whom the result
would be applied
 Source population: Is the population from whom the study
subject would be obtained
 Study population or Sample population: Is the
population included in the sample
 Sampling unit: is the actual unit that is considered for
selection.
 Study unit: is the individual member of a population whose
characteristics are to be measured.
Terms in sampling …
 Sampling frame: Is the list of potential subjects from which the
sample is drawn
 Sampling interval: the ratio of the number of units in the reference population to the number of units in the sample (N/n). Its reciprocal, n/N, is the sampling fraction.
 Table of random numbers: a table of the digits 0-9 arranged so that each digit has an equal chance of appearing in any position and the appearance of one digit is independent of the others.
Selection method of sampling unit

1. Sampling without replacement: a unit selected from the population is not returned to the population before the next draw.
 For a population of size N we can form $\binom{N}{n}$ (that is, NCn) different samples of size n.
Selection method of sampling unit…

2. Sampling with replacement: each selected unit is returned to the population before the next draw.
 For a population of size N we can construct $N^n$ different samples of size n.
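The two counts can be checked by listing the samples for a small case. The sketch below assumes a hypothetical population of four labelled units and n = 2; the labels are illustrative only. It treats samples without replacement as unordered and samples with replacement as ordered, which is the convention behind the two formulas.

```python
from itertools import combinations, product
from math import comb

population = ["A", "B", "C", "D"]  # hypothetical population, N = 4
n = 2                               # sample size

# Without replacement, order ignored: C(N, n) possible samples.
without_replacement = list(combinations(population, n))

# With replacement, order kept: N**n possible samples.
with_replacement = list(product(population, repeat=n))

print(len(without_replacement), comb(len(population), n))  # 6 6
print(len(with_replacement), len(population) ** n)          # 16 16
```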
Sampling methods
Classified as:
1. Probability Sampling
A. Simple random sampling
B. Systematic Sampling
C. Stratified Sampling
D. Cluster Sampling
E. Multistage cluster sampling
2. Non-probability sampling
A. Purposive (Judgmental)
B. Convenience (haphazard)
C. Voluntary sampling
D. Quota
E. Snow ball
1. Probability Sampling

 In probability sampling, every individual (element) in the defined source population has a known, non-zero probability of being selected into the sample.
 Let the sampling frame contain N units and the sample size be n. With equal-probability selection, every individual has a known chance of n/N of being selected.

 Generalization is possible
A. Simple Random Sampling (SRS)

 This is the most basic scheme of random sampling


Principle
– Equal chance/probability of drawing each unit
Procedure
– Take sampling population
– Need listing of all sampling units (“sampling frame”)
– Number all units
– Randomly draw units
 Can be done by: lottery method, random numbers or
computer programs to generate random numbers
Random number table

 It is a table of random numbers constructed by a process such that:
1. In any position in the table, each of the numbers 0 through 9 has a probability 1/10 of occurring.
2. The occurrence of any number in one part of the table is independent of the occurrence of any number in any other part of the table.
Random number table
57172 42088 70098 11333 26902 29959 43909 49607
33883 87680 28923 15659 09839 45817 89405 70743
77950 67344 10609 87119 15859 74577 42791 75889
11607 11596 01796 24498 17009 67119 00614 49529
56149 55678 38169 47228 49931 94303 67448 31286
80719 65101 77729 83949 83358 75230 56624 27549
93809 19505 82000 79068 45552 86776 48980 56684
40950 86216 48161 17646 24164 35513 94057 51834
12182 59744 65695 83710 41125 14291 74773 66391
13382 48076 73151 48724 35670 38453 63154 58116
38629 94576 48859 75654 17152 66516 78796 73099
60728 32063 12431 23898 23683 10853 04038 75246
01881 99056 46747 08846 01331 88163 74462 14551
23094 29831 95387 23917 07421 97869 88092 72201
15243 21100 48125 05243 16181 39641 36970 99522
53501 58431 68149 25405 23463 49168 02048 31522
07698 24181 01161 01527 17046 31460 91507 16050
22921 25930 79579 43488 13211 71120 91715 49881
68127 00501 37484 99278 28751 80855 02035 10910
55309 10713 36439 65660 72554 77021 46279 22705
92034 90892 69853 06175 61221 76825 18239 47687
50612 84077 41387 54107 09190 74305 68196 75634
81415 98504 32168 17822 49946 37545 47201 85224
38461 44528 30953 08633 08049 68698 08759 45611
07556 24587 88753 71626 64864 54986 38964 83534
60557 50031 75829 05622 30237 77795 41870 26300

Example on SRS
 Suppose your school has 500 students and you need to conduct a
short survey on the quality of the food served in the cafeteria.
 You decide that n=10 students should be sufficient for your
purposes.
 Then you assign a number from 1 to 500 to each student in your
school.
 To select the sample, you use a table of randomly generated
numbers.
 Pick a starting point in the table (a row and column number) and
look at the random numbers that appear there.

Example on SRS ..

 In this case, since the student numbers run to three digits, the random numbers need to contain three digits as well.
 Ignore 000 and all random numbers greater than 500 because they do not correspond to any of the students in the school.
 Remember that the sample is drawn without replacement, so if a number recurs, skip over it and use the next random number.
 The first 10 different numbers between 001 and 500 make up your sample.
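A minimal sketch of this example in Python follows. The function names are hypothetical. The first function lets the computer draw the sample directly; the second mimics the table procedure under one assumed reading convention (read the digits left to right in consecutive groups of three, ignoring the spaces between the printed five-digit blocks). Other reading conventions are equally valid.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Draw n distinct student numbers from 1..population_size."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), n)

def sample_from_digits(digits, population_size, n, width=3):
    """Read width-digit groups left to right; skip 000, numbers above
    population_size, and repeats, until n numbers are collected."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == n:
            break
    return chosen

# Computer-generated sample of 10 students out of 500.
sample = simple_random_sample(500, 10, seed=1)

# Table-based selection using the first row of the random number table.
first_row = "57172 42088 70098 11333 26902 29959 43909 49607".replace(" ", "")
table_sample = sample_from_digits(first_row, 500, 5)
print(sample)
print(table_sample)
```

With this reading convention the first row of the table yields students 208, 98, 113, 332 and 229; further rows would be read in the same way until 10 students are selected.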

SRS…
SRS has certain limitations:
 Requires a sampling frame, which is not always available.
 Minority subgroups of interest may not be selected.
 Difficult if the reference population is dispersed.
B. Systematic Random Sampling
 It is sometimes called interval sampling
 The items or individuals of the population are arranged in
some order
 It involves selection of individuals from the
sampling frame systematically rather than randomly
 Individuals are taken at regular intervals down the
list (Every Kth individual is chosen from the
sampling frame)
 The starting point is chosen at random
Systematic random sampling…
Steps in systematic random sampling:

1. Number the units on your frame from 1 to N (where N is the total population size).
2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size: K = N/n.
3. Select a number between 1 and K at random. This number is called the random start and is the first number included in your sample.
4. Select every Kth unit after that first number.
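The four steps can be sketched as follows. This is an illustration under stated assumptions: the frame of 100 numbered units is hypothetical, and when N/n is not a whole number the interval is rounded down, which is one common choice rather than the only one.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select every K-th unit after a random start between 1 and K."""
    N = len(units)
    k = N // n                      # sampling interval K = N/n, rounded down
    rng = random.Random(seed)
    start = rng.randint(1, k)       # random start (1-based position)
    positions = range(start, start + k * n, k)  # start, start+K, start+2K, ...
    return [units[p - 1] for p in positions]

frame = list(range(1, 101))         # hypothetical frame numbered 1..100
sample = systematic_sample(frame, 10, seed=0)
print(sample)
```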
Systematic Random Sampling…
 Note: Systematic sampling should not be used when a cyclic
repetition is inherent in the sampling frame.
Advantages:
 Easier to perform
 Requires less time than SRS
 Works well when the population from which the sample is to be drawn is homogeneously distributed
 Unlike SRS, systematic sampling can be conducted without a complete sampling frame (useful in situations where a sampling frame is not readily available).
Systematic Random Sampling…
Disadvantage:
 Patterns or periodicity in the list can bias the sample.
 If a list of married couples is arranged with men's names alternating with women's names, selecting every 2nd, 4th, etc. name will result in a sample of all men or all women.
 If we select days at a fixed interval on which to count clinic attendance, the selected day may always fall on the same day of the week, which might, for example, be a market day.
C. Stratified Sampling
 It is done when the population is known to have
heterogeneity with regard to some factors and
those factors are used for stratification
 Using stratified sampling, the population is divided into homogeneous, mutually exclusive groups called strata.
 A population can be stratified by any variable that is available for all units prior to sampling (e.g., age, sex, income, geographic area such as urban vs. rural, developed vs. developing country, etc.)
Stratified Sampling...
 A separate sample is drawn independently from
each stratum.
 Any of the sampling methods mentioned in this
section (and others that exist) can be used to
sample within each stratum.
 Stratified sampling ensures an adequate sample size
for sub‐groups in the population of interest.
 When a population is stratified, each stratum becomes
an independent population and you will need to
decide the sample size for each stratum.
Stratified Sampling...

 Proportional allocation: the same sampling fraction is used for each stratum.
 Non-proportional allocation: a different sampling fraction is used for each stratum, or the strata are unequal in size and a fixed number of units is selected from each stratum.
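Proportional allocation can be illustrated with a short sketch. The strata names and sizes (600 urban and 400 rural units) are hypothetical, and the function names are not from any library. Simple rounding is used for the stratum sample sizes, so for other inputs the rounded sizes may not add up exactly to n.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Sample size per stratum using the same sampling fraction n/N."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Draw an independent simple random sample within each stratum."""
    rng = random.Random(seed)
    sizes = proportional_allocation({k: len(v) for k, v in strata.items()}, n)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population: 600 urban and 400 rural units, total sample n = 50.
strata = {"urban": list(range(600)), "rural": list(range(600, 1000))}
allocation = proportional_allocation({"urban": 600, "rural": 400}, 50)
sample = stratified_sample(strata, 50, seed=3)
print(allocation)  # {'urban': 30, 'rural': 20}
```

Both strata are sampled with the same fraction, 30/600 = 20/400 = 0.05.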
Stratified Sampling...

Advantage
 The representativeness of the sample is improved. That is,
adequate representation of minority subgroups of interest
can be ensured by stratification and by varying the
sampling fraction between strata as required.

Disadvantage
 A sampling frame covering the entire population has to be prepared, separately for each stratum.
D. Cluster sampling
 Sometimes it is too expensive to carry out SRS:
The population may be large and scattered.
A complete list of the study population may be unavailable.
The population consists of many natural groups (clusters).
Travel costs can become expensive if interviewers have to survey people from one end of the country to the other.
 The clusters should be similar to one another, each containing the variety of units found in the population, unlike stratified sampling, where each stratum is internally homogeneous and the strata differ from one another.
Cluster sampling...
 In this sampling scheme, selection of the required
sample is done on groups of study units (clusters)
instead of each study unit individually. The sampling
unit is a cluster, and the sampling frame is a list of
these clusters.
Procedure
The reference population (homogeneous) is
divided into clusters. These clusters are often
geographic units (eg. districts, villages, etc.)
A sample of such clusters is selected
All the units in the selected clusters are studied
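The procedure can be sketched in a few lines. The villages and household labels below are made up for illustration, and `cluster_sample` is a hypothetical helper name. The key point is that the random draw is made on the list of clusters, and every unit in a chosen cluster is then included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # frame = list of clusters
    units = [u for name in chosen for u in clusters[name]]
    return chosen, units

# Hypothetical frame: 8 villages with 5 households each.
villages = {
    f"village_{i}": [f"v{i}_hh{j}" for j in range(1, 6)] for i in range(1, 9)
}
chosen, households = cluster_sample(villages, 3, seed=42)
print(chosen)
print(len(households))  # 15
```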
Cluster sampling...
 It is preferable to select a large number of small clusters
rather than a small number of large clusters
Advantage
 Cost and time reduction
 It creates 'pockets' of sampled units instead of
spreading the sample over the whole territory
(administrative convenience)
 Sometimes a list of all units in the population is not
available, while a list of all clusters is either available
or easy to create.
Cluster sampling...
Disadvantage
 Less efficient than SRS.
 Neighboring units tend to be more alike, so a cluster sample may not represent the whole spectrum of opinions or situations present in the overall population (design effect). For this reason it is usually better to survey a large number of small clusters instead of a small number of large clusters.
 It is based on the assumption that the characteristic to be studied is uniformly distributed throughout the reference population, which may not always be the case. Hence, sampling error is usually higher than for an SRS of the same size.
E. Multi-stage sampling
 Appropriate when the reference population is large and
widely scattered
 Selection is done in stages (at least 2) until the final sampling units (e.g., households or persons) are reached
 The primary sampling unit (PSU) is the sampling unit (usually
large size) in the first sampling stage.
 The secondary sampling unit (SSU) is the sampling unit in the
second sampling stage etc.
 Similar to the cluster sampling, except that it involves picking
a sample from within each chosen cluster, rather than
including all units in the cluster.
Example - The PSUs could be kebeles and the SSUs could be
households.
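A two-stage version of the kebele and household example might look like the sketch below. The kebele names, household labels and stage sample sizes are hypothetical. Simple random sampling is assumed at both stages, although other methods could be used at either stage.

```python
import random

def two_stage_sample(psus, n_psu, n_ssu, seed=None):
    """Stage 1: sample PSUs. Stage 2: sample SSUs within each chosen PSU."""
    rng = random.Random(seed)
    chosen_psus = rng.sample(sorted(psus), n_psu)
    return {
        psu: rng.sample(psus[psu], min(n_ssu, len(psus[psu])))
        for psu in chosen_psus
    }

# Hypothetical frame: 10 kebeles (PSUs) with 20 households (SSUs) each.
kebeles = {
    f"kebele_{i}": [f"k{i}_hh{j}" for j in range(1, 21)] for i in range(1, 11)
}
sample = two_stage_sample(kebeles, n_psu=3, n_ssu=5, seed=7)
for kebele, households in sample.items():
    print(kebele, households)
```

Only the households in the three chosen kebeles need to be listed, which is why no complete household list for the whole population is required.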
Multi-stage sampling...
 Commonly used with cluster sampling
Multi‐Stage Cluster Sampling
Advantage:
You do not need to have a list of all units in the population.
Saves a great amount of time and effort by not having to create a list of all the units in a population.
Disadvantage:
 Sampling error is increased compared with a simple random sample
 Requires accounting for the design effect (DF)

2. Non‐probability sampling
 In non‐probability sampling, every item has an unknown
chance of being selected.
 Unlike probability sampling, non‐probability sampling rests on an assumption:
 Characteristics are evenly distributed within the population, which is what leads the researcher to believe that any sample would be representative.
 In probability sampling, by contrast, randomness is a feature of the selection process itself.
Non probability sampling…

 In non‐probability sampling, since elements are chosen arbitrarily, there is no way to estimate the probability of any one element being included in the sample.
 It may lead to unrepresentative samples and/or unconvincing results
 Reliability can't be ensured
 Inappropriate if the aim is to measure variables and generalize findings obtained from a sample to the population.

Non probability sampling…
 Despite these drawbacks, non‐probability sampling methods
can be useful when descriptive comments about the
sample itself are desired.
 Secondly, they are quick, inexpensive and convenient.
 There are also circumstances in research when it is unfeasible or impractical to conduct probability sampling.

The most common types of non‐probability sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment / purposive sampling
4. Quota sampling
5. Snowball sampling technique

A. Convenience or haphazard sampling

 Convenience sampling is sometimes referred to as haphazard or accidental sampling.
 It is a method in which, for convenience's sake, the study units that happen to be available at the time of data collection are selected.
 It is not normally representative of the target population
because sample units are only selected if they can be
accessed easily and conveniently.
Convenience or haphazard sampling…
 It can be used when time and resources are limited, but that advantage is greatly offset by the presence of bias.
 Although useful applications of the technique are limited, it can deliver accurate results when the population is homogeneous.
 For example, a scientist could use this method to determine whether a lake is polluted or not.
 Assuming that the lake water is well‐mixed, any sample would yield similar information.
 A scientist could safely draw water anywhere on the lake without worrying about whether or not the sample is representative.
B. Volunteer sampling
 As the term implies, this type of sampling occurs when
people volunteer their services for the study.
 In psychological experiments or pharmaceutical trials
(drug testing), for example, it would be difficult and
unethical to enlist random participants from the general
public.
 The method is subject to volunteer bias.
 Sometimes, the researcher offers payment to attract
respondents.
 In exchange, the volunteers accept the possibility of a
lengthy, demanding or sometimes unpleasant process.
C. Judgment sampling
 This approach is used when a sample is taken based on certain
judgments about the overall population.
 The underlying assumption is that the investigator will select
units that are characteristic of the population.
 The critical issue here is objectivity: how much can judgment
be relied upon to arrive at a typical sample?
 Judgment sampling is subject to the researcher's biases and is
perhaps even more biased than haphazard sampling
 Researchers often use this method in exploratory studies
like pre‐testing of questionnaires and focus groups.
 They also prefer to use this method in laboratory settings
where the choice of experimental subjects (i.e., animal,
human) reflects the investigator's pre‐existing beliefs about the
population.
D. Quota sampling
 The most common sampling method in market research on views about products
 A proper design may have been used to determine what numbers are needed in each of the quotas
 Example: a sample of 50 men and 50 women
 Sampling is done until a specific number of units (quotas) for
various sub‐populations have been selected.
 Since there are no rules as to how these quotas are to be filled,
quota sampling is really a means for satisfying sample size
objectives for certain sub‐populations
Quota sampling…
 Quota sampling is an effective sampling method when
information is urgently required and can be carried out
independent of existing sampling frames.
 In many cases where the population has no suitable frame,
quota sampling may be the only appropriate sampling method.
 The main argument against quota sampling is that it does not
meet the basic requirement of randomness.
 Some units may have no chance of selection or the chance of
selection may be unknown. Therefore, the sample may be
biased.
E. Snowball sampling
 A technique for selecting a sample where existing study
subjects recruit future subjects from among their
acquaintances.
 Thus the sample group appears to grow like a rolling
snowball.
 This sampling technique is often used to study hidden populations which are difficult for researchers to access; example populations would be drug users, commercial sex workers (CSWs), homeless or street children, etc.
 Because sample members are not selected from a sampling
frame, snowball samples are subject to numerous biases.
For example, people who have many friends are more likely
to be recruited into the sample
Error in Sampling

 When we take a sample, our results will not exactly equal the results for the whole population. That is, our results will be subject to error.
 Two types of error:
Sampling error (random error)
Non-sampling error (bias)
Sampling Error
 Since a sample is a subset of a larger group, the value of the characteristic measured in a sample differs from that of the total population.
 This type of error, arising from the sampling process, is called sampling error.
 It cannot be avoided or totally eliminated as long as only part of the population is observed.
 It is minimized by increasing the size of the sample.
 When n = N, sampling error = 0
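A small simulation can make these points concrete. Everything in the sketch is an assumption made for illustration: the population of 200 random ages, the number of repetitions, and the use of the sample mean as the statistic. It draws repeated samples without replacement and averages the absolute difference between the sample mean and the population mean.

```python
import random

def mean_abs_sampling_error(population, n, reps=2000, seed=0):
    """Average |sample mean - population mean| over repeated samples of size n."""
    rng = random.Random(seed)
    mu = sum(population) / len(population)
    total = 0.0
    for _ in range(reps):
        s = rng.sample(population, n)      # sampling without replacement
        total += abs(sum(s) / n - mu)
    return total / reps

# Hypothetical population: ages of N = 200 people.
rng = random.Random(1)
population = [rng.randint(18, 80) for _ in range(200)]

errors = {n: mean_abs_sampling_error(population, n) for n in (5, 50, 200)}
for n, err in errors.items():
    print(n, round(err, 3))
```

The average error shrinks as n grows and is exactly 0 when n = N = 200, because the "sample" is then the whole population.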

Non Sampling Error
 Errors other than sampling errors, such as those arising through non-response, incompleteness and inaccuracy of response, are termed non-sampling errors.
 They result in distortion of the sample and study results.
 A more serious type of error
 Multi-factorial causes

Sampling Distribution
Sampling Distributions
A sampling distribution is the probability distribution of all
possible values of a sample statistic computed from samples of
the same size which are randomly selected from the same source
population.
[Figure: many samples of the same size drawn from one population]
Sampling distribution…
 Serves to answer probability questions about sample statistics.
 When sampling a discrete, finite population, a sampling distribution can be constructed.
 However, this construction is difficult with a large population and impossible with an infinite population.
 Here, we consider sample statistics as random variables.
Example:
 Age of individuals is a random variable.
 Similarly, mean age (a value of a statistic) is a random variable.


Sampling distribution of sample mean
 Suppose we have a population of size N = 4, constituting the ages of four individuals (the total population).
x, Age (years): 18, 20, 22, 24

$$\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$$
Now consider all possible samples of size n = 2

Possible samples (rows: 1st observation; columns: 2nd observation)
1st Obs   18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

Sample means (rows: 1st observation; columns: 2nd observation)
1st Obs   18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
 16 possible samples (with replacement)
• 16 sample means

Sample mean   Freq   P(x̄)
18            1      0.0625
19            2      0.1250
20            3      0.1875
21            4      0.2500
22            3      0.1875
23            2      0.1250
24            1      0.0625
Sampling distribution of all sample means
[Figure: bar chart of the distribution of the 16 sample means; horizontal axis x̄ from 18 to 24, vertical axis P(x̄) from 0 to .3]
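Summary measures of this sampling distribution: add the 16 sample means and divide by 16. Also calculate the SD of the sample means (here N = 16, the number of possible samples).

$$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + 21 + \cdots + 24}{16} = 21$$

$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$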
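The frequency table and the two summary measures can be reproduced by enumerating all 16 samples. The sketch below uses only the four ages given in the example; the variable names are illustrative. Exact fractions are used so the probabilities are not affected by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

ages = [18, 20, 22, 24]   # the population from the example
n = 2                     # sample size

# All ordered samples of size 2 drawn with replacement, and their means.
means = [Fraction(sum(s), n) for s in product(ages, repeat=n)]
freq = Counter(means)
dist = {m: Fraction(c, len(means)) for m, c in sorted(freq.items())}

for m, p in dist.items():
    print(int(m), freq[m], float(p))

# Mean and standard deviation of the sampling distribution.
mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())
sd_of_means = sqrt(var_of_means)
print(mean_of_means, round(sd_of_means, 2))  # 21 1.58
```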
Comparing the population with its sampling distribution
Population (N = 4): μ = 21, σ = 2.236
Sample means distribution (n = 2): μx̄ = 21, σx̄ = 1.58
[Figure: two bar charts of P(x), one for the population values 18, 20, 22, 24 and one for the sample means 18 to 24]
Sampling distribution…
 We note that the mean of the sampling distribution of x̄ has the same value as the mean of the original population.
 However, the variance of the sampling distribution is not equal to the original population variance; it is equal to the population variance divided by the sample size used to obtain the sampling distribution.
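As a quick check of this relationship with the numbers from the four-age population (σ = 2.236, n = 2, and an SD of 1.58 for the 16 sample means):

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{(2.236)^2}{2} \approx \frac{5}{2} = 2.5, \qquad \sigma_{\bar{x}} = \sqrt{2.5} \approx 1.58$$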
Sampling distribution…
 The square root of the sampling distribution variance is called the standard error of the mean or, simply, the standard error.

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

 More generally, the standard deviation of any sample statistic is called its standard error.
Sampling distribution…
 SE is determined by both the sample size and the degree of variability among the individual observations
 SD quantifies the amount of variability among individuals in a population, while
 SE quantifies the variability among means of repeated samples drawn from that population
 The SE is always smaller than the SD (except when n = 1)
Applications of the sampling
distributions of sample mean
 Helps in computing the probability of obtaining a sample
with a mean of some specified magnitude.
 Using the sampling distribution, we can get the probability of obtaining a given value of a sample statistic by transforming the statistic to the standard normal distribution.
Probability on SND
z-value for the sampling distribution of x̄:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where: x̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
[Figure: the sampling distribution is standardized to the standard normal distribution; the illustrated area, read from the standard normal table, is P(0 ≤ z ≤ 1.44) = .4251]
Example 1
 Given: μ = 50, σ = 16, n = 64
Find: P(x̄ > 53)
 Solution
1. Write the given information: μ = 50, σ = 16, n = 64
2. Sketch a normal curve
3. Convert x̄ to a z score:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{53 - 50}{16 / \sqrt{64}} = \frac{3}{2} = 1.5$$

4. Find the appropriate value(s) in the table: the area of the SND above z = 1.5 is 0.0668, so P(z > 1.5) = 0.0668
5. Complete the answer: the probability that x̄ is greater than 53 is 0.0668.
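The same answer can be obtained without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the standard normal table; the helper name `prob_mean_greater` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_greater(x_bar, mu, sigma, n):
    """Return (z, P(sample mean > x_bar)) using the standard normal curve."""
    se = sigma / sqrt(n)              # standard error of the mean
    z = (x_bar - mu) / se
    return z, 1 - NormalDist().cdf(z)

z, p = prob_mean_greater(53, mu=50, sigma=16, n=64)
print(round(z, 2), round(p, 4))       # 1.5 0.0668
```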


Example 2
 Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.
 What is the probability that the sample mean is between 7.8 and 8.2?
Solution:
 Even if the population is not normally distributed, the central limit theorem can be used (n > 30)
 … so the sampling distribution of x̄ is approximately normal
 … with mean μx̄ = 8
 … and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$

$$P(7.8 \le \bar{x} \le 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 \le z \le 0.4) = 0.3108$$

[Figure: the population distribution (μ = 8), the sampling distribution of x̄ (μx̄ = 8, shaded from 7.8 to 8.2), and the standard normal distribution with areas .1554 + .1554 between z = −0.4 and z = 0.4]
Example 3
 The distribution of serum cholesterol levels for all 20-70 year-old males has mean µ = 211 mg/100 ml and SD σ = 46 mg/100 ml.
A. If a sample of size 25 is selected from this population, what is the probability that the sample has a mean of 230 or above?
 The sample mean x̄ has a normal distribution with mean 211 and standard error 46/√25 = 9.2, so

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{230 - 211}{9.2} = 2.07$$

 The area under the standard normal curve to the right of z = 2.07 is 0.0192
 Consequently, the probability that a sample of size 25 has a mean of 230 mg/100 ml or higher is 0.0192.
B. What mean value of serum cholesterol level cuts off the lower 10% of the sampling distribution?
 An area of 0.1003 in the lower tail of the SND is marked by the value z = −1.28
 What is the corresponding value of x̄?

$$\bar{x} = \mu + z\,\frac{\sigma}{\sqrt{n}} = 211 + (-1.28)(9.2) = 199.2$$

 Approximately 10% of samples of size 25 have means that are less than or equal to 199.2 mg/100 ml.
 The other 90% of the samples have means that are greater than 199.2 mg/100 ml
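Both parts of this example can be checked numerically. The sketch models the sampling distribution of the mean as a normal distribution with mean 211 and standard error 46/√25, using `statistics.NormalDist` from the Python standard library. Because it does not round z to two decimals first, the results differ slightly from table-based values.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 211, 46, 25
sampling_dist = NormalDist(mu, sigma / sqrt(n))   # standard error = 9.2

# Part A: probability that the sample mean is 230 or above.
p_upper = 1 - sampling_dist.cdf(230)

# Part B: mean value cutting off the lower 10% of the sampling distribution.
cutoff = sampling_dist.inv_cdf(0.10)

print(round(p_upper, 4))
print(round(cutoff, 1))
```

The upper-tail probability comes out near 0.019 and the cutoff near 199.2 mg/100 ml.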
Distribution of the sample proportion
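 The sampling distribution of the sample proportion p̂ can be approximated by a normal distribution if:

$$np \ge 5 \quad \text{and} \quad n(1 - p) \ge 5$$

where

$$\mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

(where p = population proportion)
[Figure: sampling distribution of p̂; horizontal axis p̂ from 0 to 1, vertical axis P(p̂) from 0 to .3]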
z-Value for Proportions

Standardize p̂ to a z value with the formula:

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$
Example 1
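 According to a recent estimate, 19.4% of the adult male population was obese. What is the probability that in a random sample of size 150 from this population fewer than 15% will be obese?
Note: np = 150 × 0.194 = 29.1 ≥ 5 and n(1 − p) = 150 × 0.806 = 120.9 ≥ 5, so the normal approximation applies.
 n = 150, p = .194. Find P(p̂ < .15)
Cont…
 Find the z score:

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.15 - 0.194}{\sqrt{\dfrac{(0.194)(0.806)}{150}}} = \frac{-0.044}{0.0323} = -1.36$$

 A value of z = −1.36 gives an area of .0869, which is the probability P(z < −1.36) = .0869
 The probability that p̂ < .15 is .0869.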
Example 2
 If the true proportion of voters who support Proposition A is p = .4, what is the probability that a sample of size 200 yields a sample proportion between .40 and .45?
 This means, if p = .4 and n = 200, then what is P(.40 ≤ p̂ ≤ .45)?
 First find

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.4(1 - .4)}{200}} = .03464$$

 Then convert to the SND:

$$P(.40 \le \hat{p} \le .45) = P\left(\frac{.40 - .40}{.03464} \le z \le \frac{.45 - .40}{.03464}\right) = P(0 \le z \le 1.44)$$

Use the standard normal table: P(0 ≤ z ≤ 1.44) = .4251
[Figure: the sampling distribution of p̂ between .40 and .45 is standardized to the standard normal distribution between z = 0 and z = 1.44; shaded area .4251]
Example 3
 In a survey conducted in the 1990s, 19% of respondents ≥ 18 years had not heard of the AIDS virus HIV. What is the probability that in a sample of size 175 from this population 25% or more will not have heard about the virus?

$$\sigma_{\hat{p}}^2 = \frac{(0.19)(0.81)}{175} = 0.0009, \qquad \sigma_{\hat{p}} = 0.03$$

$$z = \frac{0.25 - 0.19}{0.03} = 2.0$$

$$P(z \ge 2.0) = 0.02275$$

The probability that p̂ ≥ 0.25 is 0.02275.
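The three proportion examples can be recomputed with one small helper. This is a sketch, not part of the original slides: `proportion_z` is a hypothetical function name, and `statistics.NormalDist` from the Python standard library stands in for the standard normal table. Since z and the standard error are not rounded before the lookup, the results differ from the worked answers in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_hat, p, n):
    """z value of a sample proportion p_hat, given population proportion p."""
    return (p_hat - p) / sqrt(p * (1 - p) / n)

std = NormalDist()   # standard normal distribution

# Example 1: P(p_hat < 0.15) with p = 0.194, n = 150
ex1 = std.cdf(proportion_z(0.15, 0.194, 150))

# Example 2: P(0.40 <= p_hat <= 0.45) with p = 0.40, n = 200
ex2 = std.cdf(proportion_z(0.45, 0.40, 200)) - std.cdf(proportion_z(0.40, 0.40, 200))

# Example 3: P(p_hat >= 0.25) with p = 0.19, n = 175
ex3 = 1 - std.cdf(proportion_z(0.25, 0.19, 175))

print(round(ex1, 4), round(ex2, 4), round(ex3, 4))
```

The printed values are close to the table-based answers .0869, .4251 and 0.02275; the gap is largest in Example 3, where the worked solution rounds the standard error to 0.03.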


