Sample Survey

The document outlines the principles of sample survey theory, including definitions of population, sampling frame, and various data collection methods. It discusses advantages and disadvantages of sampling methods, types of sampling (probability and non-probability), and issues related to bias and sampling error. Additionally, it provides detailed explanations of simple random sampling and stratified random sampling, including formulas for calculating standard deviation, standard error, and confidence intervals.

Uploaded by

samdke10
Copyright © All Rights Reserved
UNIVERSITY OF MINES AND TECHNOLOGY (UMaT)

DEPARTMENT OF MATHEMATICAL SCIENCES

SAMPLE SURVEY THEORY


[Figure: Questionnaire Research Flow Chart]
Sample Survey Theory
• A population is defined as including all people or items with the characteristic one wishes to
understand.

• A sampling frame is the list of all units in the population from which the sample is actually drawn.

• There are four main methods of data collection, namely:

i. Census

ii. Sample Surveys

iii. Experiment

iv. Observational Study

• Advantages and Disadvantages of the Methods of Data Collection

1. Resources – a sample survey needs fewer resources than a census, which gives it an advantage over a census.

2. Generalizability – the appropriateness of applying findings from a study to a larger
population.

• Sampling method refers to the way that observations are selected from a population to be in the sample for a sample survey.

• A population parameter is the true value of a characteristic of the population (for example, the population mean).

• A sample statistic is an estimate of a population parameter, based on sample data.

• Probability Samples: each population element has a known, non-zero chance of being chosen for the sample, and selection is made at random.

• Non-Probability Samples: we don’t know the probability that each population element will be
chosen.

• Below are some of the types of non-probability sampling methods:

i. Voluntary sample: made up of people who select themselves, based on their willingness to take part.

ii. Convenience sample: made up of people who are easy to reach.

• Probability Sampling Methods

i. Simple random sampling

ii. Stratified sampling

iii. Cluster sampling

iv. Multistage sampling

v. Systematic random sampling

• Bias is the tendency of a sample statistic to systematically overestimate or underestimate a population parameter. It often occurs when the survey sample does not accurately represent the population.

• The bias that results from an unrepresentative sample is called selection bias. Examples include
under-coverage, non-response bias and voluntary response bias.

• Random Sampling is a procedure for sampling from a population in which the selection of a sample unit is based on chance, and every element of the population has an equal chance of being selected.

• Response Bias refers to the bias that results from problems in the measurement process.

• Sampling Error is the variability among statistics from different samples.

• Sampling refers to the process of choosing a sample of elements from a total population of elements.

• Probability Sampling: every element of the population has a known probability of being included in the sample.

• Advantages of Probability Sampling

i. The biases of the sampling plan can be studied.

• Non-Probability Sampling: we cannot specify the probability that each element will be included in the sample.

• Advantages of Non-Probability Sampling

i. It is convenient

• Disadvantages of Non-Probability Sampling

i. Unable to assess bias in any rational way

ii. No estimate of precision can be obtained

iii. Experts may disagree on the concepts used to select the sample

• Quality of Survey Results

i. Accuracy

ii. Precision

iii. Margin of error

• A sample design is described by two factors, namely the sampling method and the estimator.

Simple Random Sampling
• Properties

i. The population consists of N objects

ii. The sample consists of n objects

iii. If all possible samples of n objects are equally likely to occur then the sampling method is
called simple random sampling.
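The SRS property above, and the idea of sampling error as the variability of a statistic across samples, can be illustrated with a short simulation. This is a minimal sketch. It assumes Python's standard `random` module as the source of randomness and a hypothetical population of the values 1 to 1000. `draw_srs` and `sampling_distribution_of_mean` are illustrative names and do not come from these notes. Because `random.Random.sample` makes every subset of n objects equally likely, each draw is a simple random sample without replacement.

```python
import random
import statistics


def draw_srs(population, n, rng):
    """Simple random sample of n objects, drawn without replacement.

    rng.sample() makes every possible subset of size n equally likely,
    which is the defining property of simple random sampling.
    """
    return rng.sample(population, n)


def sampling_distribution_of_mean(population, n, reps, seed=1):
    """Sample means from `reps` independent simple random samples of size n."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.mean(draw_srs(population, n, rng)) for _ in range(reps)]


# Hypothetical population of N = 1000 units with values 1, 2, ..., 1000.
population = list(range(1, 1001))
means = sampling_distribution_of_mean(population, n=100, reps=2000)

# The mean of the sample means should sit near the population mean (500.5).
# Their standard deviation should sit near the theoretical standard error
# sqrt((1 - n/N) * S^2 / n), which is about 27.4 for this population.
print(statistics.mean(means), statistics.stdev(means))
```

The spread printed second is the sampling error of the mean for this hypothetical population. Increasing `n` should shrink it.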

2
S.D = ------ with replacement S2
n
 n   N  2 S2
S.D = 1- .  S.E =
 N  N -1 n n

 n  2
S.E =  1 - 
 N  n
Proportion
P 1 - P  N - n  . P  1 - P 
S.D = =
n N -1 n
P 1 - P 
S.E =
n- 1

S.E – this is an estimate of the standard deviation of the sampling distribution.


• Population Variance (Known)

Known (with replacement)

$$\text{S.D} = \sqrt{\frac{\sigma^2}{n}}$$

Known (without replacement)

$$\text{S.D} = \sqrt{\left(1-\frac{n}{N}\right)\left(\frac{N}{N-1}\right)\frac{\sigma^2}{n}}$$

• Estimated Population Variance

With replacement

$$\text{S.E} = \sqrt{\frac{S^2}{n}}$$

Without replacement

$$\text{S.E} = \sqrt{\left(1-\frac{n}{N}\right)\frac{S^2}{n}}$$

• Population Proportion (Known)

Known (with replacement)

$$\text{S.D} = \sqrt{\frac{P(1-P)}{n}}$$

Known (without replacement)

$$\text{S.D} = \sqrt{\left(\frac{N-n}{N-1}\right)\frac{P(1-P)}{n}}$$

• Population Proportion (Estimated)

Estimated (with replacement)

$$\text{S.E} = \sqrt{\frac{P(1-P)}{n-1}}$$

Estimated (without replacement)

$$\text{S.E} = \sqrt{\left(1-\frac{n}{N}\right)\frac{P(1-P)}{n-1}}$$

• Note

$$S^2 = \frac{\sum (x - \bar{x})^2}{n-1}$$

Margin of Error (ME) = Critical Value * Standard Error

Confidence Interval = Sample Statistic ± Margin of Error
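The standard-error formulas above map directly into code. This is a minimal sketch. It assumes the estimated variance S² uses the n − 1 divisor, as in the note. `se_mean` and `se_proportion` are illustrative names, not taken from these notes.

```python
import math


def _fpc(n, N, with_replacement):
    """Finite population correction (1 - n/N); equals 1 when sampling with replacement."""
    if with_replacement:
        return 1.0
    if N is None:
        raise ValueError("N is required when sampling without replacement")
    return 1 - n / N


def se_mean(s2, n, N=None, with_replacement=False):
    """Estimated standard error of the sample mean: sqrt(fpc * S^2 / n)."""
    return math.sqrt(_fpc(n, N, with_replacement) * s2 / n)


def se_proportion(p, n, N=None, with_replacement=False):
    """Estimated standard error of a sample proportion: sqrt(fpc * p(1 - p) / (n - 1))."""
    return math.sqrt(_fpc(n, N, with_replacement) * p * (1 - p) / (n - 1))


# Values from Question 1: S^2 = 94.26, n = 36, N = 20000 (without replacement).
print(round(se_mean(94.26, 36, 20000), 4))
```

For the Question 1 values this gives about 1.6167. The worked solution truncates this to 1.6166.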

• Questions

1. At the end of every school year, the state administers a reading test to a simple random
sample drawn without replacement from a population of 20,000 third graders. This year,
the test was administered to 36 students selected via simple random sampling. The test
score from each sampled student is as shown below:

50, 55, 60, 62, 62, 65, 67, 67, 70, 70, 70, 70, 72, 72, 73, 73, 75, 75, 75, 78, 78, 78, 80, 80, 80,
82, 82, 85, 85, 88, 88, 90, 90, 90.

Using the sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level.

• Solution

$$\text{Sample mean } \bar{x} = \frac{\sum x_i}{n} = \frac{50 + 55 + 60 + 62 + 62 + \cdots + 90}{36} = \frac{2700}{36} = 75$$

Hence the mean reading achievement level in the population is estimated to be 75.

Find the standard error (S.E)

Finding the variance (estimated variance)

$$S^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{(50-75)^2 + (55-75)^2 + (60-75)^2 + \cdots + (90-75)^2 + (90-75)^2 + (90-75)^2}{36-1}$$

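$$S^2 = \frac{1747 + 162 + 1390}{35} = \frac{3299}{35} \approx 94.26$$

$$\text{S.E} = \sqrt{\left(1-\frac{n}{N}\right)\frac{S^2}{n}} = \sqrt{\left(1-\frac{36}{20000}\right)\frac{94.26}{36}} \approx 1.6166$$

Finding the critical value

$$\alpha = 1 - \text{Confidence Level} = 1 - 0.95 = 0.05$$

$$\text{Critical Probability} = 1 - \frac{\alpha}{2} = 1 - \frac{0.05}{2} = 0.975$$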
From the standard normal distribution table, the critical value (z-score) for a cumulative probability of 0.975 is 1.96.

Margin of Error (ME) = Critical Value * Standard Error

$$\text{ME} = 1.96 \times 1.6166 \approx 3.1685$$

Confidence Interval = Sample Statistic ± Margin of Error

$$= 75 \pm 3.1685 = (71.8315,\ 78.1685)$$

Hence, the confidence interval is 71.8315 to 78.1685 at a confidence level of 95%.
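The critical value can also be computed directly instead of read from a printed table. This is a minimal sketch. It assumes Python 3.8 or later, where `statistics.NormalDist` is available. `z_critical` and `srs_mean_ci` are illustrative names. The code follows the same steps as above: α, the critical probability 1 − α/2, the z critical value, the margin of error, and the interval.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z critical value: the standard normal quantile at 1 - alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def srs_mean_ci(mean, s2, n, N, confidence=0.95):
    """Margin of error and CI for a mean under SRS without replacement (estimated S^2)."""
    se = ((1 - n / N) * s2 / n) ** 0.5
    margin = z_critical(confidence) * se
    return margin, (mean - margin, mean + margin)


margin, (low, high) = srs_mean_ci(75, 94.26, 36, 20000)
print(round(z_critical(0.95), 2), round(margin, 3), round(low, 3), round(high, 3))
```

With the Question 1 values this gives a critical value of about 1.96 and an interval of roughly 71.83 to 78.17. It differs from the hand calculation only in the later decimals, because it uses z = 1.95996… rather than 1.96.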

Stratified Random Sampling
• Properties
1. The population consists of N elements.
2. The population is divided into H groups called strata.
3. Each element of the population can be assigned to one, and only one, stratum.
4. The number of observations within each stratum, $N_h$, is known.

• Advantages
1. It provides greater precision than a simple random sample of the same size.
2. Because of this greater precision, it often requires a smaller sample, which saves money.
3. It guards against an unrepresentative sample.
4. It ensures enough sample points to support a separate analysis of any subgroup.

• Disadvantages

1. It may require more administrative effort than a simple random sample.

• Proportionate Stratification: the sample size of each stratum is proportionate to the population size of the stratum. Thus each stratum has the same sampling fraction.

• Properties of Proportionate Stratification

1. It provides equal or better precision than a simple random sample of the same size.

2. Gains in precision are greatest when values within strata are homogeneous.

3. Gains in precision accrue to all survey measures.
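Proportionate stratification sets each stratum's sample size to $n_h = n N_h / N$, so every stratum has the same sampling fraction $n_h / N_h$. This is a minimal sketch. It assumes fractional allocations are rounded afterwards by the user. `proportionate_allocation` and `sampling_fractions` are illustrative names.

```python
def proportionate_allocation(n, stratum_sizes):
    """Return n_h = n * N_h / N for each stratum (not yet rounded)."""
    N = sum(stratum_sizes)
    return [n * N_h / N for N_h in stratum_sizes]


def sampling_fractions(allocation, stratum_sizes):
    """Sampling fraction n_h / N_h in each stratum."""
    return [n_h / N_h for n_h, N_h in zip(allocation, stratum_sizes)]


# Setting of Question 2 in these notes: 36 students, 10000 boys and 10000 girls.
sizes = [10000, 10000]
alloc = proportionate_allocation(36, sizes)
print(alloc, sampling_fractions(alloc, sizes))  # each stratum gets 18, fraction 0.0018
```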

• Disproportionate Stratification: the sampling fraction may vary from one stratum to the next.

• Properties of Disproportionate Stratification

1. Precision may be very good or very poor, depending on how sample points are allocated to strata.

2. If variances differ across strata, disproportionate stratification can provide better precision than proportionate stratification, when sample points are correctly allocated to strata.

3. With disproportionate stratification, the researcher can maximize precision for a single important survey measure.

$$\text{Sample estimate of mean} = \sum \left(\frac{N_h}{N}\right)\bar{x}_h$$

$$\text{Sample estimate of proportion} = \sum \left(\frac{N_h}{N}\right) p_h$$

$N_h/N$ is the stratum weight, that is, the proportion of the population that falls in stratum h.

• With replacement (population variance known)

$$\text{S.D} = \frac{1}{N}\sqrt{\sum N_h^2\,\frac{\sigma_h^2}{n_h}}$$

• Without replacement (population variance known)

$$\text{S.D} = \frac{1}{N}\sqrt{\sum \frac{N_h^3}{N_h - 1}\left(1-\frac{n_h}{N_h}\right)\frac{\sigma_h^2}{n_h}}$$

• With replacement (population variation estimated)

$$\text{S.E} = \frac{1}{N}\sqrt{\sum N_h^2\,\frac{S_h^2}{n_h}}$$

• Without replacement (population variation estimated)

$$\text{S.E} = \frac{1}{N}\sqrt{\sum N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}}$$

• With Replacement (Population Proportion Known)

$$\text{S.D} = \frac{1}{N}\sqrt{\sum N_h^2\,\frac{P_h(1-P_h)}{n_h}}$$

• With Replacement (Population Proportion Estimated)

$$\text{S.E} = \frac{1}{N}\sqrt{\sum N_h^2\,\frac{P_h(1-P_h)}{n_h - 1}}$$

• Without Replacement (Population Proportion Known)

$$\text{S.D} = \frac{1}{N}\sqrt{\sum \frac{N_h^3}{N_h - 1}\left(1-\frac{n_h}{N_h}\right)\frac{P_h(1-P_h)}{n_h}}$$

• Without Replacement (Population Proportion Estimated)

$$\text{S.E} = \frac{1}{N}\sqrt{\sum N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{P_h(1-P_h)}{n_h - 1}}$$

• Questions
2. At the end of every school year, the state administers a reading test to a sample of third graders drawn without replacement from a population of 20,000. This year, proportionate stratified sampling was used to select 36 students for testing. Because the population is half boys and half girls, one stratum consisted of 18 boys and the other of 18 girls. The test scores of the sampled students are:
Boys: 50, 55, 60, 62, 62, 65, 67, 67, 70, 70, 73, 73, 75, 78, 78, 80, 85, 90.
Girls: 70, 70, 72, 72, 75, 75, 78, 78, 80, 80, 82, 82, 85, 85, 88, 88, 90, 90.

Using the sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level.

$$\bar{x}_b = \frac{\sum x_i}{n} = \frac{50 + 55 + 60 + 62 + 62 + \cdots + 90}{18} = \frac{1260}{18} = 70$$

$$\bar{x}_g = \frac{70 + 70 + 72 + 72 + \cdots + 90}{18} = \frac{1440}{18} = 80$$

The overall (stratified) mean is the sum of the stratum means, each weighted by $N_h/N$:

$$\bar{x} = \sum \left(\frac{N_h}{N}\right)\bar{x}_h = \left(\frac{10000}{20000}\right)70 + \left(\frac{10000}{20000}\right)80 = 75$$
To find the margin of error, we first need the standard error.

Find the estimated variance for each stratum:

$$S_h^2 = \frac{\sum (x - \bar{x}_h)^2}{n_h - 1}$$

$$S_b^2 = \frac{(50-70)^2 + (55-70)^2 + (60-70)^2 + \cdots + (90-70)^2}{18-1} = 105.41$$

$$S_g^2 = \frac{(70-80)^2 + (70-80)^2 + (72-80)^2 + \cdots + (90-80)^2}{18-1} = 45.41$$

$$\text{S.E} = \frac{1}{N}\sqrt{\sum N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}}$$

$$\text{S.E} = \frac{1}{20000}\sqrt{10000^2\left(1-\frac{18}{10000}\right)\frac{105.41}{18} + 10000^2\left(1-\frac{18}{10000}\right)\frac{45.41}{18}}$$

$$\text{S.E} = \frac{1}{20000}\sqrt{584557011.1 + 251823677.8}$$

$$\text{S.E} = \frac{1}{20000}\sqrt{836380688.9} \approx 1.45$$

Hence the standard error of the sampling distribution of the mean is 1.45.

$$\text{Critical alpha, } \alpha = 1 - \text{Confidence Level} = 1 - 0.95 = 0.05$$

$$\text{Critical Probability} = 1 - \frac{\alpha}{2} = 1 - \frac{0.05}{2} = 0.975$$

From the z-distribution (normal distribution) the critical value is 1.96.

Finding the margin of error

Margin of Error (ME) = Critical Value * Standard Error

$$= 1.96 \times 1.45 \approx 2.842$$

Confidence Interval = Sample Statistic ± Margin of Error

$$= 75 \pm 2.842 = (72.158,\ 77.842)$$

Hence, the confidence interval is 72.158 to 77.842 at a confidence level of 95%.
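The whole of Question 2 can be checked from the raw scores. This is a minimal sketch. It assumes simple random sampling without replacement within each stratum and the estimated-variance formula (n_h − 1 divisor) used above. `stratified_mean_se` is an illustrative name. Because the code keeps $S_h^2$ and the standard error unrounded, it gives S.E ≈ 1.446 and a margin of error of about 2.83. These are slightly below the hand-rounded 1.45 and 2.842.

```python
import math
import statistics


def stratified_mean_se(samples, stratum_sizes):
    """Stratified estimate of the population mean and its estimated standard error.

    samples       -- one list of sample values per stratum
    stratum_sizes -- population size N_h of each stratum
    """
    N = sum(stratum_sizes)
    mean = sum(N_h / N * statistics.mean(x) for x, N_h in zip(samples, stratum_sizes))
    total = 0.0
    for x, N_h in zip(samples, stratum_sizes):
        n_h = len(x)
        s2_h = statistics.variance(x)  # sample variance with n_h - 1 divisor
        total += N_h ** 2 * (1 - n_h / N_h) * s2_h / n_h
    return mean, math.sqrt(total) / N


boys = [50, 55, 60, 62, 62, 65, 67, 67, 70, 70, 73, 73, 75, 78, 78, 80, 85, 90]
girls = [70, 70, 72, 72, 75, 75, 78, 78, 80, 80, 82, 82, 85, 85, 88, 88, 90, 90]

mean, se = stratified_mean_se([boys, girls], [10000, 10000])
margin = 1.96 * se
print(mean, round(se, 3), (round(mean - margin, 2), round(mean + margin, 2)))
```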

Cluster Sampling
• Properties

1. The population is divided into N groups called clusters.

2. The researcher randomly selects n clusters to include in the sample.

3. The number of observations within each cluster, $M_i$, is known.

4. Each element of the population can be assigned to one and only one cluster.

• Types of Cluster Sampling Methods

1. One-stage sampling

2. Two-stage sampling

• Disadvantage of Cluster Sampling

1. It usually provides less precision than simple random sampling or stratified sampling of the same size.

• Advantages of Cluster Sampling

1. The cost per sample point is lower than with other sampling methods.

2. When the increase in sample size that this allows is sufficient to offset the loss in precision, cluster sampling may be the best choice.

3. It should be used only when it is economically justified.

• Difference Between Strata and Clusters

1. With stratified sampling, all strata are represented in the sample; with cluster sampling, only a subset of the clusters is in the sample.

2. With stratified the best survey results occur when elements within strata are internally
homogeneous but with cluster the best survey results occur when elements within clusters
are internally heterogeneous.

• Formulae for Cluster Sampling

Sample estimates of the population parameters:

$$\text{Mean} = \left(\frac{N}{nM}\right)\sum M_i\,\bar{x}_i$$

$$\text{Proportion} = \left(\frac{N}{nM}\right)\sum M_i\,p_i$$

The variability of the estimate is measured by the standard error.

Standard Error of Mean Score for One-Stage Method

$$\text{S.E} = \frac{1}{M}\sqrt{\frac{N^2\left(1-\frac{n}{N}\right)}{n}\cdot\frac{\sum\left(M_i\bar{x}_i - \frac{t_{mean}}{N}\right)^2}{n-1}}$$

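Standard Error of Mean Score for Two-Stage Method

$$\text{S.E} = \frac{1}{M}\sqrt{\frac{N^2\left(1-\frac{n}{N}\right)}{n}\cdot\frac{\sum\left(M_i\bar{x}_i - \frac{t_{mean}}{N}\right)^2}{n-1} + \frac{N}{n}\sum\left(1-\frac{m_i}{M_i}\right)\frac{M_i^2 S_i^2}{m_i}}$$

Note

$$t_{mean} = \frac{N}{n}\sum t_i, \qquad t_i = \frac{M_i}{m_i}\sum_j x_{ij}$$

For proportions, replace $\bar{x}_i$ by $p_i$ and $t_{mean}$ by $t_{prop}$. Also replace $m_i$ by $(m_i - 1)$ in the denominator of the within-cluster term.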

• Questions

3. At the end of every school year, the state administers a reading test to a sample of third graders. The school system has 20,000 third graders grouped in 1000 separate classes. Assume that each class has 20 students. This year, the test was administered to each student in 36 randomly sampled classes. Thus, this is one-stage cluster sampling, with classes serving as clusters. The average test score in each sampled cluster is:

55, 60, 65, 67, 67, 70, 70, 70, 72, 72, 72, 72, 73, 75, 75, 75, 75, 75, 77, 77, 78, 78, 78, 78, 80,
80, 80, 80, 80, 80, 83, 83, 85, 85.

Using the sample data, estimate the mean reading achievement level in the population. Find the margin of error and the confidence interval. Assume a 95% confidence level.

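• Solution

$m_i = 20,\ N = 1000,\ n = 36,\ M = 20000,\ M_i = 20$

$$\text{Sample Mean} = \left(\frac{N}{nM}\right)\sum M_i\bar{x}_i = \left(\frac{1000}{36 \times 20000}\right)\sum 20\,\bar{x}_i = \frac{\sum \bar{x}_i}{36}$$

$$\sum \bar{x}_i = 55 + 60 + 65 + 67 + \cdots + 83 + 85 + 85 + 85 = 2700, \qquad \text{Sample Mean} = \frac{2700}{36} = 75$$

Since this is one-stage sampling, the standard error is given by

$$\text{S.E} = \frac{1}{M}\sqrt{\frac{N^2\left(1-\frac{n}{N}\right)}{n}\cdot\frac{\sum\left(M_i\bar{x}_i - \frac{t_{mean}}{N}\right)^2}{n-1}}, \qquad t_{mean} = \frac{N}{n}\sum M_i\bar{x}_i = \frac{N}{n}\sum t_i, \quad t_i = \frac{M_i}{m_i}\sum_j x_{ij}$$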

$$t_{mean} = \frac{N}{n}\sum\left(\frac{M_i}{m_i}\sum_j x_{ij}\right) = \frac{1000}{36}\sum\left(\frac{20}{20}\sum_j x_{ij}\right) = 27.778 \times 20 \times \sum\bar{x}_i$$

$$= \frac{1000}{36} \times 20 \times (55 + 60 + 65 + \cdots + 85 + 85 + 85) = \frac{1000}{36} \times 20 \times 2700 = 1500000$$

so that $t_{mean}/N = 1500000/1000 = 1500$.

$$\text{S.E} = \frac{1}{20000}\sqrt{\frac{1000^2\left(1-\frac{36}{1000}\right)}{36}\cdot\frac{\sum\left(20\,\bar{x}_i - \frac{1500000}{1000}\right)^2}{36-1}}$$
$$\text{S.E} = \frac{1}{20000}\sqrt{\frac{1000^2\left(1-\frac{36}{1000}\right)}{36}\cdot\frac{(20 \times 55 - 1500)^2 + \cdots + (20 \times 85 - 1500)^2}{35}}$$

$$\text{S.E} = \frac{1}{20000}\sqrt{\frac{1000^2\left(1-\frac{36}{1000}\right)}{36} \times 18217.143} \approx 1.1$$

M.E = Critical Value * Standard Error = 1.96*1.1 = 2.156

Confidence Interval = Sample Statistic ± Margin of Error

= 75 ± 2.156

Hence, the confidence interval is 72.84 to 77.16 at 95% confidence level.
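The one-stage cluster formulas can be combined into a single function. This is a minimal sketch. It assumes every element of each sampled cluster is measured (one-stage sampling) and that the cluster means $\bar{x}_i$ and sizes $M_i$ are supplied. `one_stage_cluster` is an illustrative name. Question 3 lists only 34 of the 36 cluster averages, so the usage line below only re-checks the final standard-error arithmetic from the worked solution.

```python
import math


def one_stage_cluster(cluster_means, cluster_sizes, N, M):
    """One-stage cluster estimate of the population mean and its standard error.

    cluster_means -- mean of each sampled cluster (x-bar_i)
    cluster_sizes -- number of elements in each sampled cluster (M_i)
    N             -- number of clusters in the population
    M             -- number of elements in the population
    """
    n = len(cluster_means)
    totals = [M_i * xbar for M_i, xbar in zip(cluster_sizes, cluster_means)]  # M_i * x-bar_i
    mean = N / (n * M) * sum(totals)
    t_mean = N / n * sum(totals)  # estimated population total
    ss = sum((t - t_mean / N) ** 2 for t in totals) / (n - 1)
    se = math.sqrt(N ** 2 * (1 - n / N) / n * ss) / M
    return mean, se


# Re-check of the last step of Question 3: the bracketed sum of
# (20 * x-bar_i - 1500)^2 divided by 35 was worked out by hand as 18217.143.
se_q3 = math.sqrt(1000 ** 2 * (1 - 36 / 1000) / 36 * 18217.143) / 20000
print(round(se_q3, 3))
```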

Sample Size
• Six Factors That Influence Sample Size

1. Cost considerations

2. Administrative concerns

3. Minimum acceptable level of precision

4. Confidence level

5. Sampling method

6. Variability within the population/ subpopulation

• Questions

4. At the end of every school year, the state administers a reading test to a sample of 36 third graders. The school system has 20,000 third graders, half boys and half girls. The results of last year's test are shown in the table below.

Stratum    Mean Score    Standard Deviation
Boys       70            10.27
Girls      80            6.66

This year, the research plan is to use a stratified sample, with one stratum consisting of boys and the other of girls. Use the above information to answer the following.

a) To maximize precision, how many of the sampled students should be boys and how many should be girls?

b) What is the mean reading achievement level in the population?

c) Compute the confidence interval.

d) Find the margin of error (Assume a 95% confidence level).

Solution

$N_b = 10000$, $N_g = 10000$ [i.e. half boys and half girls]. For maximum precision (optimum allocation):

$$n_h = n\,\frac{N_h\sigma_h}{\sum N_i\sigma_i}$$

$$n_b = \frac{36(10000 \times 10.27)}{(10000 \times 10.27) + (10000 \times 6.67)} = 21.825 \approx 22$$

$$n_g = \frac{36(10000 \times 6.67)}{(10000 \times 10.27) + (10000 \times 6.67)} = 14.17 \approx 14$$
Hence the number of boys needed is 22 and that of girls is 14.
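The allocation in part (a) is the optimum (Neyman) allocation, $n_h = n N_h \sigma_h / \sum N_i \sigma_i$. It assumes that sampling a unit costs the same in every stratum. This is a minimal sketch. `neyman_allocation` is an illustrative name. Note that simple rounding will not always make the stratum sizes add back up to n.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Optimum (Neyman) allocation: n_h = n * N_h * sigma_h / sum(N_i * sigma_i)."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Question 4: n = 36, N_b = N_g = 10000, sigma_b = 10.27, sigma_g = 6.67.
alloc = neyman_allocation(36, [10000, 10000], [10.27, 6.67])
print([round(a, 3) for a in alloc], [round(a) for a in alloc])
```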

ii. Finding the mean in the population

$$\bar{x} = \sum\left(\frac{N_h}{N}\right)\bar{x}_h = \left(\frac{10000}{20000}\right)70 + \left(\frac{10000}{20000}\right)80 = 75$$

iii. Finding the confidence interval

We need to find the standard error and find the margin of error

Confidence Interval = Sample Mean ± Margin of Error

M.E = Critical Value * Standard Error

The critical value is obtained by first finding the critical probability.

Confidence level = 95% = 0.95, α = 1 - 0.95 = 0.05

$$\text{Critical Probability} = 1 - \frac{\alpha}{2} = 1 - \frac{0.05}{2} = 0.975$$

The critical value at the 95% confidence level, from the z-distribution, is 1.96.

$$\text{S.E} = \frac{1}{N}\sqrt{\sum N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}}$$

$$\text{S.E} = \frac{1}{20000}\sqrt{10000^2\left(1-\frac{22}{10000}\right)\frac{10.27^2}{22} + 10000^2\left(1-\frac{14}{10000}\right)\frac{6.67^2}{14}}$$

$$\text{S.E} = \frac{1}{20000}\sqrt{478367543.7 + 317332968.1}$$

$$\text{S.E} = \frac{1}{20000}\sqrt{795700511.8} \approx 1.41$$

M.E = Critical Value * Standard Error = 1.96*1.41 = 2.7636

Confidence Interval = Sample Statistic ± Margin of Error

= 75 ± 2.76

Hence, the confidence interval is 72.24 to 77.76 at 95% confidence level.

Best Sampling Method
• Best Sampling Method: the sampling method that most effectively meets the particular goals of the study in question.

• How to Choose the Best Sampling Method

1. List the research goals

2. Identify the potential sampling methods that might effectively achieve the goal

3. Test the ability of each method to achieve the goal

4. Choose the method that does the best job of achieving the goals.

Ratio Estimation
• Ratio Estimation

In simple random sampling, if X and Y are positively correlated, we can use ratio estimation to give more precise estimates of the population mean or total of Y.

$$T_y = \sum_{i=1}^{N} y_i = \text{total of the y-values in the population}$$

$$T_x = \sum_{i=1}^{N} x_i = \text{total of the x-values in the population}$$

$$B = \frac{T_y}{T_x} = \frac{\bar{Y}_u}{\bar{X}_u} = \frac{\bar{Y}}{\bar{X}}$$

where B is the ratio of the population mean of y to the population mean of x.

$$R = \frac{\sum_{i=1}^{N}(x_i - \bar{X}_u)(y_i - \bar{Y}_u)}{(N-1)S_x S_y} \quad \text{: the population correlation of X and Y}$$
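The estimators for B, $T_y$ and $\bar{Y}_u$ are

$$\hat{B} = \frac{\hat{T}_y}{\hat{T}_x} = \frac{\bar{y}}{\bar{x}}, \qquad \hat{T}_{yr} = \hat{B}\,T_x, \qquad \hat{\bar{Y}}_r = \hat{B}\,\bar{X}_u, \qquad \bar{X}_u = \frac{T_x}{N}$$

Bias of the ratio estimator:

$$E(\hat{B}) - B \approx \left(1-\frac{n}{N}\right)\frac{1}{n\bar{X}_u^2}\left(B S_X^2 - R S_X S_Y\right)$$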

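Variance of $\hat{B}$ (estimated)

$$\hat{V}(\hat{B}) = \left(1-\frac{n}{N}\right)\frac{1}{n\bar{X}_u^2}\cdot\frac{\sum_{i=1}^{n}\left(y_i - \hat{B}x_i\right)^2}{n-1}$$

$$\hat{V}(\hat{\bar{Y}}_r) = \hat{V}(\bar{X}_u\hat{B}) = \bar{X}_u^2\,\hat{V}(\hat{B}), \qquad \hat{V}(\hat{T}_{yr}) = \hat{V}(T_x\hat{B}) = T_x^2\,\hat{V}(\hat{B})$$

If the sample size n is sufficiently large, approximate 95% confidence intervals are

$$\hat{B} \pm 1.96\,SE(\hat{B}), \quad \text{i.e. } \hat{B} \pm k\,SE(\hat{B})$$

$$\hat{\bar{Y}}_r \pm 1.96\,SE(\hat{\bar{Y}}_r), \quad \text{i.e. } \hat{\bar{Y}}_r \pm k\,SE(\hat{\bar{Y}}_r)$$

$$\hat{T}_{yr} \pm 1.96\,SE(\hat{T}_{yr}), \quad \text{i.e. } \hat{T}_{yr} \pm k\,SE(\hat{T}_{yr})$$

where k is the critical value (k = 1.96 for 95% confidence).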
• Why Use Ratio Estimation

1. We need to estimate a ratio.

2. We want to estimate a population total, but the population size N is unknown, so the estimator $\hat{T}_Y = N\bar{y}$ can't be used. Instead, we can estimate N by $T_X/\bar{x}$.
3. Ratio estimation increases the precision of estimated means and totals.

4. Ratio estimation is used to adjust estimates from the sample so that they reflect the
demographic totals.

5. Ratio estimation is used to adjust for non-response.

• Questions

Suppose the population consists of N = 576 villages of different sizes. Let:

$Y_i$ = tons of grain harvested in village i

$X_i$ = hectares (area) of village i

n = 24, $T_X$ = 21875.6

B = average yield in tons per hectare

$\bar{Y}$ = average yield in tons per village

$T_y$ = total yield in tons


Village i   Yield y (t)   Area x (ha)   Village i   Yield y (t)   Area x (ha)
1           112           30.2          13          105.7         30.8
2           129.1         36.1          14          80.5          21.7
3           208.2         60.8          15          163           49.2
4           158.5         44.4          16          98.7          28
5           110.2         29.8          17          137.8         37.8
6           123.3         34.9          18          141.2         38.6
7           157.7         41.6          19          152.5         42.8
8           154.2         42.8          20          142.5         39
9           98.7          25.8          21          136.7         37.6
10          112.7         34.7          22          153.2         43.2
11          125.5         35.1          23          93            26.1
12          60.3          15.8          24          179.8         48.3

$$n = 24, \qquad \sum_{i=1}^{n} y_i = 3135, \qquad \bar{y} = \frac{3135}{24} = 130.625$$

$$\sum_{i=1}^{n} x_i = 875.1, \qquad \bar{x} = \frac{875.1}{24} = 36.4625$$

The ratio estimate of the average yield per hectare:

$$\hat{B} = \frac{\bar{y}}{\bar{x}} = \frac{130.625}{36.4625} \approx 3.58244772$$

$$\hat{T}_y = T_x\hat{B} = 21875.6 \times 3.58244772 \approx 78368.2$$

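Finding the Variance of $\hat{B}$

$$\hat{V}(\hat{B}) = \left(1-\frac{n}{N}\right)\frac{1}{n\,\bar{x}_u^2}\cdot\frac{\sum\left(y_i - \hat{B}x_i\right)^2}{n-1}, \qquad \bar{x}_u = \frac{T_x}{N} = \text{mean of the population}$$

$$\hat{V}(\hat{B}) = \left(1-\frac{24}{576}\right)\frac{1}{24\left(\frac{21875.6}{576}\right)^2}\cdot\frac{\sum\left(y_i - 3.58x_i\right)^2}{23}$$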


Village   (y_i − 3.58x_i)²   Village   (y_i − 3.58x_i)²
1         15.0854            13        20.830
2         0.019044           14        7.918596
3         89.567             15        172.5544
4         11.916             16        2.3716
5         12.362             17        6.130576
6         2.696164           18        9.072144
7         76.94798           19        0.524176
8         0.9826             20        8.2944
9         40.144816          21        4.376464
10        132.8486           22        2.119936
11        0.024964           23        0.191844
12        13.95769           24        47.116996
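Finding the Variance of $\hat{B}$

$$\hat{V}(\hat{B}) = \left(1-\frac{24}{576}\right)\frac{1}{24\left(\frac{21875.6}{576}\right)^2}\cdot\frac{678.3235}{23}$$

$$= 0.9583 \times 2.88878 \times 10^{-5} \times 29.49232609 \approx 8.1644 \times 10^{-4}$$

$$SE(\hat{B}) = \sqrt{8.1644 \times 10^{-4}} \approx 0.028$$

$$\hat{V}(\hat{T}_y) = T_x^2\,\hat{V}(\hat{B}) = 21875.6^2 \times 8.1644 \times 10^{-4} \approx 390700.7$$

For comparison, the expansion estimator of the total, which does not use X, is

$$\hat{T}_y = N\bar{y} = 576 \times 130.625 = 75240, \qquad \hat{V}(\hat{T}_y) = N^2\,\hat{V}(\bar{y}) = 576^2 \times 44.41040943 \approx 14734308$$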
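The village calculations can be reproduced from the table of yields and areas. This is a minimal sketch. It assumes the 24 sampled villages are a simple random sample and uses the unrounded $\hat{B}$ throughout. `ratio_estimate` is an illustrative name. It returns $\hat{B}$, $\hat{T}_y = T_x\hat{B}$, $\hat{V}(\hat{B})$ and $\hat{V}(\hat{T}_y) = T_x^2\hat{V}(\hat{B})$.

```python
import math

# Sample of n = 24 villages from the table: yield y (tons) and area x (hectares).
y = [112, 129.1, 208.2, 158.5, 110.2, 123.3, 157.7, 154.2, 98.7, 112.7, 125.5, 60.3,
     105.7, 80.5, 163, 98.7, 137.8, 141.2, 152.5, 142.5, 136.7, 153.2, 93, 179.8]
x = [30.2, 36.1, 60.8, 44.4, 29.8, 34.9, 41.6, 42.8, 25.8, 34.7, 35.1, 15.8,
     30.8, 21.7, 49.2, 28, 37.8, 38.6, 42.8, 39, 37.6, 43.2, 26.1, 48.3]
N, T_x = 576, 21875.6


def ratio_estimate(y, x, N, T_x):
    """Ratio estimates of B and T_y, with their estimated variances, under SRS."""
    n = len(y)
    B = sum(y) / sum(x)  # B-hat = y-bar / x-bar
    xbar_u = T_x / N  # population mean of x
    s2_e = sum((yi - B * xi) ** 2 for yi, xi in zip(y, x)) / (n - 1)
    var_B = (1 - n / N) * s2_e / (n * xbar_u ** 2)
    return B, B * T_x, var_B, T_x ** 2 * var_B


B, T_y, var_B, var_T = ratio_estimate(y, x, N, T_x)
print(round(B, 4), round(T_y, 1), var_B, math.sqrt(var_B))
```

With the unrounded $\hat{B}$, the squared residuals sum to about 668.5. This gives $\hat{V}(\hat{B}) \approx 8.05 \times 10^{-4}$ (SE ≈ 0.028), a little below the hand value of $8.1644 \times 10^{-4}$. Part of the gap comes from rounding $\hat{B}$ to 3.58. The rest comes from the tabulated squared residual for village 4: $(158.5 - 3.58 \times 44.4)^2 \approx 0.20$, not 11.916.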
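Question:

$$n = 100,\quad N = 1000,\quad \sum_{i=1}^{100} y_i = 1750,\quad \sum_{i=1}^{100} x_i = 1200,\quad t_x = 12500$$

$$\sum_{i=1}^{100} y_i^2 = 31650, \qquad \sum_{i=1}^{100} x_i^2 = 15620, \qquad \sum_{i=1}^{100} x_i y_i = 22059.35$$

$$\bar{y} = \frac{\sum y_i}{n} = \frac{1750}{100} = 17.5, \qquad \bar{x} = \frac{\sum x_i}{n} = \frac{1200}{100} = 12$$

$$\hat{B} = \frac{\bar{y}}{\bar{x}} = \frac{17.5}{12} = 1.4583$$

$$\bar{X}_u = \frac{t_x}{N} = \frac{12500}{1000} = 12.5$$

$$\hat{\bar{Y}}_r = \hat{B}\,\bar{X}_u = 1.4583 \times 12.5 \approx 18.22875$$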
$$\hat{V}(\hat{\bar{Y}}_r) = \left(1-\frac{n}{N}\right)\frac{S_e^2}{n}, \qquad S_e^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{B}x_i)^2}{n-1}$$

$$\sum_{i=1}^{n}(y_i - \hat{B}x_i)^2 = \sum y_i^2 - 2\hat{B}\sum x_i y_i + \hat{B}^2\sum x_i^2$$

$$= 31650 - 2(1.4583)(22059.35) + (1.4583)^2(15620) \approx 529.7992518$$

$$S_e^2 = \frac{529.7992518}{100-1} \approx 5.352$$

$$SE(\hat{\bar{Y}}_r) = \sqrt{\left(1-\frac{n}{N}\right)\frac{S_e^2}{n}} = \sqrt{\left(1-\frac{100}{1000}\right)\frac{5.352}{100}} \approx 0.219$$

95% confidence bound on the error of estimation

$$\hat{\bar{Y}}_r \pm k\,SE(\hat{\bar{Y}}_r) = \hat{\bar{Y}}_r \pm 1.96(0.219) = \hat{\bar{Y}}_r \pm 0.42924 = 18.22875 \pm 0.42924$$

That is, approximately (17.80, 18.66).
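When only sums of squares and cross-products are available, as in this question, the residual sum of squares can be expanded as in the derivation above. This is a minimal sketch. `ratio_mean_from_sums` is an illustrative name. It keeps $\hat{B}$ unrounded (17.5/12), so $\hat{\bar{Y}}_r$ comes out as about 18.229 rather than 18.22875.

```python
import math


def ratio_mean_from_sums(n, N, sum_y, sum_x, sum_y2, sum_x2, sum_xy, t_x):
    """Ratio estimate of the population mean of y from summary sums, under SRS."""
    B = sum_y / sum_x  # B-hat = y-bar / x-bar
    mu_x = t_x / N  # population mean of x
    ss_e = sum_y2 - 2 * B * sum_xy + B ** 2 * sum_x2  # sum of (y_i - B x_i)^2
    s2_e = ss_e / (n - 1)
    se = math.sqrt((1 - n / N) * s2_e / n)
    return B, B * mu_x, se


B, y_r, se = ratio_mean_from_sums(100, 1000, 1750, 1200, 31650, 15620, 22059.35, 12500)
bound = 1.96 * se
print(round(B, 4), round(y_r, 3), round(se, 3), (round(y_r - bound, 2), round(y_r + bound, 2)))
```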

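Selecting the Sample Size

Sample size for estimating R with a bound on the error of estimation B. Here B denotes the bound, not the population ratio:

$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2\mu_x^2}{4}$$

where $\sigma^2$ is estimated from a pilot sample by $\sum(y_i - r x_i)^2/(n-1)$.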

Question: A company wishes to estimate the ratio of change, from last year to this year, in the number of worker-hours lost due to sickness. A pilot study of n = 10 was carried out. The company's records show that the total worker-hours lost last year was $t_x$ = 16300. Determine the sample size needed to estimate R, the rate of change for the company, with a bound on the error of B = 0.01. Assume the total number of workers is N = 1000.

Solution

$$r = \frac{\sum y_i}{\sum x_i} = \frac{187}{178} = 1.05$$

$$\hat{\sigma}^2 = \frac{\sum(y_i - r x_i)^2}{n-1} = 1.86^2 = 3.4596 \approx 3.46$$

$$\mu_x = \frac{t_x}{N} = \frac{16300}{1000} = 16.3$$

$$D = \frac{B^2\mu_x^2}{4} = \frac{(0.01)^2(16.3)^2}{4} = 6.642 \times 10^{-3}$$

$$n = \frac{N\sigma^2}{ND + \sigma^2} = \frac{1000(3.4596)}{1000(6.642 \times 10^{-3}) + 3.4596} = \frac{3459.6}{10.1016} \approx 342.48$$

Therefore, rounding up, n = 343 is the sample size.

Sample Size for Estimating $\mu_y$

$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2}{4}$$

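Question: N = 1000 acres. We wish to estimate the average number of trees, $\mu_y$. Determine the sample size necessary to estimate $\mu_y$ with a bound on the error of B = 1.0.

Solution

$$r = \frac{\sum y_i}{\sum x_i} = \frac{221}{208} = 1.06$$

$$\hat{\sigma}^2 = \frac{\sum(y_i - r x_i)^2}{9} = 4.20$$

$$D = \frac{B^2}{4} = \frac{1.0^2}{4} = 0.25$$

$$n = \frac{N\sigma^2}{ND + \sigma^2} = \frac{1000(4.20)}{1000(0.25) + 4.2} = \frac{4200}{254.2} \approx 16.5, \quad \text{so } n = 17$$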

Sample Size for Estimating $t_y$ (the population total)

$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad D = \frac{B^2}{4N^2}$$

Question: $t_x$ = 4500, n = 15, N = 2100, B = 500

Determine the sample size needed to estimate $t_y$.

$$r = \frac{\bar{y}}{\bar{x}} = \frac{15.83}{16.13} = 0.9814 \approx 0.98$$

$$\hat{\sigma}^2 = \frac{\sum(y_i - r x_i)^2}{n-1} = 2.73^2 \approx 7.45$$

$$D = \frac{B^2}{4N^2} = \frac{500^2}{4(2100)^2} \approx 0.01417$$

$$n = \frac{N\sigma^2}{ND + \sigma^2} = \frac{2100(7.45)}{2100(0.01417) + 7.45} = \frac{15645}{37.207} \approx 420.485, \quad \text{so } n = 421$$
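All three sample-size rules share $n = N\sigma^2/(ND + \sigma^2)$ and differ only in D. This is a minimal sketch. It assumes $\sigma^2$ has already been estimated from a pilot sample and that n is rounded up to the next whole unit, as in the worked answers. The function names are illustrative.

```python
import math


def sample_size(N, sigma2, D):
    """n = N * sigma^2 / (N * D + sigma^2), rounded up to a whole unit."""
    return math.ceil(N * sigma2 / (N * D + sigma2))


def n_for_ratio(N, sigma2, bound, mu_x):
    """Estimating the ratio R: D = B^2 * mu_x^2 / 4."""
    return sample_size(N, sigma2, bound ** 2 * mu_x ** 2 / 4)


def n_for_mean(N, sigma2, bound):
    """Estimating the mean mu_y: D = B^2 / 4."""
    return sample_size(N, sigma2, bound ** 2 / 4)


def n_for_total(N, sigma2, bound):
    """Estimating the total t_y: D = B^2 / (4 N^2)."""
    return sample_size(N, sigma2, bound ** 2 / (4 * N ** 2))


# The three worked questions above.
print(n_for_ratio(1000, 3.4596, 0.01, 16.3),  # 343
      n_for_mean(1000, 4.20, 1.0),            # 17
      n_for_total(2100, 7.45, 500))           # 421
```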

Regression Estimation
• Ratio estimation is most appropriate when the relationship between X and Y is linear through the origin; otherwise, the regression estimator is used.

• Using the Least Squares Method

$$\hat{y}_i = a + b x_i \quad \text{and} \quad a = \bar{y} - b\bar{x}$$

$$\hat{y}_i = \bar{y} + b(x_i - \bar{x})$$

• Regression Estimator of a Population Mean $\mu_y$

$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x}), \quad \text{where} \quad b = \frac{\sum(y_i - \bar{y})(x_i - \bar{x})}{\sum(x_i - \bar{x})^2}$$

$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\frac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$$

$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\text{MSE}, \qquad \text{MSE} = \frac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$$

Question: A mathematics achievement test was given to 486 students prior to entering college. A simple random sample of n = 10 was selected from these students, and their final calculus grades, y, were observed. If $\mu_x$ (the mean math test score) = 52, estimate $\mu_y$ and place a bound on the error of estimation, given MSE = 75.8.

Solution

$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x}) = 76 + 0.766(52 - 46) \approx 80.6$$

$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\text{MSE} = \left(\frac{486-10}{486 \times 10}\right)75.8 \approx 7.424$$

$$\text{Margin of Error} = 2\sqrt{\hat{V}(\hat{\mu}_{yL})} = 2\sqrt{7.4240} \approx 5.45$$
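The regression estimator and its error bound are sketched below. This is a minimal sketch. It assumes a simple random sample and the least-squares slope b defined above. `slope_and_mse` and `regression_estimate` are illustrative names. The question reports only $\bar{y} = 76$, $\bar{x} = 46$, b = 0.766 and MSE = 75.8, so the example passes those summary values in directly.

```python
import math


def slope_and_mse(x, y):
    """Least-squares slope b and MSE = [S_yy - b^2 S_xx] / (n - 2) from paired data."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx
    return xbar, ybar, b, (syy - b ** 2 * sxx) / (n - 2)


def regression_estimate(ybar, xbar, b, mu_x, N, n, mse):
    """Regression estimate of mu_y and the bound 2 * sqrt(V-hat)."""
    estimate = ybar + b * (mu_x - xbar)
    variance = (N - n) / (N * n) * mse
    return estimate, 2 * math.sqrt(variance)


est, bound = regression_estimate(ybar=76, xbar=46, b=0.766, mu_x=52, N=486, n=10, mse=75.8)
print(round(est, 1), round(bound, 2))
```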

Errors in Surveys
• Sampling errors arise solely as a result of drawing a probability sample rather than conducting a
complete enumeration.

• Non-sampling errors are mainly associated with data collection and processing procedures.

• Total Survey Error = Sampling error + Non-Sampling Error

[Figure: tree diagram of total survey error, with nodes Total Error, Sampling Error, Non-Sampling Error, Variable Error and Systematic Error]

• Why Do Errors Occur

1. Because researchers draw different subjects from the same population

• Causes of Error

1. Bias in the sampling process/procedure.

2. Chance (randomization and probability).

3. Systematic error.

• Standard deviation is used to express the variability of the population.

• Ways of Eliminating or Reducing Sampling Errors

1. Eliminate sampling altogether and test the entire population (a census).

- Otherwise, reduce sampling error by proper, unbiased probability sampling and by using a large sample size.

• Sources of Non-Sampling Error

1. Definitions to be used

2. Methods of data collection

3. Measurements to be made

• Factors That Cause Non-Sampling Errors

1. Data specification being inadequate

2. Duplication or omission of units

3. Inappropriate methods of interview

4. Lack of training and experience

5. Inadequate scrutiny of the basic data

• Types of Non-Sampling Error

1. Specification error

2. Coverage/ frame error

3. Gross coverage error

4. Net non-coverage

• Reducing Coverage Error

- By improving the frame by excluding erroneous units and duplicates and updating the frame
through field work to identify units missing from the frame.

• Non-Response: the failure to measure some of the sample units, that is, failure to obtain observations on some units selected for the sample.

• Types of Non-Respondents

1. Not-at-homes

2. Refusals

3. Not identifiable respondents

• Causes of Non-Response

1. Lack of motivation

2. Shortage of time

3. Sensitive questions contained in the study

• Ways of Reducing Non-Response

1. Good frames

2. Interviewer training, selection and supervision

3. Follow-up of non-responding units

• Measurement Errors: these errors arise because what is observed or measured departs from the actual values of the sample units, for example through recording or coding.

• Reasons for Such Errors

1. Inadequate supervision of enumerators.

2. Inadequately trained and experienced field staff.

3. Problems involved in data collection.

• Processing Errors

1. Editing errors

2. Coding errors

3. Data entry errors

4. Programming errors

• Errors of Estimation: Arise in the process of extrapolation of results from the observed sample
units to the entire target population. Errors include coverage, sample selection bias and variable
error.

• Types of Survey Errors (Main kind)

1. Survey biases due to definitions, measurements and responses.

2. Sampling variable errors.

• Bias refers to systematic errors that affect every sample taken under a specified survey design in the same way, with the same constant error.

• Variable error arises from the failure to apply survey and census procedures consistently.

• Mean Square Error (MSE) = Variance + (Bias)²
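The identity MSE = Variance + (Bias)² can be checked numerically for any set of estimates of a known parameter. This is a minimal sketch using hypothetical estimates. `mse_decomposition` is an illustrative name. The variance uses the divisor k (the number of estimates), which is what makes the identity exact.

```python
import statistics


def mse_decomposition(estimates, true_value):
    """Return (MSE, variance, bias) of a collection of estimates of a known parameter."""
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    variance = statistics.pvariance(estimates)  # divisor = number of estimates
    bias = statistics.fmean(estimates) - true_value
    return mse, variance, bias


# Hypothetical estimates of a parameter whose true value is 10.
mse, variance, bias = mse_decomposition([9.0, 10.5, 11.0, 12.5], 10.0)
print(mse, variance + bias ** 2)  # the two numbers agree (up to rounding)
```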

• Assessing Non-Sampling Error

1. Consistency check

2. Sample check

3. Survey check
