
Stat2602 Probability and Statistics II Fall 2014-2015

Chapter IV Interval Estimation

§ 4.1 Introduction

We can seldom estimate “correctly” without error. It is almost impossible to have a
sample mean exactly equal to the population mean. How, then, can we justify our
estimate? Can we know how close the estimate is? How confident can we be about that
closeness?

The answer is: we can specify a “margin of error” for the point estimate from the
sample, which, when added to and subtracted from the sample result, gives us a
range of values. We use the whole range to estimate the population mean. Then we
can justify this range by considering our confidence that this range contains the
actual population mean. Such a range is called a confidence interval.

Definition
Let $X = (X_1, X_2, \ldots, X_n)$ be a random vector of a statistical model with
parameter vector $\theta \in \Theta$. Let $\tau(\theta)$ denote a specific parameter, and let $L(X)$ and $U(X)$
denote two statistics that satisfy

$$P\big(L(X) < \tau(\theta) < U(X)\big) = 1 - \alpha \quad \text{for all } \theta \in \Theta,$$

where $1 - \alpha$ is a high probability. Then the interval $(L(X), U(X))$ is called a
$100(1-\alpha)\%$ confidence interval (C.I.) for the parameter $\tau(\theta)$. The quantity $1 - \alpha$
is called the confidence level associated with the confidence interval.

Example 4.1

Suppose that $X \sim \text{Exponential}(\lambda)$. Then, based on the cdf of the exponential
distribution, we have

$$P\left(0 < X < \frac{3}{\lambda}\right) = 1 - e^{-\lambda (3/\lambda)} = 1 - e^{-3} = 0.9502,$$

which is equivalent to $P\left(0 < \lambda < \frac{3}{X}\right) = 0.9502$. Note that the equation holds for all
values of $\lambda \in (0, \infty)$. Therefore $\left(0, \frac{3}{X}\right)$ is a 95.02% confidence interval for $\lambda$.
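The claim that the coverage does not depend on the rate can be checked numerically. The following is a minimal simulation sketch, not part of the original notes; the function name `coverage` and the chosen rates are illustrative assumptions.

```python
import random

def coverage(lam, trials=100_000, seed=0):
    """Estimate how often the interval (0, 3/X) contains the true rate lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.expovariate(lam)   # one draw from Exponential(rate = lam)
        if lam * x < 3.0:          # same event as lam < 3/x, without dividing by x
            hits += 1
    return hits / trials

# The estimated coverage should be close to 1 - exp(-3) for any rate.
for lam in (0.5, 2.0, 7.0):
    print(lam, coverage(lam))
```

Each printed proportion should be near 0.95 regardless of the rate used, which is what "holds for all values of the parameter" means in practice.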


General Procedure of Finding the Confidence Interval


1. Find a random variable $V(X, \theta)$ which is a function of $X$ and $\theta$ such that the
distribution of $V(X, \theta)$ does not depend on $\theta$. The random variable $V(X, \theta)$ is
called the pivot variable. Note that a pivot variable is not a statistic, because it
involves the unknown parameter.

2. Based on the distribution of $V(X, \theta)$, find constants $c_1, c_2$ such that

$$P\big(c_1 < V(X, \theta) < c_2\big) = 1 - \alpha \quad \text{for all } \theta \in \Theta.$$

3. Solve the inequality inside the probability statement to obtain

$$P\big(L(X) < \tau(\theta) < U(X)\big) = 1 - \alpha \quad \text{for all } \theta \in \Theta.$$

The interval given by $(L(X), U(X))$ is then a $100(1-\alpha)\%$ C.I. for $\tau(\theta)$.

Example 4.2

Suppose historical data suggest that the lifetime of a particular electronic device
follows an exponential distribution with unknown parameter $\lambda$. Suppose that we
observe a random sample of size $n$, i.e. $X_1, X_2, \ldots, X_n \overset{iid}{\sim} \text{Exponential}(\lambda)$. From
Example 2.1, the sampling distribution of the sample mean was determined as

$$\bar{X} \sim \text{Gamma}(n, n\lambda).$$

To construct a confidence interval for $\lambda$, we can use the pivot variable

$$2n\lambda\bar{X} \sim \text{Gamma}\left(n, \tfrac{1}{2}\right) \equiv \chi^2_{2n},$$

from which we have the following probability statement:

$$P\left(\chi^2_{2n,\,1-\alpha/2} < 2n\lambda\bar{X} < \chi^2_{2n,\,\alpha/2}\right) = 1 - \alpha \quad \text{for all } \lambda \in (0, \infty).$$

Solving the inequality results in the $100(1-\alpha)\%$ confidence interval for $\lambda$:

$$\left(\frac{\chi^2_{2n,\,1-\alpha/2}}{2n\bar{X}},\ \frac{\chi^2_{2n,\,\alpha/2}}{2n\bar{X}}\right).$$


Example 4.3

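Suppose that we have a random sample from the normal distribution,
$X_1, X_2, \ldots, X_n \overset{iid}{\sim} N(\mu, \sigma^2)$ with known variance $\sigma^2$. From the sampling distribution
of the sample mean,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right),$$

we can easily obtain the pivot variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

Using the percentiles of the standard normal distribution, we have the following
probability statement:

$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha \quad \text{for all } \mu \in (-\infty, \infty).$$

Solving the inequality gives

$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \quad \text{for all } \mu \in (-\infty, \infty).$$

Therefore $\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ is a $100(1-\alpha)\%$ confidence interval for $\mu$.

For simplicity, it is often denoted as $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. The quantity $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is called
the margin of error, which essentially represents an upper bound of the estimation
error $|\bar{X} - \mu|$ at the stated confidence level.

Note that if $\sigma$ is unknown (which is the usual situation), we may use the pivot
variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

to construct the confidence interval, thereby resulting in the $100(1-\alpha)\%$
confidence interval for $\mu$ as $\bar{X} \pm t_{n-1,\,\alpha/2}\frac{S}{\sqrt{n}}$.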


Example 4.4

A paint manufacturer wants to determine the average drying time of a new interior
wall paint. The drying times (in minutes) for 12 test areas of equal size are
obtained below.

58, 60, 61, 61, 68, 64, 76, 62, 73, 75, 59, 79

From the data, $\bar{X} = 66.33$, $S = 7.51$.

Assuming a normal distribution for the drying time, a 95% C.I. for the mean drying
time is given by

$$66.33 \pm t_{11,\,0.025}\frac{7.51}{\sqrt{12}} = 66.33 \pm 2.201 \times \frac{7.51}{\sqrt{12}} = 66.33 \pm 4.77 = (61.56,\ 71.10).$$

Note that the range $61.56 < \mu < 71.10$ is just a conjecture about the population
mean. We can’t claim that the true average drying time must be between 61.56
mins and 71.10 mins. We can only claim that we have high confidence (95%) that
the range covers the true mean.
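The arithmetic in Example 4.4 can be reproduced with a short script. This is a sketch rather than part of the original notes: the helper name `t_interval` is an assumption, and the critical value 2.201 is copied from the example because the Python standard library has no t quantile function.

```python
import math
import statistics

drying_times = [58, 60, 61, 61, 68, 64, 76, 62, 73, 75, 59, 79]
T_11_0025 = 2.201  # t_{11, 0.025}, table value quoted in Example 4.4

def t_interval(data, t_crit):
    """Return (mean, sd, (lower, upper)) for the interval mean +/- t_crit * sd / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)            # sample standard deviation (divisor n - 1)
    margin = t_crit * sd / math.sqrt(n)
    return mean, sd, (mean - margin, mean + margin)

mean, sd, (lower, upper) = t_interval(drying_times, T_11_0025)
print(f"mean={mean:.2f}, sd={sd:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

Because the script carries unrounded intermediate values, the upper limit can differ from the hand calculation in the last decimal place.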

Remarks
1. (Important note) The confidence interval $(L(X), U(X))$ is a random interval. It
depends on the particular sample observations. The interpretation of a
confidence interval is that if you obtain many independent random samples and
compute one interval from each sample, then about
$100(1-\alpha)\%$ of these intervals will contain the parameter $\tau(\theta)$. Hence the
confidence level $100(1-\alpha)\%$ is a probability describing the behaviour of all
possible intervals rather than just the one we obtain. The confidence level
represents the performance of the entire procedure. In Example 4.4, it is
incorrect to write $P(61.56 < \mu < 71.10) = 0.95$.

2. A shorter confidence interval corresponds to a smaller margin of error and
thus is more informative. Based on the same sample, the confidence interval
can be made shorter if we use a lower confidence level. There is always a
trade-off between the confidence level and the information content.


3. With a fixed confidence level, we can reduce the margin of error (hence
producing a shorter confidence interval) by increasing the sample size.

4. The confidence interval is not unique. For any specific confidence level $1 - \alpha$,
there are infinitely many pairs of constants $c_1, c_2$ satisfying $P\big(c_1 < V(X, \theta) < c_2\big) = 1 - \alpha$.
In Example 4.4, $\left(\bar{X} - t_{n-1,\,0.01}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\,0.04}\frac{S}{\sqrt{n}}\right)$ is also a 95% confidence
interval for $\mu$. For simplicity and for a shorter interval, a common practice is
to allocate $\alpha/2$ as the tail area on each side of $(c_1, c_2)$. However, when
the parameter space has a finite boundary, we may allocate $\alpha$ to only one side.
In Example 4.2, $\left(0,\ \frac{\chi^2_{2n,\,\alpha}}{2n\bar{X}}\right)$ is also a $100(1-\alpha)\%$ confidence interval for $\lambda$.

5. According to the CLT, for an arbitrary population distribution with finite variance, the pivot variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

is approximately standard normal when the sample size is large. Therefore
we can find an approximate C.I. for the population mean $\mu$ as

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

for large samples, even when the population distribution is not normal. If $\sigma^2$ is
unknown, we can replace it by the sample variance $S^2$ because $S^2$ is a consistent
estimator of $\sigma^2$. Then the C.I. will be

$$\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}.$$

As a rule of thumb, $n \ge 30$ is required for a good approximation.
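The repeated-sampling interpretation in Remark 1 can be illustrated by simulation. The sketch below is not from the original notes; the population values (mean 10, standard deviation 2, sample size 25) and the function name are arbitrary assumptions, and the known-variance interval of Example 4.3 is used so that only the standard library is needed.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval_coverage(mu=10.0, sigma=2.0, n=25, alpha=0.05, reps=20_000, seed=1):
    """Fraction of simulated intervals xbar +/- z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)     # upper alpha/2 point of N(0, 1)
    margin = z * sigma / math.sqrt(n)
    covered = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin < mu < xbar + margin:
            covered += 1
    return covered / reps

print(z_interval_coverage())            # close to 0.95
print(z_interval_coverage(alpha=0.10))  # close to 0.90
```

Any single simulated interval either contains the true mean or it does not; the confidence level describes the long-run proportion over many samples.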


§ 4.2 Two Sample Comparisons

Suppose we have two populations of the same measurement which are distributed
as normal. We may want to compare the two population means or the population
variances, e.g. comparing the mean height of the people of two races. We can draw
random samples from these two populations independently and use these samples
to make inferences.

Suppose we have two independent samples:

$$X_1, X_2, \ldots, X_m \overset{iid}{\sim} N(\mu_x, \sigma_x^2); \qquad Y_1, Y_2, \ldots, Y_n \overset{iid}{\sim} N(\mu_y, \sigma_y^2)$$

with $\mu_x, \mu_y, \sigma_x^2, \sigma_y^2$ as the unknown parameters. We may want to compare the two
population means by constructing a C.I. for $\mu_x - \mu_y$, or compare the two
population variances by constructing a C.I. for $\sigma_x^2/\sigma_y^2$.

§ 4.2.1 Comparison of Population Means

Case 1: Both $\sigma_x^2$ and $\sigma_y^2$ are known

The maximum likelihood estimator for $\mu_x - \mu_y$ is $\bar{X} - \bar{Y}$. Based on this difference
between the two sample means we can obtain a C.I. for $\mu_x - \mu_y$. From the
sampling distributions of $\bar{X}$ and $\bar{Y}$ and the fact that they are independent, we have

$$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y,\ \frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}\right).$$

A pivot variable can be easily obtained as

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2/m + \sigma_y^2/n}} \sim N(0, 1),$$

from which we can determine a $100(1-\alpha)\%$ C.I. for $\mu_x - \mu_y$:

$$(\bar{X} - \bar{Y}) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{m} + \frac{\sigma_y^2}{n}}.$$
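As a rough illustration of Case 1, the interval can be computed as follows. This sketch is not part of the original notes: the function name and the input numbers are hypothetical, chosen only to show the calculation.

```python
import math
from statistics import NormalDist

def two_sample_z_interval(xbar, ybar, var_x, var_y, m, n, alpha=0.05):
    """C.I. for mu_x - mu_y when both population variances are known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)          # Z_{alpha/2}
    margin = z * math.sqrt(var_x / m + var_y / n)
    diff = xbar - ybar
    return diff - margin, diff + margin

# Hypothetical summary statistics, for illustration only.
lower, upper = two_sample_z_interval(xbar=52.0, ybar=50.0,
                                     var_x=16.0, var_y=9.0, m=40, n=30)
print(f"({lower:.4f}, {upper:.4f})")
```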


Case 2: Both $\sigma_x^2$ and $\sigma_y^2$ are unknown

Usually, both $\sigma_x^2$ and $\sigma_y^2$ are unknown. In such a case we may estimate them by the
sample variances and use the following pivot variable:

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{S_x^2/m + S_y^2/n}} \sim t_v.$$

Note that this sampling distribution is not exact and is called Satterthwaite’s
approximation, with the degrees of freedom $v$ computed by

$$v = \frac{\left(S_x^2/m + S_y^2/n\right)^2}{\dfrac{S_x^4}{m^2(m-1)} + \dfrac{S_y^4}{n^2(n-1)}}.$$

The $100(1-\alpha)\%$ C.I. for $\mu_x - \mu_y$ is

$$(\bar{X} - \bar{Y}) \pm t_{v,\,\alpha/2}\sqrt{\frac{S_x^2}{m} + \frac{S_y^2}{n}}.$$

Remarks

1. Since the approximate degrees of freedom $v$ is generally not an integer, $t_{v,\,\alpha/2}$ cannot be
obtained from the t-table directly. The values of $t_{v,\,\alpha/2}$ can be obtained from
statistical packages or by interpolation of the values in the t-table.

2. The above approximate sampling distribution of $T$ is based on the Cochran-Satterthwaite
approximation for linear combinations of mean squares. The
detailed mathematical derivation can be found in the supplementary notes.


Example 4.5
The effectiveness of a mental training program was tested in a military training
program. In an anti-aircraft artillery examination, scores for an experimental group
and a control group were recorded. According to the following data, does the
mental training program make a difference in their scores?

Experimental:
   60.83  117.80   44.71   75.38   73.46   34.26   82.25   59.77
   69.95   21.37   59.78   92.72   72.14   57.29   64.05   44.09
   80.03   76.59   74.27   66.87

Control:
  122.80  118.43  121.70  104.06  119.89  118.58  111.26  138.27
  120.76  121.67   70.02   54.22   70.70   94.23   99.08   74.61

For the experimental group, $m = 20$, $\bar{X} = 66.38$, $S_x = 20.96$.

For the control group, $n = 16$, $\bar{Y} = 103.14$, $S_y = 23.94$.

Satterthwaite’s approximation:

$$v = \frac{\left(20.96^2/20 + 23.94^2/16\right)^2}{\dfrac{20.96^4}{20^2(20-1)} + \dfrac{23.94^4}{16^2(16-1)}} \approx 30.$$

Then a 95% C.I. for $\mu_x - \mu_y$ is given by

$$(66.38 - 103.14) \pm t_{30,\,0.025}\sqrt{\frac{20.96^2}{20} + \frac{23.94^2}{16}} = -36.76 \pm 2.042 \times 7.60 = -36.76 \pm 15.52 = (-52.3,\ -21.2).$$

We have 95% confidence that $\mu_x - \mu_y < -21.2$. We may conclude that the mean
score of the experimental group is significantly lower than the mean score of the
control group by more than 21. Hence the mental training program would, on
average, result in a lower score in the anti-aircraft artillery examination.
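The degrees of freedom and the interval in Example 4.5 can be recomputed from the summary statistics. This is a sketch, not part of the original notes; the function names are assumptions, and the critical value 2.042 for 30 degrees of freedom is taken from the example.

```python
import math

def satterthwaite_df(s_x, s_y, m, n):
    """Approximate degrees of freedom for the unequal-variance pivot."""
    a = s_x ** 2 / m
    b = s_y ** 2 / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))

def unequal_variance_interval(xbar, ybar, s_x, s_y, m, n, t_crit):
    """C.I. for mu_x - mu_y using sqrt(S_x^2/m + S_y^2/n) as the standard error."""
    margin = t_crit * math.sqrt(s_x ** 2 / m + s_y ** 2 / n)
    diff = xbar - ybar
    return diff - margin, diff + margin

v = satterthwaite_df(20.96, 23.94, 20, 16)
lower, upper = unequal_variance_interval(66.38, 103.14, 20.96, 23.94, 20, 16,
                                         t_crit=2.042)   # t_{30, 0.025} from the example
print(f"v = {v:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The computed degrees of freedom come out slightly above 30, which is why the example uses the table row for 30.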


Case 3: Both $\sigma_x^2$ and $\sigma_y^2$ are unknown, with $\sigma_x^2 = \sigma_y^2 = \sigma^2$

A simpler confidence interval can be obtained if the population variances are
assumed to be equal. Consider $E(S_x^2) = \sigma_x^2 = \sigma^2$ and $E(S_y^2) = \sigma_y^2 = \sigma^2$, i.e. both $S_x^2$
and $S_y^2$ are unbiased estimators of $\sigma^2$. We can combine them for
estimating $\sigma^2$ so as to fully utilize the information. The “pooled” sample variance
is a weighted mean of the two sample variances,

$$S_{pool}^2 = \frac{(m-1)S_x^2 + (n-1)S_y^2}{m+n-2},$$

and has the sampling distribution

$$W = \frac{(m+n-2)S_{pool}^2}{\sigma^2} = \frac{(m-1)S_x^2}{\sigma^2} + \frac{(n-1)S_y^2}{\sigma^2} \sim \chi^2_{m+n-2}.$$

Since

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{\sigma^2}{m} + \dfrac{\sigma^2}{n}}} = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \sim N(0, 1),$$

and $Z$ is independent of $W$ for normal samples, we can construct the following pivot variable:

$$T = \frac{Z}{\sqrt{W/(m+n-2)}} = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{S_{pool}\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}} \sim t_{m+n-2}.$$

Therefore a $100(1-\alpha)\%$ C.I. for $\mu_x - \mu_y$ can be obtained as

$$(\bar{X} - \bar{Y}) \pm t_{m+n-2,\,\alpha/2}\, S_{pool}\sqrt{\frac{1}{m} + \frac{1}{n}}.$$


Example 4.6

A bank’s loan department found that 11 home loans processed during April had a
mean value of $78,100 and a standard deviation of $6300. An analysis of the 9
loans in May showed a mean value of $82,700 with a standard deviation of $7100.
Suppose these home loans represent independent random samples of the values of
home loan applications approved in the bank’s service area. Find a 98% confidence
interval for the increase in the mean level of approved home loan applications from
April to May.

Denote by $X$ the value of a home loan in April and by $Y$ the value of a home loan in May. From the data
we have

$$m = 11,\quad \bar{X} = 78100,\quad S_x = 6300;$$
$$n = 9,\quad \bar{Y} = 82700,\quad S_y = 7100.$$

Assuming equal population variances, the pooled sample standard deviation is
calculated as

$$S_{pool} = \sqrt{\frac{(11-1) \times 6300^2 + (9-1) \times 7100^2}{11+9-2}} = 6667.$$

A 98% C.I. for $\mu_y - \mu_x$ is given by

$$(82700 - 78100) \pm t_{18,\,0.01} \times 6667 \times \sqrt{\frac{1}{11} + \frac{1}{9}} = 4600 \pm 2.552 \times 2997 = 4600 \pm 7648 = (-3048,\ 12248).$$

Since zero is a possible value in the interval, we don’t have sufficient evidence to
conclude that the mean level of approved home loan applications had increased.
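The pooled calculation in Example 4.6 can be checked with a few lines of code. This is a sketch under the example's equal-variance assumption, not part of the original notes; the function names are illustrative, and the critical value 2.552 for 18 degrees of freedom is the table value used in the example.

```python
import math

def pooled_sd(s_x, s_y, m, n):
    """Square root of the pooled sample variance."""
    return math.sqrt(((m - 1) * s_x ** 2 + (n - 1) * s_y ** 2) / (m + n - 2))

def pooled_interval(diff, s_pool, m, n, t_crit):
    """C.I. diff +/- t_crit * s_pool * sqrt(1/m + 1/n)."""
    margin = t_crit * s_pool * math.sqrt(1 / m + 1 / n)
    return diff - margin, diff + margin

s_pool = pooled_sd(6300, 7100, 11, 9)
lower, upper = pooled_interval(82700 - 78100, s_pool, 11, 9, t_crit=2.552)
print(f"S_pool = {s_pool:.0f}, interval = ({lower:.0f}, {upper:.0f})")
```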

Remarks
1. Assuming equal population variances and then using the pooled sample variance
is a popular approach. However, there is no guarantee that the assumption is
satisfied. It relies on our basic knowledge about the variables in the problem
under consideration. For instance, if we believe that the effect of a certain
treatment will be more or less the same on all experimental objects, then it may
be regarded as a shift of the distribution without altering the variation. In
such a case it would be reasonable to assume equal population variances for the
treatment and control groups. Moreover, we may check the equal population
variance assumption by comparing the sample variances, as described in the next
section.


2. The above procedures for constructing confidence intervals were derived based
on the assumption that the population(s) is/are normal. If the normal assumption
is violated, the above procedures are still approximately valid with large samples. For small
sample problems with non-normal population(s), we will need to use non-parametric
statistical methods, which are beyond the scope of this course.

3. Independence between the two samples is also a crucial assumption for the
above procedures. Usually it can be guaranteed by proper design of the
sampling or experimental procedures. However, in some typical experiments,
data are measured on the same objects, and the results cannot be regarded as
independent samples even though the data look as if they were obtained from two samples.

§ 4.2.2 Comparison of Population Variances

From Section 2.1.3 in Chapter 2, we have

$$F = \frac{S_x^2/\sigma_x^2}{S_y^2/\sigma_y^2} \sim F(m-1,\ n-1).$$

Based on this pivot variable, the following probability equation holds for all
parameter values:

$$P\left(F_{m-1,\,n-1,\,1-\alpha/2} < \frac{S_x^2/\sigma_x^2}{S_y^2/\sigma_y^2} < F_{m-1,\,n-1,\,\alpha/2}\right) = 1 - \alpha.$$

Hence a $100(1-\alpha)\%$ C.I. for $\sigma_x^2/\sigma_y^2$ is given by

$$\left(\frac{1}{F_{m-1,\,n-1,\,\alpha/2}}\frac{S_x^2}{S_y^2},\ \frac{1}{F_{m-1,\,n-1,\,1-\alpha/2}}\frac{S_x^2}{S_y^2}\right)$$

or equivalently,

$$\left(\frac{1}{F_{m-1,\,n-1,\,\alpha/2}}\frac{S_x^2}{S_y^2},\ F_{n-1,\,m-1,\,\alpha/2}\frac{S_x^2}{S_y^2}\right)$$

as $F_{r_1,\,r_2,\,1-\alpha} = \dfrac{1}{F_{r_2,\,r_1,\,\alpha}}$ according to the property of the F-distribution.


Example 4.6 (continued)

Consider the home loan data analysed in Example 4.6. A 90% C.I. for $\sigma_x^2/\sigma_y^2$ is given by

$$\left(\frac{1}{F_{10,\,9,\,0.05}}\frac{S_x^2}{S_y^2},\ F_{9,\,10,\,0.05}\frac{S_x^2}{S_y^2}\right) = \left(\frac{1}{3.14} \times \frac{6300^2}{7100^2},\ 3.02 \times \frac{6300^2}{7100^2}\right) = (0.251,\ 2.378).$$

Since the interval contains 1, there is no evidence that the two population variances
are different.
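The variance-ratio interval above involves only two table look-ups and one ratio, as the following sketch shows. It is not part of the original notes; the function name is an assumption, and the F critical values 3.14 and 3.02 are the table values quoted in the example.

```python
def variance_ratio_interval(s_x, s_y, f_upper_x_over_y, f_upper_y_over_x):
    """C.I. for sigma_x^2 / sigma_y^2.

    f_upper_x_over_y: F_{m-1, n-1, alpha/2}
    f_upper_y_over_x: F_{n-1, m-1, alpha/2}
    """
    ratio = s_x ** 2 / s_y ** 2
    return ratio / f_upper_x_over_y, ratio * f_upper_y_over_x

# m = 11, n = 9: F_{10, 9, 0.05} = 3.14 and F_{9, 10, 0.05} = 3.02 (table values)
lower, upper = variance_ratio_interval(6300, 7100, 3.14, 3.02)
print(f"({lower:.3f}, {upper:.3f})")
```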

§ 4.3 Estimation about population proportion

Many surveys have their objective as the estimation of the proportion of people or
objects in a large group that possess a particular attribute. As described in Section
2.2.4, we can use the sample proportion p̂ to estimate the population proportion p
and the sampling distribution of p̂ can be well approximated by normal if the
sample size n is large enough. Therefore the interval estimate of p can be
constructed based on the normal approximation.

Let $X$ denote the number of objects in the sample possessing the attribute of
interest; then $X \sim b(n, p)$. When $n$ is large, by the Central Limit Theorem, we can use
the approximate pivot variable

$$\frac{X - np}{\sqrt{np(1-p)}} \overset{\cdot}{\sim} N(0, 1)$$

to obtain the approximate probability equation

$$P\left(-Z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < Z_{\alpha/2}\right) \approx 1 - \alpha \quad \text{for all } p \in (0, 1).$$

Solving the quadratic inequality gives an approximate $100(1-\alpha)\%$ C.I. for $p$:

$$\frac{\left(2\hat{p} + Z_{\alpha/2}^2/n\right) \pm \sqrt{\left(2\hat{p} + Z_{\alpha/2}^2/n\right)^2 - 4\hat{p}^2\left(1 + Z_{\alpha/2}^2/n\right)}}{2\left(1 + Z_{\alpha/2}^2/n\right)}, \quad \text{where } \hat{p} = \frac{X}{n}.$$


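This expression is somewhat nasty. To obtain a simpler formula, note that the quantity
under the square root expands to $4\left(Z_{\alpha/2}^2/n\right)\hat{p}(1-\hat{p}) + \left(Z_{\alpha/2}^2/n\right)^2$. When $n$ is large we may
drop the term $Z_{\alpha/2}^2/n$ wherever it is added to $2\hat{p}$ or to 1, and drop its square under
the root, as these are negligible. As a result, the C.I. becomes

$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

To compare the population proportions $p_1, p_2$ of two populations, we can construct
an approximate $100(1-\alpha)\%$ C.I. for $p_1 - p_2$ using similar derivations:

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},$$

where $n_1, n_2$ are the sample sizes of the samples from the two populations and
$\hat{p}_1, \hat{p}_2$ are the corresponding sample proportions.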

Example 4.7

It was found that 41 people in a random sample of 500 persons from the labour
force of a country were unemployed. Since the sample size $n = 500$ is quite large, a
95% confidence interval for the rate of unemployment in the country can be
constructed as

$$\hat{p} \pm Z_{0.025}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.082 \pm 1.96\sqrt{\frac{0.082 \times 0.918}{500}} = 0.082 \pm 0.024 = (5.8\%,\ 10.6\%).$$

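To see how much the simplification matters for the data of Example 4.7, the sketch below computes both the interval obtained by solving the quadratic inequality and the simplified interval. It is an illustration rather than part of the original notes; the function names are assumptions, and 1.96 is used for the normal critical value as in the example.

```python
import math

def quadratic_interval(x, n, z):
    """Interval from solving (X - np)^2 <= z^2 * n * p * (1 - p) for p."""
    p_hat = x / n
    a = z ** 2 / n
    centre = 2 * p_hat + a
    root = math.sqrt(centre ** 2 - 4 * p_hat ** 2 * (1 + a))
    denom = 2 * (1 + a)
    return (centre - root) / denom, (centre + root) / denom

def simplified_interval(x, n, z):
    """Interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

full = quadratic_interval(41, 500, 1.96)
simple = simplified_interval(41, 500, 1.96)
print("quadratic :", full)
print("simplified:", simple)
```

With n = 500 the two intervals differ by only a few tenths of a percentage point, which is consistent with the dropped term being small for large samples.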
Example 4.8

We want to compare the proportion of defective electric motors turned out by two
shifts of workers. From the large number of motors produced in a given week,
$n_1 = 50$ motors were selected from the output of shift I and $n_2 = 40$ motors were
selected from the output of shift II. The sample from shift I revealed 4 to be
defective, and the sample from shift II showed 6 faulty motors.

Sample proportions: $\hat{p}_1 = \dfrac{4}{50} = 0.08$, $\hat{p}_2 = \dfrac{6}{40} = 0.15$.


A 95% confidence interval for $p_1 - p_2$ is then given by

$$(0.08 - 0.15) \pm Z_{0.025}\sqrt{\frac{0.08 \times 0.92}{50} + \frac{0.15 \times 0.85}{40}} = -0.07 \pm 1.96 \times 0.0683 = -0.07 \pm 0.134 = (-20.4\%,\ 6.4\%).$$

Since the interval contains zero, zero cannot be ruled out as a plausible value of the
true difference between the proportions of defective motors. Therefore there does not
appear to be any significant difference between the defective rates for the two
shifts.
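The calculation in Example 4.8 can be reproduced as follows. This is a sketch, not part of the original notes; the function name is an assumption, and 1.96 is used for the normal critical value as in the example.

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z):
    """Approximate C.I. for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin

lower, upper = two_proportion_interval(4, 50, 6, 40, z=1.96)
print(f"({lower:.3f}, {upper:.3f})")
```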

Remarks

1. The above procedures rely on some approximations. First of all, the population
is assumed to be much larger than the sample so that it can be approximately
regarded as an infinite population. As a rule of thumb, the population size
should be at least ten times the sample size for an adequate approximation.

2. The second approximation is the use of the normal distribution, rather than the
binomial distribution, to determine the critical value. This approximation works
quite well when both $np$ and $n(1-p)$ are at least 5, as mentioned in Chapter 2.

3. When the sample size $n$ is small, we can use the following formulae to
construct one-sided C.I.s for $p$:

$$[0,\ \hat{p}_U] \quad \text{where} \quad \sum_{j=0}^{X}\binom{n}{j}\hat{p}_U^{\,j}(1-\hat{p}_U)^{n-j} = \alpha,$$

or

$$[\hat{p}_L,\ 1] \quad \text{where} \quad \sum_{j=X}^{n}\binom{n}{j}\hat{p}_L^{\,j}(1-\hat{p}_L)^{n-j} = \alpha.$$

Note that $\hat{p}_L$ and $\hat{p}_U$ are functions of $X$ and therefore are statistics. It can be
proved (refer to the supplementary notes) that they satisfy the following
probability inequalities:

$$P(0 \le p \le \hat{p}_U) \ge 1 - \alpha \quad \text{for all } p \in (0, 1);$$

$$P(\hat{p}_L \le p \le 1) \ge 1 - \alpha \quad \text{for all } p \in (0, 1).$$

Since the confidence level may be larger than the required $1 - \alpha$, they are called
conservative confidence intervals.
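The limits in Remark 3 have no closed form in general, but each is the root of a monotone function of p and can be found by bisection. The sketch below is one possible implementation, not taken from the notes; the function names, the tolerance, and the conventions for the boundary cases X = n (upper limit set to 1) and X = 0 (lower limit set to 0) are assumptions.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ b(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(x + 1))

def upper_limit(x, n, alpha, tol=1e-10):
    """Solve P_p(X <= x) = alpha for p; the left side decreases as p increases."""
    if x >= n:
        return 1.0                      # assumed convention: the sum is 1 for every p
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid                    # probability still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

def lower_limit(x, n, alpha, tol=1e-10):
    """Solve P_p(X >= x) = alpha for p; the left side increases with p."""
    if x <= 0:
        return 0.0                      # assumed convention: the sum is 1 for every p
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - binom_cdf(x - 1, n, mid) < alpha:
            lo = mid                    # probability still too small: move right
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical small sample: 3 successes out of 10, alpha = 0.05.
print(upper_limit(3, 10, 0.05), lower_limit(3, 10, 0.05))
```

For X = 0 the defining equation reduces to (1 - p)^n = alpha, so the upper limit is 1 - alpha^(1/n), which gives a quick check on the bisection.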

§ 4.4 Sample Size Determination

A common and practical consideration before any actual survey takes place is :

“How large a sample should I take?”

No statistician can answer this question without knowing how accurate the survey-
taker wishes the estimate to be. Usually, we will try to minimize the sample size
subject to some precision requirement.

Consider the confidence interval for the population mean $\mu$:

$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$

We have high confidence ($100(1-\alpha)\%$) that the difference between $\mu$ and $\bar{X}$ is
within the margin of error $Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$. Therefore, if our precision requirement is that
the difference between the estimate and the true value be small, say, within
a certain limit $D$, then we should equate $D$ with the margin of error and solve for
the sample size we need.

Solving $D = Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ gives $n = \dfrac{Z_{\alpha/2}^2\,\sigma^2}{D^2}$.

Similarly, if our interest is to estimate a population proportion, then
solving $D = Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$ gives $n = \dfrac{Z_{\alpha/2}^2\, p(1-p)}{D^2}$.

In general, the sample size can be determined by equating the precision
requirement $D$ with the margin of error and rounding the solution up to the next
integer. When $\sigma^2$ or $p$ is unknown, they are
estimated based on past survey results or pilot studies.


Example 4.9

A marketing research firm wants to conduct a survey to estimate the average
amount spent on entertainment by each person visiting a popular resort. The people
who plan the survey would like to have an estimate close to the true value such that
with 95% confidence, the difference between them is within $120. From past
operation of the resort, an estimate of the population standard deviation is $400.
How large should the sample be?

Using the above formula,

$$n = \frac{Z_{0.025}^2\,\sigma^2}{D^2} = \left(\frac{1.96 \times 400}{120}\right)^2 = 42.68.$$

Hence a sample of size 43 is enough.
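The sample size formula for a mean, including the rounding up, can be written as a one-line function. This is a sketch, not part of the original notes; the function name is an assumption, and 1.96 is used for the normal critical value as in the example.

```python
import math

def sample_size_for_mean(sigma, d, z):
    """Smallest integer n with z * sigma / sqrt(n) <= d."""
    return math.ceil((z * sigma / d) ** 2)

# Example 4.9: sigma estimated as 400, required margin of error 120, 95% confidence.
print(sample_size_for_mean(sigma=400, d=120, z=1.96))
```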

Example 4.10

In Example 4.4, the mean drying time of the new interior wall paint was estimated
to be from 61.56 mins to 71.10 mins, based on a sample with $n = 12$, $\bar{X} = 66.33$,
$S = 7.51$. The width of this interval is $71.10 - 61.56 = 9.54$, which may not be
informative enough. Suppose we want a more informative interval estimate
such that the width of the interval is as small as 8 mins. How many test areas
should we take as our sample?

Since the width of the confidence interval is two times the margin of error, the
precision requirement can be achieved with $D = 4$. Equating this to the margin of
error:

$$4 = t_{n-1,\,0.025}\frac{S}{\sqrt{n}}.$$

The value of $S$ can be estimated by the previous experimental result, $S = 7.51$.

Hence we have

$$\frac{t_{n-1,\,0.025}}{\sqrt{n}} = \frac{4}{7.51} = 0.533, \quad \text{i.e.} \quad n = \left(\frac{t_{n-1,\,0.025}}{0.533}\right)^2.$$

It is hard to solve this equation analytically as it involves the inverse cdf of the
t-distribution. To find the smallest sample size that fulfils the requirement, we can use
the method of trial and error.


Starting with an initial guess of $n = 15$, we have $t_{14,\,0.025} = 2.145$. So we need

$$n = \left(\frac{2.145}{0.533}\right)^2 = 16.20.$$

This result suggests that we may take $n = 17$, for which $t_{16,\,0.025} = 2.120$ and we need

$$n = \left(\frac{2.120}{0.533}\right)^2 = 15.82.$$

Try $n = 16$, for which $t_{15,\,0.025} = 2.131$ and

$$n = \left(\frac{2.131}{0.533}\right)^2 = 15.98.$$

Hence we should take a sample of size $n = 16$. (This choice is borderline: with the
unrounded ratio $4/7.51 = 0.5326$ the last value becomes 16.01, so at $n = 16$ the margin
of error is about 4.001 and the target is met only up to rounding, whereas $n = 17$
satisfies it strictly.)

If we want the confidence interval to be shorter than 3 mins, then the precision
requirement would be $D = 1.5$. Equating the precision requirement to the margin of
error:

$$\frac{t_{n-1,\,0.025}}{\sqrt{n}} = \frac{1.5}{7.51} = 0.200, \quad \text{i.e.} \quad n = \left(\frac{t_{n-1,\,0.025}}{0.200}\right)^2.$$

Try $n = 20$, for which $t_{19,\,0.025} = 2.093$ and

$$n = \left(\frac{2.093}{0.200}\right)^2 = 109.52.$$

We need a much larger $n$ to fulfil such a precision requirement. Since the t-values
are very close to Z-values when the degrees of freedom $n - 1$ is large, we
may simply use $Z_{0.025} = 1.96$ in place of $t_{n-1,\,0.025}$ and calculate

$$n = \left(\frac{1.96}{0.200}\right)^2 = 96.04.$$

Hence a sample of size $n = 97$ will be needed.


Example 4.11

In Example 4.6, the increase in the mean level of approved home loan applications
from April to May was estimated as $4600 with a margin of error of $7648. The
result is inconclusive as the margin of error is larger than the estimated increase.
Suppose we want a more conclusive result next year such that the
estimation error is reduced to as low as $4000. How many loan applications
do we need in order to achieve this requirement?

From the past data we have the pooled sample standard deviation calculated as
$S_{pool} = 6667$. For simplicity, we use equal sample sizes, i.e. $n_1 = n_2 = n$. Equating
the precision requirement to the margin of error gives

$$4000 = t_{n+n-2,\,0.01}\, S_{pool}\sqrt{\frac{1}{n} + \frac{1}{n}} = t_{2n-2,\,0.01} \times \frac{6667\sqrt{2}}{\sqrt{n}},$$

i.e.

$$n = \left(\frac{6667\sqrt{2}}{4000} \times t_{2n-2,\,0.01}\right)^2 = \left(2.357 \times t_{2n-2,\,0.01}\right)^2.$$

Try $n = 15$: $t_{28,\,0.01} = 2.467$ and $n = (2.357 \times 2.467)^2 = 33.81$.

Try $n = 34$: $t_{66,\,0.01} = 2.384$ and $n = (2.357 \times 2.384)^2 = 31.57$.

Try $n = 32$: $t_{62,\,0.01} = 2.388$ and $n = (2.357 \times 2.388)^2 = 31.68$.

Hence we need data on $n = 32$ loan applications from each month.

Example 4.12

In Example 4.7, the unemployment rate is estimated to be 8.2%, with a margin of
error of 2.4%. To get a more accurate estimate of the proportion, say, with a margin
of error no greater than 2%, we will need a sample of size

$$n = \frac{Z_{0.025}^2\, p(1-p)}{D^2} = \frac{1.96^2 \times 0.082 \times 0.918}{0.02^2} = 722.95$$

(using 8.2% as an estimate of $p$).

A larger sample of $n = 723$ individuals from the labour force will be needed.


Example 4.13

A food products company has hired a marketing research firm to sample two
markets, I and II, to compare the proportions of consumers who prefer the company’s
frozen dinners over its competitors’ products. No prior information is available on
the magnitude of the proportions $p_1$ and $p_2$. If the food products company wishes
to estimate the difference in proportions of consumers who prefer its products
correct to within 0.04 with 95% confidence, how many consumers must be
sampled in each market?

Suppose we take samples of equal size, i.e. $n_1 = n_2 = n$. Equating the precision
requirement to the margin of error, we have

$$0.04 = Z_{0.025}\sqrt{\frac{p_1(1-p_1)}{n} + \frac{p_2(1-p_2)}{n}} = 1.96\sqrt{\frac{p_1(1-p_1) + p_2(1-p_2)}{n}}.$$

Since no information is available on the values of $p_1$ and $p_2$, we may consider the
worst-case scenario with the largest possible value of the margin of error. By
completing the square, $p(1-p) = 0.25 - (p - 0.5)^2$, whose maximal value 0.25 occurs when
$p = 0.5$. Putting $p_1 = p_2 = 0.5$ in the above equation gives a conservative sample size
that will guarantee our precision requirement.

Solving

$$0.04 = 1.96\sqrt{\frac{0.5 \times 0.5 + 0.5 \times 0.5}{n}}$$

gives

$$n = 0.5 \times \left(\frac{1.96}{0.04}\right)^2 = 1200.5.$$

Therefore we need to sample $n = 1201$ consumers from each market.
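The sample sizes in Examples 4.12 and 4.13 follow the same pattern and can be computed together. The sketch below is not part of the original notes; the function names are assumptions, the two-sample version assumes equal sample sizes as in Example 4.13, and its default values of 0.5 reflect the worst-case choice made there.

```python
import math

def sample_size_one_proportion(p, d, z):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= d."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

def sample_size_two_proportions(d, z, p1=0.5, p2=0.5):
    """Smallest common n with z * sqrt((p1(1-p1) + p2(1-p2)) / n) <= d.

    The defaults p1 = p2 = 0.5 give the conservative (largest) sample size.
    """
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / d ** 2)

print(sample_size_one_proportion(p=0.082, d=0.02, z=1.96))   # Example 4.12
print(sample_size_two_proportions(d=0.04, z=1.96))           # Example 4.13
```

If rough prior estimates of the two proportions were available, passing them instead of the defaults would give a smaller required sample size.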
