CH Ii Business Stat
CHAPTER TWO
STATISTICAL ESTIMATIONS
Introduction
The sampling distribution of the mean shows how far sample means could be from a known
population mean. Similarly, the sampling distribution of the proportion shows how far sample
proportions could be from a known population proportion. In estimation, our aim is to determine
how far an unknown population mean could be from the mean of a simple random sample
selected from that population; or how far an unknown population proportion could be from a
sample proportion. These are the concerns of statistical inference, in which a statement about an
unknown population parameter is derived from information contained in a random sample
selected from the population.
One aspect of inferential statistics is estimation, which is the process of estimating the value of a
parameter from information obtained from a sample. Since the populations from which these
values were obtained are large, these values are only estimates of the true parameters and are
derived from data collected from samples. An important question in estimation is that of sample
size. How large should the sample be in order to make an accurate estimate? This question is not
easy to answer since the size of the sample depends on several factors, such as the accuracy
desired and the probability of making a correct estimate. Inferential statistical techniques have
various assumptions that must be met before valid conclusions can be obtained. One common
assumption is that the samples must be randomly selected. The other common assumption is that
either the sample size must be greater than or equal to 30 or the population must be normally or
approximately normally distributed if the sample size is less than 30. To check this assumption,
you can draw a histogram to see whether it is approximately bell-shaped, check for outliers, and,
if possible, generate a normal quantile plot and see whether the points fall close to a straight
line. (Note: An area of statistics called nonparametric statistics does not require the variable
to be normally distributed.) Some statistical techniques are called robust. This means that the
distribution of the variable can depart somewhat from normality, and valid conclusions can still
be obtained. This chapter explains the statistical procedures for estimating the population mean,
proportion, variance, and standard deviation, and for determining how large a sample should be.
Estimation is the process of using statistics as estimates of parameters. It is any procedure
in which sample information is used to estimate or predict the numerical value of some
population measure (called a parameter).
Statistical inference is the process of using limited information, a sample, for the purpose
of reaching conclusions about a larger set of data, the population.
A statistic is a summary measure that is computed to describe a characteristic of only a
sample of the population.
An estimator is any sample statistic that is used to estimate a population parameter.
Sample measures (i.e., statistics) are used to estimate population measures (i.e., parameters).
These statistics are called estimators. As previously stated, the sample mean is a better estimator
of the population mean than the sample median or sample mode. A good estimator should satisfy
the following properties.
Properties of a Good Estimator
A number of different estimators are possible for the same population parameter, but some
estimators are better than others. To understand how, we need to look at four important
properties of estimators: unbiasedness, efficiency, consistency, and sufficiency.
The estimator should be unbiased; that is, the expected value (the mean) of the estimates
obtained from samples of a given size equals the parameter being estimated: $E(\bar{X}) = \mu$.
The estimator should be relatively efficient; that is, of all the statistics that can be used to
estimate a parameter, the relatively efficient estimator has the smallest variance. Efficiency
refers to the size of the standard error of the statistic.
The estimator should be consistent. For a consistent estimator, as sample size increases, the
value of the estimator approaches the value of the parameter estimated. For example, the
sample mean is an unbiased and a consistent estimator of population mean.
The estimator should be Sufficient: A sufficient statistic is an estimator that utilizes all the
information a sample contains about the parameter to be estimated. For example, the sample
mean is a sufficient estimator of the population mean.
Point estimate of population mean, μ
Example 2.1
Assume that we have the following random sample of n= 6 elements from a population whose
parameter is not known.
1 2 4 5 7 11
The sample mean is
$$\bar{X} = \frac{\sum X}{n} = \frac{30}{6} = 5$$
The estimator is $\bar{X}$, and 5 is the point estimate of the unknown population mean.
Point estimate of the population proportion
The sample proportion is $\bar{p} = X/n$, where X is the number of elements in the sample found
to belong to the category of interest and n is the sample size.
The above array contains two even numbers, 2 and 4. Calling the even numbers successes, the
sample proportion of success is:
$$\bar{p} = \frac{2}{6} = \frac{1}{3}$$
The statistic $\bar{p}$ is an estimator of the unknown population proportion of success, and 1/3
is a point estimate of the population proportion.
Point estimate of the unknown population standard deviation
We will use the symbol S to mean an estimate of the unknown population standard deviation σ.
The estimator, called sample standard deviation, is defined by the formula
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Where
$\bar{X}$ = sample mean
n = sample size
For the random sample 1, 2, 4, 5, 7, 11, write the symbol for and compute the sample standard
deviation.
Solution
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
$$S = \sqrt{\frac{(1-5)^2 + (2-5)^2 + (4-5)^2 + (5-5)^2 + (7-5)^2 + (11-5)^2}{6 - 1}} = \sqrt{\frac{66}{5}} = 3.633$$
Point Estimator of Standard Error of the Mean
The standard error of the mean is computed by the formula $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
when the sample size is less than 5% of the population size. In our case, the total size of the
population is unknown; therefore it is safer to assume that the sample is less than 5% of the
entire population. Hence, we will use the estimator $S / \sqrt{n}$ to estimate the standard error
$\sigma_{\bar{X}}$. The symbol $S_{\bar{X}}$ is called the sample standard error of the mean. The
formula for $S_{\bar{X}}$ is
$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
Where
S = sample standard deviation
n = sample size
Thus, S is the estimator for σ, and $S_{\bar{X}}$ is the estimator for $\sigma_{\bar{X}}$.
We have calculated S = 3.633 for the random sample 1, 2, 4, 5, 7, 11. The sample standard
error can be obtained using the formula
$$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{3.633}{\sqrt{6}} = 1.483$$
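The point estimates worked out above for the sample 1, 2, 4, 5, 7, 11 can be checked with a short Python sketch. It is only an illustration: the variable names are chosen here, and `statistics.stdev` is used because it divides by n − 1, as the formula for S does.

```python
import math
import statistics

# Random sample from Example 2.1
sample = [1, 2, 4, 5, 7, 11]
n = len(sample)

# Point estimate of the population mean
x_bar = statistics.mean(sample)

# Sample standard deviation (divisor n - 1)
s = statistics.stdev(sample)

# Sample standard error of the mean, S / sqrt(n)
se_mean = s / math.sqrt(n)

# Sample proportion of "successes" (even numbers)
evens = sum(1 for x in sample if x % 2 == 0)
p_bar = evens / n

print(f"mean = {x_bar}, S = {s:.3f}, SE = {se_mean:.3f}, p = {p_bar:.3f}")
```

The printed values should agree with the hand calculations: a mean of 5, S of about 3.633, a standard error of about 1.483, and a sample proportion of 1/3.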
A Point Estimate of Sample Standard Error of the Proportion
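The standard error of the proportion answers how far an unknown population proportion could be
from a sample proportion. Its point estimator is
$$S_{\bar{p}} = \sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
Where $\bar{p}$ = sample proportion of success, $\bar{q} = 1 - \bar{p}$, and n = sample size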
Example 2.2
Let an even number be a success, and suppose a random sample of 200 numbers selected from a
population contains 120 even numbers. Write the symbol for and compute the value of the point
estimator of the standard error of the proportion.
Here $\bar{p} = 120/200 = 0.6$ and $\bar{q} = 1 - 0.6 = 0.4$, so
$$S_{\bar{p}} = \sqrt{\frac{\bar{p}\,\bar{q}}{n}} = \sqrt{\frac{0.6 \times 0.4}{200}} = 0.0346$$
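As a quick check of Example 2.2, the estimated standard error of the proportion can be computed as follows. The function name is hypothetical, and the sketch assumes that the 120 even numbers are counted within the sample of 200, which is what the calculation with 0.6 implies.

```python
import math

def proportion_standard_error(successes, n):
    """Estimated standard error of a sample proportion, sqrt(p*q/n)."""
    p_bar = successes / n
    q_bar = 1 - p_bar
    return math.sqrt(p_bar * q_bar / n)

se = proportion_standard_error(120, 200)
print(round(se, 4))
```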
The following table shows some population parameters and their estimators.
Measure                       Population parameter    Sample statistic
Mean                          μ                       $\bar{X}$
Standard deviation            σ                       S
Variance                      σ²                      S²
Proportion                    p                       $\bar{p}$
Standard error of the mean    $\sigma_{\bar{X}}$      $S_{\bar{X}}$
B. Interval Estimate
As stated in the above, the sample mean will be, for the most part, somewhat different from the
population mean due to sampling error. Therefore, you might ask a second question: How good
is a point estimate? The answer is that there is no way of knowing how close a particular point
estimate is to the population mean. This answer places some doubt on the accuracy of point
estimates. For this reason, statisticians prefer another type of estimate, called an interval
estimate.
An interval estimate of a parameter is an interval or a range of values used to estimate the
parameter. It is a range of values that conveys the fact that estimation is an uncertain
process. This estimate may or may not contain the value of the parameter being estimated.
An interval estimate states the range within which a population parameter probably lies. Stated
differently, an interval estimate is a range of values within which the analyst can declare with
some confidence that the population parameter will fall. The interval within which a population
parameter is expected to lie is usually referred to as the confidence interval. The probability that
a parameter lies within the specified interval estimate of the parameter is called the confidence level.
A confidence interval is a specific interval estimate of a parameter determined by using data
obtained from a sample and the specific confidence level of the estimate. The confidence interval
for the population mean is the interval that has a high probability of containing the population
mean.
Compiled by GENANEW W. AMU Department of Accounting and Finance
Page 5 of 26
BUSINESS Statistics chapter TWO 2014
In an interval estimate, the parameter is specified as being between two values. For example,
for the sample in Example 2.1, if we state that the mean, μ, lies within $\bar{X} \pm 2$,
the range of values from 3 (5 − 2) to 7 (5 + 2) is an interval estimate.
Either the interval contains the parameter or it does not. A degree of confidence (usually a
percent) can be assigned before an interval estimate is made. Three common confidence intervals
are used: the 90%, the 95%, and the 99% confidence intervals.
A 95% confidence interval means that about 95% of the similarly constructed intervals will
contain the parameter being estimated. If we use the 99% confidence interval we expect about
99% of the intervals to contain the parameter being estimated.
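The statement that about 95% of similarly constructed intervals contain the parameter can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with a known mean of 50 and standard deviation of 10 (values invented purely for illustration), each interval is built as the sample mean plus or minus 1.96 standard errors, and the function name is hypothetical.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    margin = z * sigma / math.sqrt(n)  # same margin for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=50, sigma=10, n=30, z=1.96, trials=2000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should come out close to 0.95. Any single interval either contains the true mean or does not; the 95% describes the long-run behaviour of the procedure.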
When the population is normally distributed and, at the same time, σ is known, we can estimate μ
(regardless of the sample size) using the following formula.
$$\mu = \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
Where:
$\bar{X}$ = sample mean
$Z_{\alpha/2}$ = value from the standard normal table reflecting the confidence level
σ = population standard deviation
n = sample size
α = the proportion of incorrect statements (α = 1 – C)
μ = unknown population mean
From the above formula we can learn that an interval estimate is constructed by adding and
subtracting the error term to and from the point estimate. That is, the point estimate is found at
the center of the confidence interval.
To find the interval estimate of population mean, μ we have the following steps.
For a 90% confidence interval, Zα/2=1.65; for a 95% confidence interval, Zα/2 =1.96; and for
a 99% confidence interval, Zα/2 = 2.58.
The term $Z_{\alpha/2}(\sigma/\sqrt{n})$ is called the margin of error (also called the maximum
error of the estimate). For a specific value, say, α = 0.05, 95% of the sample means will fall
within this error value on either side of the population mean, as previously explained.
[Figure: distribution of $\bar{X}$'s, with the margin of error $Z_{\alpha/2}(\sigma/\sqrt{n})$ marked on either side of μ]
When n ≥ 30, s can be substituted for σ, but a different distribution is used.
The margin of error also called the maximum error of the estimate is the maximum likely
difference between the point estimate of a parameter and the actual value of the parameter. A
more detailed explanation of the margin of error follows in the following examples which
illustrate the computation of confidence intervals.
Assumptions for Finding a Confidence Interval for a Mean When σ is Known
1. The sample is a random sample.
2. Either n ≥ 30 or the population is normally distributed if n <30.
Example 2.3
A researcher wishes to estimate the number of days it takes an automobile dealer to sell a
Chevrolet Aveo. A sample of 50 cars had a mean time on the dealer’s lot of 54 days. Assume the
population standard deviation to be 6.0 days. Find the best point estimate of the population mean
and the 95% confidence interval of the population mean.
Solution
The best point estimate of the mean is 54 days. For the 95% confidence interval use z =1.96.
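$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$54 - 1.96\left(\frac{6.0}{\sqrt{50}}\right) < \mu < 54 + 1.96\left(\frac{6.0}{\sqrt{50}}\right)$$
54 – 1.7 < µ < 54 + 1.7
52.3 < µ < 55.7 or 54 ± 1.7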
Hence one can say with 95% confidence that the interval between 52.3 and 55.7 days does
contain the population mean, based on a sample of 50 automobiles.
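The calculation in Example 2.3 can be reproduced with a small helper. This is a minimal sketch; the function name and argument names are chosen here for illustration, and the critical value 1.96 is passed in as read from the standard normal table.

```python
import math

def z_interval(x_bar, sigma, n, z):
    """Confidence interval for the mean when sigma is known: x_bar +/- z*sigma/sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 2.3: 50 cars, sample mean 54 days, sigma = 6.0 days, 95% confidence
low, high = z_interval(x_bar=54, sigma=6.0, n=50, z=1.96)
print(f"{low:.1f} < mu < {high:.1f}")
```

The same helper can be reused for Examples 2.4 and 2.5 by changing the arguments.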
Example 2.4
A survey of 30 emergency room patients found that the average waiting time for treatment was
174.3 minutes. Assuming that the population standard deviation is 46.5 minutes, find the best
point estimate of the population mean and the 99% confidence interval of the population mean.
Solution
The best point estimate is 174.3 minutes. The 99% confidence interval is
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$174.3 - 2.58\left(\frac{46.5}{\sqrt{30}}\right) < \mu < 174.3 + 2.58\left(\frac{46.5}{\sqrt{30}}\right)$$
174.3 – 21.9 < µ < 174.3 + 21.9
152.4 < µ < 196.2 or 174.3 ± 21.9
Hence, one can be 99% confident that the mean waiting time for emergency room treatment is
between 152.4 and 196.2 minutes.
Example 2.5
The vice president of operations for Ethiopian Tele Communication Corporation (ETC) is in the
process of developing a strategic management plan. He believes that the ability to estimate the
length of the average phone call on the system is important. He takes a random sample of 60
calls from the company records and finds that the mean sample length for a call is 4.26 minutes.
Past history for these types of calls has shown that the population standard deviation for call
length is about 1.1 minutes. Assuming that the population is normally distributed and he wants
to have a 95% confidence, help him in estimating the population mean.
Solution:
n = 60 calls, $\bar{X}$ = 4.26 minutes, σ = 1.1 minutes, C = 0.95
i. $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.1}{\sqrt{60}} = 0.142$
ii. α = 1 – C = 1 – 0.95 = 0.05, so α/2 = 0.05/2 = 0.025
iii. $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. $\mu = \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ = 4.26 ± 1.96(0.142) = 4.26 ± 0.28, that is, 3.98 ≤ µ ≤ 4.54
The vice-president of ETC can be 95% confident that the average length of a call for the
population is between 3.98 and 4.54 minutes.
Example 2.6
The following data represent a sample of the assets (in millions of dollars) of 30 credit unions in
southwestern Pennsylvania. Assume the standard deviation of the population is 14.405. Find the
90% confidence interval of the mean.
12.23 16.56 4.39
2.89 1.24 2.17
13.19 9.16 1.42
73.25 1.91 14.64
11.59 6.69 1.06
[Figure: standard normal curve with the critical values −zα/2 and zα/2 marked, finding α/2 for a 98% confidence interval]
Finding zα/2 for a 98% confidence interval: here α = 1 – 0.98 = 0.02, so α/2 = 0.01. Once α/2
is determined, the corresponding Zα/2 value can be found by using the following procedure.
Subtract 0.01 from 1.0000 to get 0.9900. Next, locate the area that is closest to 0.9900 (in
this case, 0.9901) in the cumulative standard normal distribution table, and then find the
corresponding Z value. In this example, it is 2.33. For confidence intervals, only the positive
z value is used in the formula.
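The table lookup described above can also be done numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the function name `z_critical` is hypothetical. Note that software gives unrounded values (for example about 1.645 and 2.576), whereas the text uses the rounded table values 1.65 and 2.58.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Positive z value that leaves alpha/2 in the upper tail of the standard normal curve."""
    alpha = 1 - confidence
    # Cumulative area to the left of the critical value is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```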
When the original variable is normally distributed and σ is known, the standard normal
distribution can be used to find confidence intervals regardless of the size of the sample. When
n ≥ 30, the distribution of means will be approximately normal even if the original distribution
of the variable departs from normality.
When σ is unknown, s can be used as an estimate of σ, but a different distribution is used for
the critical values. This method is explained in the following section.
Confidence estimate of µ, normal population, σ unknown
When σ is known and the sample size is 30 or more, or the population is normally distributed if
the sample size is less than 30, the confidence interval for the mean can be found by using the z
distribution: we search for the Z value of α/2 and use the formula
$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ to estimate the interval within which the
population mean lies with confidence coefficient C. However, most of the time, the value of σ is
not known, so it must be estimated by using s, namely, the standard deviation of the sample.
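$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
After calculating the standard deviation, the standard error must be computed using the following
formula.
$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
When the population standard deviation is known, the interval estimate is based on the
standardized statistic
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$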
When s is used, especially when the sample size is small, critical values greater than the values
for Zα/2 are used in confidence intervals in order to keep the interval at a given level, such as
95%. These values are taken from the Student t distribution, most often called the t distribution,
which was first described by W. S. Gosset in the early 1900s.
To use this method, the samples must be simple random samples, and the population from which
the samples were taken must be normally or approximately normally distributed, or the sample
size must be 30 or more. Some important characteristics of the t distribution are described below.
Characteristics of the t Distribution
The t distribution shares some characteristics of the normal distribution and differs from it in
others. The t distribution is similar to the standard normal distribution in the following ways:
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve never touches the x axis.
The t distribution differs from the standard normal distribution in the following ways:
1. The variance is greater than 1.
2. The t distribution is actually a family of curves based on the concept of degrees of freedom,
which is related to sample size.
3. As the sample size increases, the t distribution approaches the standard normal distribution.
When the population standard deviation is unknown, we need to estimate it with the sample standard
deviation, and the resulting distribution is not normal; rather, it follows Student's
t-distribution. There is a different t-distribution for each sample size. The t-distribution is
discussed in greater detail under hypothesis testing. Tail areas for the t-distribution are
presented according to a parameter called degrees of freedom. Many statistical distributions use
the concept of degrees of freedom, and the formulas for finding the degrees of freedom vary for
different statistical tests. The degrees of freedom are the number of values that are free to vary
after a sample statistic has been computed, and they tell the researcher which specific curve to
use when a distribution consists of a family of curves.
For example, if the mean of 5 values is 10, then 4 of the 5 values are free to vary. But once 4
values are selected, the fifth value must be a specific number to get a sum of 50, since 50 ÷5 =10.
Hence, the degrees of freedom are 5-1=4, and this value tells the researcher which t curve to use.
See the following curve
We shall use the symbol ν for degrees of freedom. Degree of freedom for t-distribution can be
calculated as ν=n−1 .
Where
ν = degree of freedom
n= sample size
As ν increases, the tail area decreases, and so does the t-value. As the degrees of freedom
increase, the t-distribution approaches the standard normal distribution. When the degrees of
freedom reach about 30, the t-distribution is approximately the same as the normal distribution.
To construct an interval estimate for µ under this situation, we need to use the value of
$t_{\alpha/2,\nu}$, which will be read from a statistical table, in association with the formula:
$$\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}$$
Where
$\bar{X}$ = sample mean
n = sample size
ν = n – 1 (degrees of freedom)
S = sample standard deviation
μ = unknown population mean
Example 2.7
The environmental protection officer of a large industrial plant sought to determine the mean
daily amount of sulphur oxide (pollutant) emitted by the plant. Because measurement costs were
high, only a random sample of 10 days' measurements was obtained: these were, in tons per day,
8 7 10 15 11 5 8 5 13 12
Suppose emissions per day are normally distributed. Estimate μ, the mean amount of sulphur
oxides emitted per day using the confidence interval with a confidence coefficient of 0.95.
Solution
$$\bar{X} = \frac{\sum X}{n} = \frac{95}{10} = 9.5$$
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{94.5}{9}} = 3.24$$
The confidence level is 95%. Therefore, the significance level is α = 1 – C = 1 – 0.95 = 0.05 and
α/2 = 0.025.
Next, we have to calculate the degrees of freedom for the observations, which are given as
ν = n – 1 = 10 – 1 = 9.
We can now calculate the interval as
$$\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}$$
where $t_{\alpha/2,\nu}$ in this specific situation means $t_{0.025,\,9} = 2.26$.
Therefore, the interval can be calculated as:
$$9.5 - 2.26\left(\frac{3.24}{\sqrt{10}}\right) \le \mu \le 9.5 + 2.26\left(\frac{3.24}{\sqrt{10}}\right)$$
$$7.2 \le \mu \le 11.8$$
Example 2.8
Ten randomly selected people were asked how long they slept at night. The mean time was 7.1
hours, and the standard deviation was 0.78 hour. Find the 95% confidence interval of the mean
time. Assume the variable is normally distributed.
Solution
Since σ is unknown and s must replace it, the t distribution table must be used for the confidence
interval. Hence, with 9 degrees of freedom, tα/2 = 2.262. The 95% confidence interval can be
found by substituting in the formula.
$$\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}$$
$$7.1 - 2.262\left(\frac{0.78}{\sqrt{10}}\right) < \mu < 7.1 + 2.262\left(\frac{0.78}{\sqrt{10}}\right)$$
$$6.54 < \mu < 7.66$$
Therefore, one can be 95% confident that the population mean is between 6.54 and 7.66 hours.
Example 2.9
The data represent a sample of the number of home fires started by candles for the past several
years. (Data are from the National Fire Protection Association.) Find the 99% confidence
interval for the mean number of home fires started by candles each year.
5460 5900 6090 6310 7160 8440 9930
Solution
Step 1 Find the mean and standard deviation for the data. The mean X = 7041.4. The standard
deviation s =1610.3.
Step 2 Find tα/2 in table. Use the 99% confidence interval with d.f. =6. It is 3.707.
Step 3 Substitute in the formula and solve.
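$$\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$
$$7041.4 - 3.707\left(\frac{1610.3}{\sqrt{7}}\right) < \mu < 7041.4 + 3.707\left(\frac{1610.3}{\sqrt{7}}\right)$$
$$4785.2 < \mu < 9297.6$$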
One can be 99% confident that the population mean number of home fires started by candles
each year is between 4785.2 and 9297.6, based on a sample of home fires occurring over a period
of 7 years.
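The three steps of Example 2.9 can be sketched in Python as follows. The function name is hypothetical, and the critical value 3.707 is passed in as read from the t table (6 degrees of freedom, 99% confidence), because the Python standard library does not provide t-distribution quantiles.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """Confidence interval for the mean when sigma is unknown: x_bar +/- t*s/sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 2.9: home fires started by candles, 7 years of data
fires = [5460, 5900, 6090, 6310, 7160, 8440, 9930]
low, high = t_interval(fires, t_crit=3.707)
print(f"{low:.1f} < mu < {high:.1f}")
```

Because the code carries the unrounded mean and standard deviation, the last digit may differ slightly from the hand calculation, which rounds them to 7041.4 and 1610.3 first.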
Students sometimes have difficulty deciding whether to use Zα/2 or tα/2 values when finding
confidence intervals for the mean. As stated previously, when σ is known, Zα/2 values can be
used no matter what the sample size is, as long as the variable is normally distributed or n ≥ 30.
When σ is unknown and n ≥ 30, then s can be used in the formula and tα/2 values can be used.
Finally, when σ is unknown and n < 30, s is used in the formula and tα/2 values are used, as long
as the variable is approximately normally distributed.
It may sometimes be difficult to know whether the population is normally distributed. Hence,
we may need to use an approximation. You may remember the central limit theorem, which states
that as the sample size increases, the sampling distribution of the sample mean approaches the
normal distribution. In fact, for n greater than or equal to 30, statisticians use the normal
distribution. Hence, we can use the central limit theorem to construct an interval estimate for a
mean when the sample size is large (n ≥ 30).
A one-sided confidence interval can be found for a mean by using
$$\mu > \bar{X} - t_{\alpha}\frac{S}{\sqrt{n}} \quad \text{or} \quad \mu < \bar{X} + t_{\alpha}\frac{S}{\sqrt{n}}$$
Where $t_{\alpha}$ is the value found under the row labeled One tail.
Determination of Sample Size
Sample size determination is closely related to statistical estimation. One reason behind sampling
is to reduce the cost of data collection. If we conduct a census study, the cost we incur to collect
data will be prohibitively high. Therefore, we have to take a small sample to hold costs down. On
the other hand, we want the sample to be large enough to provide a good estimate of the population
parameter. Consequently, the issue is: how large should the sample be? The size of the
sample depends on three factors:
How precise or narrow we want the interval estimate to be
How confident we want to be that the interval estimate is correct
How variable is the population being sampled
The higher the desired precision or level of confidence, the larger will be the sample; also for a
given precision and level of confidence, the larger the population variability is, the larger will be
the sample. A common question is: how large a sample is necessary to make an accurate estimate?
The answer is not simple, since it depends on three things: the margin of error, the population
standard deviation, and the degree of confidence. For example, how close to the true mean do
you want to be (2 units, 5 units, etc.), and how confident do you wish to be (90%, 95%, 99%, etc.)?
For the purposes of this section, it will be assumed that the population standard deviation of the
variable is known or has been estimated from a previous study.
Sample Size for Estimating a Population Mean
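The confidence interval estimate of μ,
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
can be rewritten as $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, and this can be expressed as $\bar{X} \pm e$.
Therefore, the formula for sample size is derived from the margin of error formula
$$e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
and this formula is solved for n as follows:
$$e\sqrt{n} = Z_{\alpha/2}\,\sigma$$
$$\sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{e}$$
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \frac{Z_{\alpha/2}^2\,\sigma^2}{e^2}$$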
Minimum Sample Size Needed for an Interval Estimate of the Population Mean
Therefore
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \frac{Z_{\alpha/2}^2\,\sigma^2}{e^2}$$
Where e is the margin of error. If necessary, round the answer up to obtain a whole number.
That is, if there is any fraction or decimal portion in the answer, use the next whole number
for sample size n.
Example 2.10
A scientist wishes to estimate the average depth of a river. He wants to be 99% confident that the
estimate is accurate within 2 feet. From a previous study, the standard deviation of the depths
measured was 4.33 feet.
Solution: Since α = 0.01 (or 1 – 0.99), Zα/2 = 2.58 and e = 2. Substituting in the formula,
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{(2.58)(4.33)}{2}\right)^2 = 31.2$$
Round the value 31.2 up to 32. Therefore, to be 99% confident that the estimate is within 2 feet
of the true mean depth, the scientist needs at least a sample of 32 measurements. In most cases in
statistics, we round off. However, when determining sample size, we always round up to the next
whole number.
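The sample size formula and the round-up rule can be expressed as a short sketch. The function name is chosen here for illustration; `math.ceil` implements the rule of always rounding up to the next whole number.

```python
import math

def sample_size_for_mean(z, sigma, e):
    """Minimum n so that the margin of error z*sigma/sqrt(n) does not exceed e."""
    return math.ceil((z * sigma / e) ** 2)

# Example 2.10: 99% confidence (z = 2.58), sigma = 4.33 feet, margin of error 2 feet
n = sample_size_for_mean(z=2.58, sigma=4.33, e=2)
print(n)
```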
As can be seen from the above formula, there is a direct relationship between sample size and
variation in the population: the more variable the population, the larger the sample size.
The variation of the population, however, is usually not known prior to sampling. If there is
historical evidence of the variance, that can be used. But most of the time neither the
population variance nor the sample variance is known. Hence we need to estimate σ using the
formula:
$$\sigma \approx \frac{\text{official's high value} - \text{official's low value}}{4}$$
Example 2.11
A sample is to be taken to estimate the mean salary of plumbers to within birr 500 with a
confidence coefficient of 0.99. A plumbers' union official states that birr 40,000 and birr 26,000
would be unusually large and small salaries for plumbers in the union. What should the sample
size be?
Solution
$$n = \frac{Z_{\alpha/2}^2\,\sigma^2}{e^2}$$
It is possible to use this formula to find the sample size, but we first need to find σ.
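The remaining arithmetic for Example 2.11 can be sketched as follows. This is an illustration rather than the author's printed solution: it assumes the range-over-four rule given earlier for estimating σ, takes Zα/2 = 2.58 for a confidence coefficient of 0.99 as in the other examples, and uses variable names chosen here.

```python
import math

high_salary = 40_000  # unusually large salary (birr), per the union official
low_salary = 26_000   # unusually small salary (birr)
e = 500               # desired margin of error (birr)
z = 2.58              # table value for 99% confidence, as used in the text

# Rough estimate of sigma: (high value - low value) / 4
sigma_est = (high_salary - low_salary) / 4

# Sample size formula, rounded up to the next whole number
n = math.ceil((z * sigma_est / e) ** 2)
print(f"estimated sigma = {sigma_est:.0f}, required n = {n}")
```

Under these assumptions the estimated σ is birr 3,500 and the formula gives about 326.2, which rounds up to 327.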
In a recent survey of 150 households, 54 had central air conditioning. Find $\bar{p}$ and
$\bar{q}$, where $\bar{p}$ is the proportion of households that have central air conditioning.
Solution
Since X = 54 and n = 150,
$$\bar{p} = \frac{X}{n} = \frac{54}{150} = 0.36 = 36\%$$
$$\bar{q} = \frac{n - X}{n} = \frac{150 - 54}{150} = \frac{96}{150} = 0.64 = 64\%$$
You can also find $\bar{q}$ by using the formula $\bar{q} = 1 - \bar{p}$. In this case, 1 – 0.36 = 0.64.
As with means, the statistician, given the sample proportion, tries to estimate the population
proportion. Point and interval estimates for a population proportion can be made by using the
sample proportion. For a point estimate of p (the population proportion), $\bar{p}$ (the sample
proportion) is used. On the basis of the three properties of a good estimator, $\bar{p}$ is
unbiased, consistent, and relatively efficient. But as with means, one is not able to decide how
good the point estimate of p is. Therefore, statisticians also use an interval estimate for a
proportion, and they can assign a probability that the interval will contain the population
proportion.
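The confidence interval for a particular p is based on the sampling distribution of $\bar{p}$.
When the sample size n is no more than 5% of the population size, the standard error of the
sampling distribution of $\bar{p}$ is estimated by
$$S_{\bar{p}} = \sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
Then, using $S_{\bar{p}}$ as an estimator of $\sigma_{\bar{p}}$, we can calculate the interval estimate as:
$$\bar{p} - Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}} \le p \le \bar{p} + Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$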
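Since α = 1 – 0.90 = 0.10, Zα/2 = 1.65, $\bar{p}$ = 323/1404 = 0.23 and $\bar{q}$ = 1 – 0.23 = 0.77.
Substitute in the formula
$$\bar{p} - Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}} < p < \bar{p} + Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
$$0.23 - 1.65\sqrt{\frac{(0.23)(0.77)}{1404}} < p < 0.23 + 1.65\sqrt{\frac{(0.23)(0.77)}{1404}}$$
0.23 – 0.019 < p < 0.23 + 0.019
0.211 < p < 0.249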
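$$\bar{p} - Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}} < p < \bar{p} + Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
$$0.159 - 1.96\sqrt{\frac{(0.159)(0.841)}{1721}} < p < 0.159 + 1.96\sqrt{\frac{(0.159)(0.841)}{1721}}$$
0.142 < p < 0.176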
Hence, you can say with 95% confidence that the true percentage is between 14.2% and 17.6%.
Example 2.15
A random sample of 400 members of labour force in a five state region showed that 32 were
unemployed. Construct the 95% confidence interval for the proportion unemployed in the region.
Solution
$$\bar{p} = \frac{32}{400} = 0.08$$
With C of 95%, α = 0.05 and α/2 = 0.025.
Find Z0.025 from the statistical table. To find the Z value, search for the probability in the main
body of the Z table and read off the corresponding Z score. In our case that will be 1.96.
Therefore, the interval estimate can be calculated as:
$$\bar{p} - Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}} \le p \le \bar{p} + Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
$$0.08 - 1.96\sqrt{\frac{(0.08)(0.92)}{400}} \le p \le 0.08 + 1.96\sqrt{\frac{(0.08)(0.92)}{400}}$$
$$0.053 \le p \le 0.107$$
Consequently, with 95% confidence, we state the population proportion to be between 0.053 and
0.107, that is, between 5.3% and 10.7%.
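The interval in Example 2.15 can be reproduced with a short helper. This is a sketch only; the function name is hypothetical, and the critical value 1.96 is supplied from the Z table as in the text.

```python
import math

def proportion_interval(successes, n, z):
    """Large-sample confidence interval for a population proportion."""
    p_bar = successes / n
    q_bar = 1 - p_bar
    margin = z * math.sqrt(p_bar * q_bar / n)
    return p_bar - margin, p_bar + margin

# Example 2.15: 32 unemployed in a sample of 400, 95% confidence
low, high = proportion_interval(successes=32, n=400, z=1.96)
print(f"{low:.3f} <= p <= {high:.3f}")
```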
Sample Size for Estimating Population Proportions
To find the sample size needed to determine a confidence interval about a proportion, use the
following formula:
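The confidence interval for p is
$$\bar{p} - Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}} \le p \le \bar{p} + Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
which shows that the interval extends from
$\bar{p} - Z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}/n}$ to $\bar{p} + Z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}/n}$,
so we can express this as:
$$\bar{p} \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
The interval will be more precise, or narrower, the smaller the term that follows ±. The term is
called the margin of error and is indicated by e.
$$e = Z_{\alpha/2}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$$
If we solve for n, we get the following formula:
$$n = \bar{p}\,\bar{q}\left(\frac{Z_{\alpha/2}}{e}\right)^2 = \frac{Z_{\alpha/2}^2\,\bar{p}\,\bar{q}}{e^2}$$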
There are two situations to consider. First, if some approximation of $\bar{p}$ is known (e.g.,
from a previous study), that value can be used in the formula. Second, if no approximation of
$\bar{p}$ is known, you should use $\bar{p} = 0.5$. This value will give a sample size
sufficiently large to guarantee an accurate prediction, given the confidence interval and the
error of estimate. The reason is that when $\bar{p}$ and $\bar{q}$ are each 0.5, the product
$\bar{p}\,\bar{q}$ is at its maximum, as shown here. If the existing information leads to the
belief that the population proportion is between two values, then: if both values are on the same
side of 0.5, choose as $\bar{p}$ the value closer to 0.5; if 0.5 is between the two values, use
0.5 for $\bar{p}$.
p      q      pq        p      q      pq
0.1    0.9    0.09      0.6    0.4    0.24
0.2    0.8    0.16      0.7    0.3    0.21
0.3    0.7    0.21      0.8    0.2    0.16
0.4    0.6    0.24      0.9    0.1    0.09
0.5    0.5    0.25
Example 2.16
A researcher wishes to estimate, with 95% confidence, the proportion of people who own a home
computer. A previous study shows that 40% of those interviewed had a computer at home. The
researcher wishes to be accurate within 2% of the true proportion. Find the minimum sample size
necessary.
Solution
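Using the estimate from the previous study, $\bar{p} = 0.40$ and $\bar{q} = 1 - 0.40 = 0.60$, with e = 0.02:
$$n = \bar{p}\,\bar{q}\left(\frac{Z_{\alpha/2}}{e}\right)^2 = (0.40)(0.60)\left(\frac{1.96}{0.02}\right)^2 = 2304.96 \approx 2305$$
Hence, a sample size of 2305 would be needed.
If no prior knowledge of p were available and the researcher wished to be accurate within 3%,
assign $\bar{p}$ a value of 0.5, so that $\bar{q}$ = 1 – 0.5 = 0.5. Substituting in the formula,
using e = 0.03,
$$n = \bar{p}\,\bar{q}\left(\frac{Z_{\alpha/2}}{e}\right)^2 = (0.50)(0.50)\left(\frac{1.96}{0.03}\right)^2 = 1067.1 \approx 1068$$
Hence, a sample size of 1068 would be needed.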
N.B. In determining the sample size, the size of the population does not enter the formula. Only the degree of confidence, the margin of error and the value used for p are needed to make the determination.
Example 2.18
Therefore,
$$n = \frac{(1.64)^2(0.5)(0.5)}{(0.04)^2} = 420.25$$
Interval Estimation of the Difference between two independent Means
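It is clear that the unbiased point estimate of the difference between the means of two populations, $\mu_1 - \mu_2$, is the difference between the two sample means, $\bar{X}_1 - \bar{X}_2$. The standard error of this difference is
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When $\sigma_1$ and $\sigma_2$ are not known, the standard error of the difference between two sample means ($\sigma_{\bar{X}_1-\bar{X}_2}$) is estimated by the sample standard error of the difference between two sample means,
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{S_{\bar{X}_1}^2 + S_{\bar{X}_2}^2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
and the interval estimate takes the following form, given that the sample sizes are large:
$$\mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\, S_{\bar{X}_1-\bar{X}_2}$$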
Note: This formula also works for problems that involve large sample sizes ($n_1$ and $n_2 \ge 30$) even though the parent populations may not be normally distributed.
Example 2.19
In a sex discrimination case, an employee alleged that a large corporation paid men more than women for comparable work. Let population 1 represent all male employees performing certain jobs and population 2 represent all female employees performing comparable jobs at the corporation. Independent samples are taken of $n_1 = 100$ males and $n_2 = 100$ females; the sample means are $\bar{x}_1$ = Birr 20,600 and $\bar{x}_2$ = Birr 19,700, and the sample standard deviations are $s_1$ = Birr 3,000 and $s_2$ = Birr 2,500. Construct a 95% confidence interval for $\mu_1 - \mu_2$. What do you conclude from this?
Solution:
Male employees                  Female employees
$n_1 = 100$ males               $n_2 = 100$ females
$\bar{x}_1$ = Birr 20,600       $\bar{x}_2$ = Birr 19,700
$s_1$ = Birr 3,000              $s_2$ = Birr 2,500
C = 0.95
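Steps:
i. Calculate the (sample) standard error of the difference between two means
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{(3000)^2}{100} + \frac{(2500)^2}{100}} = \sqrt{152500} = 390.51$$
ii. Compute α/2
α = 1 - C = 1 - 0.95 = 0.05
α/2 = 0.05/2 = 0.025
iii. Look up $Z_{\alpha/2}$: $Z_{0.025} = 1.96$
iv. Construct the confidence interval
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\, S_{\bar{X}_1-\bar{X}_2} = (20600 - 19700) \pm 1.96(390.51) = 900 \pm 765.40$$
134.60 ≤ $\mu_1 - \mu_2$ ≤ 1,665.40
We state with 95% confidence that the mean pay of male employees exceeds that of female employees by between Birr 134.60 and Birr 1,665.40. Since this interval contains only positive values, the samples support the claim that men are paid more on average than women for comparable work.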
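As a check on the arithmetic for Example 2.19, the large-sample interval for $\mu_1 - \mu_2$ can be sketched in Python. The function name `two_mean_interval` is a hypothetical helper, and z = 1.96 is the usual table value for 95% confidence.

```python
import math


def two_mean_interval(x1, x2, s1, s2, n1, n2, z):
    """Large-sample confidence interval for mu1 - mu2.

    Returns the standard error and the lower and upper limits."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # sample standard error
    diff = x1 - x2                               # point estimate
    margin = z * se
    return se, diff - margin, diff + margin


# Figures from Example 2.19 (amounts in Birr)
se, lower, upper = two_mean_interval(
    x1=20600, x2=19700, s1=3000, s2=2500, n1=100, n2=100, z=1.96
)
print(f"standard error = {se:.2f}")
print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

The quantity under the square root is 90,000 + 62,500 = 152,500, so the standard error is about 390.51 and both limits of the interval are positive.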
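When the two populations have equal variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$), the standard error of the difference between two sample means can be written as
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
If the variance $\sigma^2$ of the populations is known, $\mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\,\sigma_{\bar{X}_1-\bar{X}_2}$ can be used to develop the interval estimate. If $\sigma^2$ is not known, the two sample variances are used to develop the estimate of $\sigma^2$. Since
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}$$
is based on the assumption that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we do not need separate estimates of $\sigma_1^2$ and $\sigma_2^2$. In fact, we can combine the data from the two samples to provide the best single estimate of $\sigma^2$. The process of combining the results of two independent simple random samples to provide one estimate of $\sigma^2$ is referred to as pooling. The pooled estimator of the variance $\sigma^2$, denoted by $S_P^2$, is the weighted average of the two sample variances, $S_1^2$ and $S_2^2$, with the degrees of freedom associated with each sample being used as the weights. The formula for the pooled estimator of $\sigma^2$ is: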
$$S_P^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{\sum (X_{i1} - \bar{X}_1)^2 + \sum (X_{i2} - \bar{X}_2)^2}{n_1 + n_2 - 2}$$
Where:
$S_P^2$ = pooled estimate of the variance
$n_1$ = sample size drawn from population 1
$n_2$ = sample size drawn from population 2
$S_1^2$ = sample variance of the sample drawn from population 1
$S_2^2$ = sample variance of the sample drawn from population 2
$n_1 + n_2 - 2$ = pooled degrees of freedom
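The two forms of the pooled variance formula give the same number, which a small Python sketch can illustrate. The data below are hypothetical values made up for the illustration, and the function names are not from the text.

```python
from statistics import mean, variance


def pooled_variance(sample1, sample2):
    """Weighted average of the two sample variances, with the degrees of
    freedom (n - 1) of each sample as the weights."""
    n1, n2 = len(sample1), len(sample2)
    weighted = (n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)
    return weighted / (n1 + n2 - 2)


def pooled_variance_from_deviations(sample1, sample2):
    """Same estimate computed from the sums of squared deviations."""
    m1, m2 = mean(sample1), mean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)
    ss2 = sum((x - m2) ** 2 for x in sample2)
    return (ss1 + ss2) / (len(sample1) + len(sample2) - 2)


# Hypothetical data, for illustration only
sample_a = [12, 15, 11, 14, 13]
sample_b = [20, 18, 25, 21]

sp2_weighted = pooled_variance(sample_a, sample_b)
sp2_deviations = pooled_variance_from_deviations(sample_a, sample_b)
print(sp2_weighted, sp2_deviations)
```

Here the sums of squared deviations are 10 and 26, and the pooled degrees of freedom are 5 + 4 - 2 = 7, so both functions return 36/7.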
Based on the assumption that the population standard deviations are equal, the standard error of the difference between means is estimated by the sample standard error of the difference between means:
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_P^2}{n_1} + \frac{S_P^2}{n_2}}$$
The confidence interval for $\mu_1 - \mu_2$ when the common standard deviation $\sigma_1 = \sigma_2 = \sigma$ is not known is based on the t-distribution, and is given by:
$$\mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2,\,\nu}\, S_{\bar{X}_1-\bar{X}_2}$$
Where:
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{S_P^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
ν = pooled degrees of freedom ($n_1 + n_2 - 2$)
Example 2.20
Two manufacturing companies produce drill tips that are used to cut holes in steel sheets. A customer wishing to know which drill tips have the longer life purchases independent samples of $n_1 = 20$ drill tips from company 1 and $n_2 = 15$ drill tips from company 2. The mean lives of the drill tips are $\bar{x}_1 = 78$ minutes and $\bar{x}_2 = 84$ minutes. The population variances are unknown but assumed to be equal. The sample variances are $s_1^2 = 41$ and $s_2^2 = 36$. Construct a 95% confidence interval for $\mu_1 - \mu_2$ assuming that the two populations are normally distributed.
Solution:
Company One                  Company Two
$n_1 = 20$ drill tips        $n_2 = 15$ drill tips
$\bar{x}_1 = 78$ minutes     $\bar{x}_2 = 84$ minutes
$S_1^2 = 41$                 $S_2^2 = 36$
C = 0.95
i. Calculate the sample standard error of the difference between two means and the pooled degrees of freedom
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_P^2}{n_1} + \frac{S_P^2}{n_2}} = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$= \sqrt{\frac{(20 - 1)41 + (15 - 1)36}{20 + 15 - 2}\left(\frac{1}{20} + \frac{1}{15}\right)} = \sqrt{\frac{1283}{33}\left(\frac{1}{20} + \frac{1}{15}\right)} = 2.13$$
ν = $n_1 + n_2 - 2$
= 20 + 15 - 2 = 33
ii. Compute α/2 and look up $t_{\alpha/2,\,\nu}$
α = 1 - C = 1 - 0.95 = 0.05
α/2 = 0.05/2 = 0.025
$t_{\alpha/2,\,\nu} = t_{0.025,\,33} = 2.04$
iii. Construct the confidence interval
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,\nu}\, S_{\bar{X}_1-\bar{X}_2} = (78 - 84) \pm 2.04(2.13) = -6 \pm 4.34$$
-10.34 ≤ $\mu_1 - \mu_2$ ≤ -1.66
The 95% confidence interval is (-10.34 to -1.66). This interval contains only negative values, indicating that the drill tips made by company 1 do not last as long, on average, as those made by company 2.
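The drill-tip calculation can be reproduced with a short Python sketch. The function name `pooled_t_interval` is a hypothetical helper. The critical value 2.04 is taken from the example and passed in as a parameter, because the standard library has no t-distribution.

```python
import math


def pooled_t_interval(x1, x2, var1, var2, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 assuming equal population variances.

    Returns the pooled degrees of freedom, the standard error and the
    lower and upper limits."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))         # standard error
    diff = x1 - x2
    margin = t_crit * se
    return df, se, diff - margin, diff + margin


# Figures from Example 2.20; t_crit = 2.04 is the value used in the text
df, se, lower, upper = pooled_t_interval(
    x1=78, x2=84, var1=41, var2=36, n1=20, n2=15, t_crit=2.04
)
print(df, round(se, 2), round(lower, 2), round(upper, 2))
```

Both limits are negative, which matches the conclusion that the drill tips from company 1 have the shorter mean life.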
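For the difference between two population proportions, $P_1 - P_2$, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ is (approximately) normal with standard error
$$\sigma_{\bar{p}_1-\bar{p}_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$
Since $P_1$ and $P_2$ are unknown, the standard error is estimated from the sample proportions by
$$S_{\bar{p}_1-\bar{p}_2} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$$
That is, $\sigma_{\bar{p}_1-\bar{p}_2}$ is substituted by $S_{\bar{p}_1-\bar{p}_2}$.
Solving for $P_1 - P_2$ results in:
$$P_1 - P_2 = \bar{p}_1 - \bar{p}_2 + Z\sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$$
and since Z can assume both positive and negative values, it becomes:
$$P_1 - P_2 = \bar{p}_1 - \bar{p}_2 \pm Z\sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$$
Since Z is determined by the confidence level, we write it as
$$P_1 - P_2 = \bar{p}_1 - \bar{p}_2 \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$$
Where:
$\bar{p}_1$, $\bar{p}_2$ = sample proportions of the samples drawn from populations 1 and 2
$\bar{q}_1 = 1 - \bar{p}_1$ and $\bar{q}_2 = 1 - \bar{p}_2$
$n_1$, $n_2$ = sample sizes drawn from populations 1 and 2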
Example 2.21
A TV executive is interested in determining if the proportion of people who watch a late-night talk show is higher with the regular host or a guest host. In a random sample of 400 people, 175 watch the show when the regular host is on. In an independent random sample of 500 people, 185 watch the show when a guest host is on. Calculate a 95% confidence interval for $P_1 - P_2$. What do you conclude?
Solution:
Regular host                 Guest host
$n_1 = 400$                  $n_2 = 500$
$X_1 = 175$                  $X_2 = 185$
$\bar{p}_1 = 0.4375$         $\bar{p}_2 = 0.37$
$\bar{q}_1 = 0.5625$         $\bar{q}_2 = 0.63$
C = 0.95
i. Calculate the sample standard error of the difference between two proportions
$$S_{\bar{p}_1-\bar{p}_2} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}} = \sqrt{\frac{0.4375 \times 0.5625}{400} + \frac{0.37 \times 0.63}{500}} = 0.033$$
ii. Compute α/2
α = 1 - C = 1 - 0.95 = 0.05
α/2 = 0.05/2 = 0.025
iii. Look up $Z_{\alpha/2}$: $Z_{0.025} = 1.96$
iv. Construct the confidence interval
$$P_1 - P_2 = \bar{p}_1 - \bar{p}_2 \pm Z_{\alpha/2}\sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}} = (0.4375 - 0.37) \pm 1.96(0.033) = 0.0675 \pm 0.065$$
0.0025 ≤ $P_1 - P_2$ ≤ 0.1325
We state with 95% confidence that the true difference $P_1 - P_2$ is between 0.0025 and 0.1325. Since this interval contains only positive values, it is reasonable to say that the proportion of people who watch the show when the regular host is on is greater than when a guest host is on.
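The interval for the talk-show example can be verified with a minimal Python sketch. The function name `two_proportion_interval` is a hypothetical helper, and z = 1.96 is the table value for 95% confidence.

```python
import math


def two_proportion_interval(x1, n1, x2, n2, z):
    """Large-sample confidence interval for P1 - P2.

    x1 and x2 are the numbers of successes in samples of size n1 and n2.
    Returns the point estimate, the standard error and both limits."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    margin = z * se
    return diff, se, diff - margin, diff + margin


# Figures from Example 2.21
diff, se, lower, upper = two_proportion_interval(x1=175, n1=400, x2=185, n2=500, z=1.96)
print(f"difference = {diff:.4f}, standard error = {se:.4f}")
print(f"{lower:.4f} <= P1 - P2 <= {upper:.4f}")
```

The code keeps the standard error unrounded, so its limits differ from the hand calculation in the third decimal place. The hand calculation rounds the standard error to 0.033 and the margin to 0.065. The conclusion is unchanged, since both limits remain positive.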