
Chapter 2

Chapter Two discusses statistical inference methods for estimating population parameters and testing hypotheses. It outlines two main types of statistical inference: estimation (point and interval) and hypothesis testing, detailing the properties of good estimators. The chapter also explains the concept of sampling distributions and the construction of confidence intervals for estimating population means.

Uploaded by

Tigist G
Copyright
© © All Rights Reserved

CHAPTER TWO

2. Inference about a population mean and proportion

2.1. Introduction
Inference, specifically decision making and prediction, is centuries old and plays a
very important role in our lives. Each of us faces daily personal decisions and
situations that require predictions concerning the future. The inferences that
individuals make should be based on relevant facts, which we call observations, or
data.
Methods for making inferences about parameters fall into one of two categories.
Either we estimate (predict) the value of the population parameter of interest,
or we test a hypothesis about the value of the parameter. These two methods
of statistical inference, estimation and hypothesis testing, involve different
procedures and, more importantly, they answer two different questions about the
parameter. In estimating a population parameter, we are answering the question,
"What is the value of the population parameter?" In testing a hypothesis, we are
answering the question, "Is the parameter value equal to this specific value?"
Inference is the process of drawing conclusions from sample data about the
whole population. Inferential statistics uses the sample
results to make decisions and draw conclusions about the population from which
the sample is drawn. In statistics there are two ways in which inference can
be made:

• Statistical estimation
• Statistical hypothesis testing

Parameter and Statistic

• A number that describes a population is called a parameter.
• A number that describes a sample is called a statistic.
• If we take a sample and calculate a statistic, we often use that statistic to
infer something about the population from which the sample was drawn.

Getu D.
The two common forms of statistical inference are:
1. Estimation
2. Null hypothesis tests of significance (NHTS)
There are two forms of estimation:
• Point estimation (the single most likely value for the parameter)
• Interval estimation (also called a confidence interval for the parameter)
Both estimation and NHTS are used to infer parameters. A parameter is a
statistical constant that describes a feature of a phenomenon, population, pmf,
or pdf.
2.2 Statistical Estimation:
This is one way of making inference about the population parameter, used when the
investigator does not have any prior notion about the values or characteristics of the
population parameter. There are two types of estimation:
i. Point estimation: The goal of point estimation is to make a reasonable guess
of the unknown value of a designated population quantity, e.g., the population
mean. The quality of an individual estimate depends on the individual sample
from which it was computed and is therefore affected by chance variation.
A point estimate is a single value, computed from sample information, that is used
to estimate a parameter. The best point estimate of the population mean μ is
the sample mean X̄.
ii. Interval estimation: This is the procedure that yields an interval of values as
an estimate for a parameter, that is, an interval that contains the likely values of
the parameter. It deals with identifying the upper and lower limits of a
parameter.

Estimator and Estimate


An estimator is the rule or random variable that helps us to approximate a
population parameter, whereas an estimate is one of the possible values that an
estimator can assume. For example, the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is an estimator for the population mean, and X̄ = 10 is an estimate, which is one
of the possible values of X̄.
Three properties of a good estimator
• It should be unbiased.
• It should be consistent.
• It should be relatively efficient.
• The estimator should be unbiased. That is, the expected value, or
the mean of the estimates obtained from samples of a given size, is equal to
the parameter being estimated. It is desirable that the sampling distribution be
centered on the true population parameter. An estimator with this property is
called unbiased.

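Example: Show that the sample variance is an unbiased estimator of the population variance.

Solution

Let X_1, ..., X_n be independent observations from a population with mean μ and
variance σ², and let

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$$

We want to compute the expected value of this quantity. First, multiply both
sides of the equation by n−1, so that we do not have to keep carrying that
factor around, and square out the right side, as in the shortcut formula for SSX:

$$(n-1)S^2 = \sum_{i=1}^{n}(X_i-\bar{X})^2 = \sum_{i=1}^{n}X_i^2 - n\bar{X}^2$$

Taking expected values of both sides, and writing the result as a numbered equation:

$$E\left[(n-1)S^2\right] = \sum_{i=1}^{n}E\left[X_i^2\right] - n\,E\left[\bar{X}^2\right] \qquad (1)$$

Unfortunately, the expected value of the square of something is not equal
to the square of the expected value, so we seem to have hit an impasse
with both terms on the RHS. But each of those terms is an expected value of
something squared, that is, a second moment, and a second moment equals the
variance plus the square of the mean. First, let Y be the random variable
defined by the sample mean, Y = X̄. Its mean is μ and its variance is σ²/n, so
the expected value of its square is

$$E\left[Y^2\right] = \mathrm{Var}(Y) + \left(E[Y]\right)^2 = \frac{\sigma^2}{n} + \mu^2$$

We can substitute this for the second term on the RHS of equation 1. Also,
note that the first term on the RHS of equation 1 involves the second moment of X,
which can be rewritten as E[X_i²] = σ² + μ². Doing both substitutions gives us:

$$E\left[(n-1)S^2\right] = n\left(\sigma^2+\mu^2\right) - n\left(\frac{\sigma^2}{n}+\mu^2\right) = (n-1)\sigma^2$$

Dividing by n−1 gives E[S²] = σ², so S² is an unbiased estimator (UE) of σ².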
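The unbiasedness of the sample variance can also be checked by brute force on a tiny population. The sketch below is only an illustration under stated assumptions: the population values 1, 3, 5, 7, 9 and the sample size n = 2 are chosen for convenience, sampling is with replacement (so the observations are independent), and the helper names `mean` and `sample_variance` are hypothetical. It enumerates every equally likely ordered sample and averages the sample variance computed with divisor n−1 and with divisor n.

```python
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 7, 9]  # assumed illustrative population
n = 2                         # assumed sample size


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


mu = mean(population)
# Population variance uses divisor N.
sigma2 = mean([(x - mu) ** 2 for x in population])


def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over a chosen divisor."""
    xbar = mean(sample)
    return sum((x - xbar) ** 2 for x in sample) / divisor


# Every ordered sample drawn with replacement is equally likely.
samples = list(product(population, repeat=n))

expected_s2_unbiased = mean([sample_variance(s, n - 1) for s in samples])
expected_s2_biased = mean([sample_variance(s, n) for s in samples])

print("population variance:", sigma2)
print("E[S^2] with divisor n-1:", expected_s2_unbiased)
print("E[S^2] with divisor n:", expected_s2_biased)
```

With divisor n−1 the average of S² over all samples equals the population variance exactly, while divisor n underestimates it by the factor (n−1)/n.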

• The estimator should be consistent. For a consistent estimator, as the sample size
increases, the value of the estimator approaches the value of the parameter
estimated.
• The estimator should be relatively efficient. That is, of all the
statistics that can be used to estimate a parameter, the relatively efficient
estimator has the smallest variance. It is desirable that our chosen estimator
have a small standard error in comparison with other estimators we might
have chosen.
2.2.1 Sampling Distribution of the sample mean
Because a statistic such as x̄ varies from sample to sample, it is a random
variable. As such, a statistic has a probability distribution associated with it. In
order to make probability statements regarding a sample statistic, we need to
know its probability distribution. That is to say, we need to
know the shape, center and spread of the sample statistic's distribution.
The sampling distribution of a statistic is the probability distribution of all
possible values of the statistic computed from samples of size n.

There are commonly three properties of interest of a given sampling distribution:
• Its mean
• Its variance
• Its functional form

The sampling distribution of the sample mean is a theoretical probability distribution
that shows the functional relationship between the possible values of the
sample mean based on samples of size n and the probability associated with each
value, for all possible samples of size n drawn from that particular population.
Steps for the construction of the sampling distribution of the mean:
1. From a finite population of size N, randomly draw all possible samples of size n.
2. Calculate the mean of each sample.
3. Summarize the means obtained in step 2 in terms of a frequency distribution
or relative frequency distribution.

Example: Suppose we have a population of size N = 5, consisting of the ages of five
children: 1, 3, 5, 7 and 9.

Population mean:

$$\mu = \frac{\sum X_i}{N} = \frac{1+3+5+7+9}{5} = \frac{25}{5} = 5$$

Population variance:

$$\sigma^2 = \frac{\sum (X_i-\mu)^2}{N} = \frac{(1-5)^2+(3-5)^2+(5-5)^2+(7-5)^2+(9-5)^2}{5} = \frac{40}{5} = 8$$

The standard deviation is σ = 2.828427. In most situations we do not know the
population values µ and σ, so we estimate them from sample values.

Example: Take samples of size 2 without replacement and construct the sampling
distribution of the sample mean.

There are $\binom{N}{n} = \binom{5}{2} = 10$ possible samples of size 2, as shown below.

Sample No   Sample   Mean (x̄)
1           1, 3     2
2           1, 5     3
3           1, 7     4
4           1, 9     5
5           3, 5     4
6           3, 7     5
7           3, 9     6
8           5, 7     6
9           5, 9     7
10          7, 9     8

For instance, x̄1 = (1+3)/2 = 2, x̄2 = (1+5)/2 = 3, etc.

Sampling is random, so each sample has the same probability,
$1/\binom{N}{n} = 1/10$, of being selected.

x̄       f     Probability
2       1     1/10
3       1     1/10
4       2     2/10
5       2     2/10
6       2     2/10
7       1     1/10
8       1     1/10
Total   10    1.0

This is the sampling distribution of x̄.
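The construction steps can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: it enumerates all samples of size 2 drawn without replacement from the ages 1, 3, 5, 7, 9, tabulates the sample means, and then computes the mean and variance of that sampling distribution. The variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ages = [1, 3, 5, 7, 9]  # the population of five children
n = 2                   # sample size

# Step 1: all possible samples of size n (without replacement).
samples = list(combinations(ages, n))

# Step 2: the mean of each sample, kept exact with Fraction.
means = [Fraction(sum(s), n) for s in samples]

# Step 3: relative frequency distribution of the sample means.
freq = Counter(means)
dist = {m: Fraction(f, len(samples)) for m, f in sorted(freq.items())}

mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())

for m, p in dist.items():
    print(f"sample mean = {m}, probability = {p}")
print("mean of sample means:", mean_of_means)
print("variance of sample means:", var_of_means)
```

The mean of the sample means equals the population mean 5. The variance of the sample means comes out as 3 rather than σ²/n = 4, because sampling here is without replacement from a finite population; the value agrees with the standard finite population correction, (σ²/n)(N−n)/(N−1) = 4 × 3/4 = 3.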


Remark:

1. In general, if sampling is with replacement,

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$$

2. The sample mean is an unbiased estimator of the population mean, i.e.
$\mu_{\bar{x}} = \mu \Rightarrow E(\bar{x}) = \mu$.

• Sampling may be from a normally distributed population or from a
non-normally distributed population.
• When sampling is from a normally distributed population, the distribution of x̄
will possess the following properties:

1. The distribution of x̄ will be normal.
2. The mean of x̄ is equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.
3. The variance of x̄ is equal to the population variance divided by the sample
size, i.e. $\sigma_{\bar{x}}^2 = \sigma^2/n$.

$$\Rightarrow \bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \Rightarrow Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$$
2.2.2 Point and Interval estimation of the population mean
i. Point estimation of the population mean
A point estimate is the numeric value of a sample statistic that is used to
estimate the value of a population parameter. The best point estimator of the
population mean µ is the sample mean X̄.


ii. Interval estimation (confidence interval) of the population mean

Although X̄ possesses nearly all the qualities of a good estimator, different
samples are very likely to result in different sample means, and thus there is
some degree of uncertainty involved. Because the point estimate is unlikely to be
exactly correct, we usually specify a range of values in which the population
parameter is likely to be. Besides, a point estimate does not provide any
information about the variability of the estimator.
Definition: An interval estimator (or confidence interval) is a formula that tells us
how to use sample data to calculate an interval that estimates a population
parameter.
For example, if our confidence level is 95%, then in the long run, 95% of our
sample confidence intervals will contain 𝛍.
Consequently, interval estimation is often preferred. This technique provides a
range of reasonable values that are intended to contain the parameter of interest
with a certain degree of confidence. This range of values is called a confidence
interval.
The confidence level of an interval estimate of a parameter is the probability
that the interval estimate will contain the parameter, assuming that a large
number of samples are selected and that the estimation process on the same
parameter is repeated.
A confidence interval is a specific interval estimate of a parameter determined
by using data obtained from a sample and by using the specific confidence level
of the estimate.

The probability that an interval estimate will contain the parameter is called the
confidence level. There are different cases to be considered when constructing
confidence intervals.
Suppose that a sample of size n is selected from a population that has mean 𝛍
and standard deviation σ. Let X1, X2, ..., Xn be the n observations, which are
independent and identically distributed (i.i.d.). Define the sample mean and
the total of these n observations as follows:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad T = \sum_{i=1}^{n} X_i$$

The central limit theorem states that the sample mean follows approximately
the normal distribution with mean 𝛍 and standard deviation σ/√n, where 𝛍 and σ are
the mean and standard deviation of the population from which the sample was
selected. The sample size n has to be large (usually n ≥ 30) if the population from
which the sample is taken is non-normal.
If the population follows the normal distribution, then the sample size n can be
either small or large.

Case 1: When n is large or the population is normally distributed

• If the variable x of a population is normally distributed with mean 𝛍 and
standard deviation σ then, for any sample size n, the sample mean X̄ is also
normally distributed, with mean 𝛍 and standard deviation σ/√n.
In this case X̄ is normally distributed with mean μ and variance σ²/n. That is,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

This allows us to use the normal distribution curve for computing
confidence intervals.

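$$\Rightarrow Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$

has a normal distribution with mean 0 and standard deviation 1.

$$\Rightarrow \mu = \bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = \bar{X} \pm E \Rightarrow E = Z\frac{\sigma}{\sqrt{n}}$$

For the interval estimator to be a good estimator, the error E should be small. How
can E be made small?
o If σ is small
o By increasing the sample size (n)
o By decreasing Z
The quantity we control most directly is Z. To fix Z we link the standard normal
distribution to a probability statement.

Figure 2.1: A (1−α) confidence interval. The central area between −z_{α/2} and z_{α/2} is 1−α, with area α/2 in each tail.

$$\Rightarrow P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1-\alpha$$

$$\Rightarrow P\left(-Z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1-\alpha$$

$$\Rightarrow P\left(-\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$

$$\Rightarrow P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$

⇒ A (1−α)100% confidence interval for μ is

$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

However, most of the time σ is not known; in that case we estimate σ by its point estimate S, so that

$$\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$

is a (1−α)100% confidence interval for μ.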

The Z values corresponding to the most commonly used confidence levels are given
below.

(1−α)100%   α      α/2     Zα/2
90          0.10   0.05    1.645
95          0.05   0.025   1.96
99          0.01   0.005   2.58

For example, for a 95% confidence interval, Zα/2 = 1.96.
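The tabulated critical values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_critical` is a hypothetical helper introduced only for this illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z such that the central area under N(0, 1) equals `confidence`."""
    alpha = 1 - confidence
    # Area alpha/2 lies to the right of the critical value.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

To three decimals the 99% value is 2.576, which the table rounds to 2.58.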
Statistical interpretation of a confidence interval: Suppose we repeated this
sampling experiment 100 times; that is, we collected 100 different sets of data,
each set consisting of 40 observations. Suppose that we computed a confidence
interval based on each of the 100 data sets. On average, we would expect 90 of
the confidence intervals to include the true mean µ; we would expect that 10
would not. The figure 90 comes from the fact that we chose a 90% confidence
interval.
More generally, we can choose whatever confidence level we want. The
convention is to specify the confidence level as 1−α, where α is typically 0.1, 0.05
or 0.01. These three α values correspond to confidence levels 90%, 95% and 99%.
(α is the Greek letter alpha.)
Definition: For any α between 0 and 1, we define zα to be the point on the z-axis
such that the area to the right of zα under the standard normal curve is α; i.e.
P(Z > zα) = α.
Figure 1: The area to the right of zα is α. For example, z.05 is 1.645. The area
outside ±zα/2 is α/2 + α/2 = α. For example, z.025 = 1.96, so the area to the right of
1.96 is 0.025, the area to the left of −1.96 is also 0.025, and the area
outside ±1.96 is 0.05.

• Why zα/2, rather than zα? We want to make sure that the total area outside the
interval is α. This means that α/2 should be to the left of the interval and α/2
should be to the right. In the special case of a 90% confidence interval,
α = 0.1, so α/2 = 0.05, and z.05 is indeed 1.645.

The expression

$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

is called the half width of the confidence interval or the margin of error. The half
width is a measure of precision; the tighter the interval, the more precise our
estimate. Not surprisingly, the half width

• decreases as the sample size increases;
• increases as the population standard deviation increases;
• increases as the confidence level increases (higher confidence requires a larger
zα/2).
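The three properties of the half width can be seen numerically. The following is a small sketch, assuming a known population standard deviation; the function name `half_width` and the numbers passed to it are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sigma, n, confidence=0.95):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) for a z-based interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


base = half_width(sigma=2, n=50)
print("baseline:            ", round(base, 4))
print("larger sample (n=200):", round(half_width(sigma=2, n=200), 4))
print("larger sigma (4):     ", round(half_width(sigma=4, n=50), 4))
print("higher confidence:    ", round(half_width(sigma=2, n=50, confidence=0.99), 4))
```

Quadrupling the sample size halves the half width, since n enters through its square root.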

Case 2: When n is small and the population variance σ² is not known
When σ is known and the variable is normally distributed, or when σ is unknown
and n ≥ 30, the standard normal distribution is used to find confidence intervals.
However, in many situations the population standard deviation is not known and
the sample size is less than 30. In such situations, the standard deviation from
the sample can be used in place of the population standard deviation for
confidence intervals. But a somewhat different distribution, called the
t-distribution, must be used when the sample size is less than 30 and the variable is
normally distributed or approximately normally distributed. The t-distribution is
sometimes called Student's t distribution.
Characteristics of the t-distribution
o The t-distribution is bell-shaped
o The t-distribution is symmetrical about the mean
o The mean, median, and mode are equal to 0 and are located at the center
of the distribution
o The curve never touches the x-axis
The t-distribution differs from the standard normal distribution in the following
ways:
o The variance is greater than 1
o The t- distribution is actually a family of curves based on the concept of
degrees of freedom, which is related to sample size.
o As the sample size increases, the t-distribution approaches the standard
normal distribution.
Many statistical distributions use the concept of degrees of freedom, and the
formula for finding the degrees of freedom varies between statistical tests.
The degrees of freedom are the number of values that are free to vary after a
sample statistic has been computed.
If the sample size is small and the population variance σ² is not known, then

$$t = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

has a t-distribution with n−1 degrees of freedom.

Figure: A (1−α) confidence interval based on the t-distribution. The central area between −t_{α/2} and t_{α/2} is 1−α, with area α/2 in each tail.

⇒ A (1−α)100% confidence interval for µ is given by

$$\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
For any sample size n and any confidence level 1−α, we have t_{n−1,α/2} > z_{α/2}.
Consequently, intervals based on the t distribution are always wider than those
based on the standard normal.
As the sample size increases, the df increases. As the df increases, the t
distribution approaches the standard normal distribution.
For example, we know that z.05 = 1.645. Look down the .05 column of the t
table: t n,.05 approaches 1.645 as n increases.
When to use t, when to use z? Strictly speaking, the conditions are as follows:

• Z-based confidence intervals are valid if we have a large sample;
• t-based confidence intervals are valid if we have a sample from a normal
distribution with an unknown variance.

Examples:
1) The registrar of Dambi Dollo University is interested in estimating the
average age of students who graduate with a BSc degree. From past studies the
population standard deviation is known to be 2 years. A sample of 50 graduating
students is selected, and the mean is found to be 23.2 years. Find the 95%
confidence interval estimate of the population mean age of the graduating
students at the university.

2) A random sample of size 36 selected from a normal population has a mean of 32.
The population standard deviation (σ) is 4.2. Find
a) A 95% confidence interval for the population mean
b) A 99% confidence interval for the population mean
c) Which interval is wider? Explain why

3) The mean operating lifetime for a random sample of n = 10 light bulbs is
X̄ = 4,000 hr, with sample standard deviation S = 200 hr. The operating life of
bulbs in general is assumed to be approximately normally distributed. Find the
95% confidence interval for the true mean operating lifetime.
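Solutions:
1. Given: σ = 2 years, X̄ = 23.2 years, n = 50 (Case 1)

⇒ A (1−α)100% confidence interval for μ is

$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \Rightarrow \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

1−α = 0.95 ⇒ α = 0.05, α/2 = 0.025 ⇒ Z_{α/2} = 1.96

$$\Rightarrow 23.2 - 1.96\left(\frac{2}{\sqrt{50}}\right) < \mu < 23.2 + 1.96\left(\frac{2}{\sqrt{50}}\right)$$

⇒ 23.2 − 0.55 < μ < 23.2 + 0.55
⇒ 22.65 < μ < 23.75
Interpretation: The registrar is 95% confident that the average age of graduating students is
between 22.65 and 23.75 years.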

2. Given: X̄ = 32, σ = 4.2, n = 36, and the population is normal.

⇒ A (1−α)100% confidence interval for μ is

$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

a) 95% ⇒ 1−α = 0.95 ⇒ α = 0.05, α/2 = 0.025 ⇒ Z_{α/2} = 1.96

$$\Rightarrow 32 - 1.96\left(\frac{4.2}{\sqrt{36}}\right) < \mu < 32 + 1.96\left(\frac{4.2}{\sqrt{36}}\right)$$

⇒ 32 ± 1.372
⇒ 30.628 < μ < 33.372
⇒ The 95% confidence interval is (30.628, 33.372)
Interpretation: We are 95% confident that the population mean is between 30.628 and 33.372.

b) 99% ⇒ 1−α = 0.99 ⇒ α = 0.01, α/2 = 0.005 ⇒ Z_{α/2} = 2.58

$$\Rightarrow 32 - 2.58\left(\frac{4.2}{\sqrt{36}}\right) < \mu < 32 + 2.58\left(\frac{4.2}{\sqrt{36}}\right)$$

⇒ 32 ± 1.806
⇒ 30.194 < μ < 33.806
⇒ The 99% confidence interval is (30.194, 33.806)
Interpretation: We are 99% confident that the population mean is between 30.194 and 33.806.

c) The 99% confidence interval is wider than the 95% confidence interval.
⇒ As the confidence level increases, the interval becomes wider.
3. Given: n = 10, X̄ = 4,000 hr and S = 200 hr
n is small and σ is unknown (Case 2) ⇒ use the t-distribution.

⇒ A (1−α)100% confidence interval for μ is

$$\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right)$$

95% ⇒ α/2 = 0.025 ⇒ t_{α/2, n−1} = t_{0.025, 9} = 2.262

$$\Rightarrow 4000 - 2.262\left(\frac{200}{\sqrt{10}}\right) < \mu < 4000 + 2.262\left(\frac{200}{\sqrt{10}}\right)$$

⇒ 3856.9 < μ < 4143.1
⇒ (3856.9, 4143.1)
Interpretation: We are 95% confident that the true mean operating lifetime of the bulbs is
between 3856.9 and 4143.1 hours.
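The three worked solutions can be recomputed with a short script. This is a sketch under stated assumptions: the helper names `z_interval` and `t_interval` are hypothetical, the z critical value is computed from `statistics.NormalDist` rather than read from a rounded table, and the t critical value 2.262 is passed in by hand because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """z-based interval for the mean (Case 1: sigma known or n large)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sigma / sqrt(n)
    return xbar - e, xbar + e


def t_interval(xbar, s, n, t_crit):
    """t-based interval for the mean (Case 2); t_crit is taken from a t table."""
    e = t_crit * s / sqrt(n)
    return xbar - e, xbar + e


ci_age = z_interval(23.2, 2, 50, 0.95)               # Example 1
ci_95 = z_interval(32, 4.2, 36, 0.95)                # Example 2a
ci_99 = z_interval(32, 4.2, 36, 0.99)                # Example 2b
ci_bulbs = t_interval(4000, 200, 10, t_crit=2.262)   # Example 3

for name, ci in [("age", ci_age), ("95%", ci_95), ("99%", ci_99), ("bulbs", ci_bulbs)]:
    print(f"{name}: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the script uses the unrounded critical value (about 2.576 instead of 2.58 at 99%), its 99% limits can differ from a hand calculation in the third decimal place.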
Exercises:
1. A sociologist found that in a sample of 49 retired men, the average number of
jobs they had during their life-time was 7.2. From previous studies it was
found that the population standard deviation of the number of jobs is 2.1.
a) Find the 90% confidence interval of the mean for the number of jobs a
man had during his life time
b) Find the 95% confidence interval of the mean for the number of jobs a
man had during his life time
c) Compare the intervals in (a) and (b)

2. An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed with a standard deviation of 40 hours. If a
random sample of 30 bulbs has an average life of 780 hours, find a 99%
confidence interval for the population mean of all bulbs produced by this firm.

3. A random sample of 400 households was drawn from a town and a survey
generated data on weekly earning. The mean in the sample was Birr 250 with
a standard deviation Birr 80. Construct a 95% confidence interval for the
population mean earning.
4. A sample of 15 private-duty nurses showed an average weekly wage of birr
480.75 with standard deviation of birr 56. Find the 99% confidence interval for
the true mean.
5. A major truck has kept extensive records on various transactions with its
customers. If a random sample of 16 of these records shows average sales of
290 liters of diesel fuel with a standard deviation of 12 liters, construct a 95%
confidence interval for the mean of the population sampled.

2.2.3 Sampling Distribution of sample Proportion


A proportion refers to the fraction of the total that possesses a certain attribute.
For example, suppose we have a sample of four pets - a bird, a fish, a dog, and a
cat. We might ask what proportion has four legs. Only two pets (the dog and the
cat) have four legs.
The population and sample proportions, denoted by P and p̂ respectively, are
calculated as

$$P = \frac{\text{Number of elements in the population with a specific characteristic}}{\text{Total number of elements in the population}}$$

$$\hat{p} = \frac{\text{Number of elements in the sample with a specific characteristic}}{\text{Total number of elements in the sample}}$$

The concept of proportion is the same as the concept of relative frequency.
The relative frequency of a category or class gives the
proportion of the sample or the population that belongs to that category or class.
Example: Suppose a sample of 240 families is taken from the city and 158 of
them are homeowners. Then the sample proportion is p̂ = a/n = 158/240 = 0.66,
where 'a' is the number of families who own houses out of the total sample.
Just like the sample mean, the sample proportion is also a random variable. The
value of p̂ calculated for a particular sample depends on which elements of the
population are included in that sample.

The probability distribution of the sample proportion p̂ is called its sampling
distribution. It lists the various values that p̂ can assume and their probabilities.

To illustrate the sampling distribution of p̂, let us consider the following small
example. Five employees of a given firm provided information concerning their
awareness of HIV/AIDS.

Name   Awareness of HIV/AIDS
A      Yes
J      No
S      No
L      Yes
T      Yes

Considering this as the population, the proportion P of employees who know about
HIV/AIDS is
P = 3/5 = 0.6 or 60%
Suppose we take all possible samples of three employees each and compute, for each
sample, the proportion of employees who know about HIV/AIDS. The number
of possible samples is $\binom{5}{3} = 10$.

The following table shows the value of p̂ (rounded to two decimal places)
for each sample.

Sample No   Sample    Proportion who know about HIV/AIDS
1           A, J, S   1/3 = 0.33
2           A, J, L   2/3 = 0.67
3           A, J, T   2/3 = 0.67
4           A, S, L   2/3 = 0.67
5           A, S, T   2/3 = 0.67
6           A, L, T   3/3 = 1.00
7           J, S, L   1/3 = 0.33
8           J, S, T   1/3 = 0.33
9           J, L, T   2/3 = 0.67
10          S, L, T   2/3 = 0.67

The frequency and sampling distribution of p̂ can be prepared from the above
table and is summarized as follows.

p̂       f     Probability, P(p̂)
0.33    3     3/10 = 0.3
0.67    6     6/10 = 0.6
1.00    1     1/10 = 0.1
Total   10    1.0

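$$E(\hat{p}) = \sum \hat{p}\,P(\hat{p}) = 0.33 \times 0.3 + 0.67 \times 0.6 + 1 \times 0.1 = 0.601$$

⇒ E(p̂) ≈ 0.60 = P, which is the population proportion. (The small excess over 0.60
comes from rounding 1/3 and 2/3 to two decimal places.)

The sample proportion $\hat{P} = x/n$ is a point estimate of P. Its sampling
distribution can be approximated by a normal distribution with mean

$$\mu_{\hat{P}} = P$$

and standard error

$$\sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}$$

if nP and n(1−P) are both greater than 5.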
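The five-employee example can be checked by enumeration with exact fractions, which avoids the rounding of 1/3 and 2/3. This is a minimal sketch; the dictionary `aware` simply encodes the Yes/No answers as 1/0, and all variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# 1 = knows about HIV/AIDS, 0 = does not (the five-employee population).
aware = {"A": 1, "J": 0, "S": 0, "L": 1, "T": 1}
n = 3  # sample size

population_p = Fraction(sum(aware.values()), len(aware))

# All possible samples of three employees and their sample proportions.
samples = list(combinations(aware, n))
p_hats = [Fraction(sum(aware[name] for name in s), n) for s in samples]

# Sampling distribution of p-hat as exact probabilities.
dist = {p: Fraction(c, len(samples)) for p, c in sorted(Counter(p_hats).items())}
expected_p_hat = sum(p * prob for p, prob in dist.items())

for p, prob in dist.items():
    print(f"p-hat = {p}, probability = {prob}")
print("E(p-hat) =", expected_p_hat, " population P =", population_p)
```

With exact arithmetic the expected value of p̂ equals P = 3/5 exactly.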
2.2.4 Point and Interval estimation of population proportions (P)
Point estimation of population proportions

If P represents the population proportion, then the sample proportion $\hat{P} = X/n$
provides a good estimate of P. Therefore, the sample proportion P̂ is the point
estimate of the population proportion.
Interval estimation of population proportions (P)
In the binomial experiment each trial results in one of two outcomes, which we
label as either a success or a failure. We designate P as the probability of a
success and 1−P as the probability of a failure. Then the probability distribution
for x, the number of successes in n identical trials, is

$$P(x) = \frac{n!}{x!\,(n-x)!}\,P^{x}(1-P)^{n-x}$$

In a random sample of n from a population in which the proportion of elements
classified as successes is P, the best estimate of the parameter P is the sample
proportion of successes. Letting x denote the number of successes in the n
sample trials, the sample proportion is $\hat{P} = x/n$. The distribution of x can be
approximated by a normal curve when nP ≥ 5 and n(1−P) ≥ 5.

In a similar way, the distribution of $\hat{P} = x/n$ can be approximated by a normal
distribution with a mean and a standard error given as

$$\mu_{\hat{P}} = P \quad \text{and} \quad \sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}$$

respectively.

A general (1−α)100% confidence interval for the proportion of successes is
given by

$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$

where q̂ = 1 − p̂.
Examples
a. In a random sample of n = 230 voters, 54 voted for candidate A. Find the
90% confidence interval for the proportion of individuals who voted for
candidate A.
b. In a sample of 100 teenage girls, 30% used hair coloring. Find the 95%
confidence interval of the true proportion of teenage girls who use hair
coloring.
Solutions:
a) Let x be the number of individuals who voted for candidate A.

$$\hat{p} = \frac{x}{n} = \frac{54}{230} = 0.235 \Rightarrow \hat{q} = 1 - \hat{p} = 1 - 0.235 = 0.765, \qquad 90\% \Rightarrow Z_{\alpha/2} = 1.645$$

Confidence interval:

$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$

$$\Rightarrow \left(0.235 - 1.645\sqrt{\frac{0.235 \times 0.765}{230}},\ 0.235 + 1.645\sqrt{\frac{0.235 \times 0.765}{230}}\right)$$

⇒ (0.235 − 0.046, 0.235 + 0.046)
⇒ (0.189, 0.281) ⇒ 0.189 < p < 0.281
⇒ 18.9% < p < 28.1%
We can be 90% confident that the true population proportion is between 18.9% and 28.1%.

b) Given p̂ = 0.3 ⇒ q̂ = 0.7, and 95% ⇒ Z_{α/2} = 1.96

Confidence interval:

$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$

$$\Rightarrow \left(0.3 - 1.96\sqrt{\frac{0.3 \times 0.7}{100}},\ 0.3 + 1.96\sqrt{\frac{0.3 \times 0.7}{100}}\right)$$

⇒ (0.3 − 0.0898, 0.3 + 0.0898)
⇒ (0.2102, 0.3898) ⇒ 0.2102 < p < 0.3898
⇒ 21.02% < p < 38.98%
We can be 95% confident that the true population proportion is between 21.02% and 38.98%.
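The two proportion intervals can be recomputed as follows. This is a hedged sketch: `proportion_interval` is a hypothetical helper that implements the normal-approximation interval p̂ ± z·√(p̂q̂/n), and for the second example the count x = 30 is inferred from "30% of 100".

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sqrt(p_hat * q_hat / n)
    return p_hat - e, p_hat + e


ci_votes = proportion_interval(54, 230, 0.90)   # example a
ci_hair = proportion_interval(30, 100, 0.95)    # example b (30% of 100 girls)

print(f"votes: ({ci_votes[0]:.3f}, {ci_votes[1]:.3f})")
print(f"hair:  ({ci_hair[0]:.4f}, {ci_hair[1]:.4f})")
```

The script keeps p̂ = 54/230 unrounded, so its limits may differ slightly in later decimals from a hand calculation that rounds p̂ to 0.235.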
How do you interpret a confidence interval?
• Suppose you calculate a 95% confidence interval for some unknown
parameter µ (the true average price all students spent on books).
IT IS INCORRECT TO SAY:
• "There is a 95% probability that µ (the average price all UNL students spent
on books) is within this interval"
Why is it incorrect?
Once computed, the confidence interval is NOT a random interval, and µ is a
constant (unfortunately unknown to us); thus there is no randomness left. In fact, µ
either falls in that interval or it does not.
What is the correct interpretation?
• "We are 95% confident that if µ (the average price all UNL students spent
on books) were known, this interval would cover/contain it"
Note: The probability refers to the interval containing µ, not to µ being in the
interval.
Why is this?
A 95% confidence interval is not so much a statement about any particular
interval, such as (79.3, 80.7), but pertains to what would happen if a very large
number of like intervals were to be constructed. That is, from a practical point of
view, the 95% gives the fraction of the time, in repeated sampling, that the
intervals constructed will contain the target parameter µ.
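The repeated-sampling interpretation can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the population is taken to be normal with µ = 80 and σ = 5, samples have n = 40 observations, σ is treated as known so a z-based interval is used, and a fixed seed makes the run repeatable. The function name `coverage` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, repetitions, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * sigma / sqrt(n)  # half width, the same for every sample
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - e < mu < xbar + e:
            hits += 1
    return hits / repetitions


rate = coverage(mu=80, sigma=5, n=40, confidence=0.95, repetitions=2000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

Each individual interval either contains µ or it does not; it is the long-run fraction of intervals that settles near 0.95.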
Exercise:
1. A survey of 1000 people who watched the Democrats/Republican debate
resulted in 600 who thought that the Democrats won the debate. Construct a 95%
confidence interval for the proportion of people who thought the
Democrats won the debate.
2. A survey of 120 female freshmen shows that 18% did not wish to work after
marriage. Find the 95% confidence interval of the true proportion of females
who do not wish to work after marriage.
2.3 Hypothesis testing

The idea of hypothesis testing is:

• Ask a question with two possible answers
• Design a test, or calculation from the data
• Base the decision (answer) on the test

Hypothesis testing is one way of making inference about the population
parameter where the investigator has a prior notion about the values of the
parameter. It is a common method of drawing inferences about a population
based on statistical evidence from a sample.
Hypothesis testing: A procedure, based on sample evidence and probability
theory, used to determine whether the hypothesis is a reasonable statement and
should not be rejected, or is unreasonable and should be rejected.
A hypothesis is a statement or a claim about the values of the parameter whose
plausibility is to be evaluated on the basis of the sample data.
Hypothesis: A statement about the value of a population parameter developed for
the purpose of testing.
A statistical hypothesis test is a method of making statistical decisions using
experimental data.
2.3.1 Important Concepts in Hypothesis testing
Statistical hypothesis: Is an assertion, statement, or claim about the population
whose plausibility is to be evaluated on the basis of the sample data.
Test statistic: A statistic whose value serves to determine whether to reject
or not reject the hypothesis to be tested.
There are two types of statistical hypotheses for each situation: the null
hypothesis and the alternative hypothesis.

a. Null hypothesis: A claim or statement about a population parameter that is
assumed to be true from the very beginning until it is declared false. It
is a statistical hypothesis that states a hypothesis of equality, or of
no difference, between a parameter and a specific value. It is usually denoted
by H0.

b. Alternative hypothesis: A claim or statement about a population
parameter that will be true if the null hypothesis is false. It is a statistical
hypothesis that states a difference between a parameter and a
specific value. It is usually denoted by H1 or HA.

Types and size of errors: There are two types of error in hypothesis testing.
Type I error: Rejecting the null hypothesis when it is true. The probability of
a type I error is denoted by α and is called the level of significance. That is,
P(Type I error) = α.
Type II error: Failing to reject the null hypothesis when it is false (accepting
the null hypothesis when it is false). The probability of a type II error is
denoted by β. That is, P(Type II error) = β.
For a fixed sample size, type I and type II errors have an inverse relationship:
reducing α increases β, so the two cannot be minimized at the same time. In
practice we set α at some value and design a test that minimizes β. This is
because a type I error is often considered to be more serious, and therefore
more important to avoid, than a type II error.
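The trade-off between α and β can be illustrated numerically. The following is a minimal sketch, assuming a right-tailed z-test with known σ. The function name `type_ii_error` and the figures used (μ0 = 900, an assumed true mean of 920, σ = 80, n = 64) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, sigma, n):
    """Return beta for a right-tailed z-test of H0: mu = mu0.

    mu1 is an assumed true mean (mu1 > mu0), sigma is the known
    population standard deviation and n is the sample size.
    """
    std_normal = NormalDist()                 # standard normal distribution
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value for level alpha
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # true mean offset in standard errors
    # H0 is not rejected when Z_cal <= z_alpha; under mu1, Z_cal ~ N(shift, 1).
    return std_normal.cdf(z_alpha - shift)


# Hypothetical illustration: beta grows as alpha is made smaller.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=900, mu1=920, sigma=80, n=64)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

With the sample size held fixed, the printed β values increase as α decreases, which is the inverse relationship described above.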
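The following table gives a summary of possible results of any hypothesis test:

Decision                  H0 is true            H0 is false
Do not reject H0          Correct decision      Type II error (β)
Reject H0                 Type I error (α)      Correct decision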

STEPS IN THE HYPOTHESIS TESTING PROCEDURES


General steps in hypothesis testing:

State the null hypothesis and the alternative hypothesis.
Null hypothesis: a statement about the value of a population parameter.
Alternative hypothesis: the statement that is accepted if the sample evidence
leads us to reject the null hypothesis.
Decide on the significance level.
The level of significance (α) is chosen by the investigator before the data are
examined. Three levels are commonly used: 0.01, 0.05 and 0.10 (corresponding to
confidence levels of 99%, 95% and 90%). The smaller the level of significance,
the stronger the sample evidence must be before the null hypothesis is rejected.
The level of significance determines the values of the test statistic that would
cause us to reject the hypothesis. These values are called the critical values.
The critical value divides the non-rejection (noncritical) region from the
rejection (critical) region. The symbol for critical value is C.V.
A given level of significance has different critical values for one-tailed and
two-tailed tests. For the z-statistic, a level of significance of 0.05 has
critical values of ±1.96 if the test is two-tailed. If the test is one-tailed,
the critical value is 1.645 in the right tail or −1.645 in the left tail. Note
that critical values for a given level of significance also differ depending on
the test statistic intended to be used.

• The critical or rejection region is the range of values of the test value that
indicates that there is a significant difference and that the null hypothesis
should be rejected.
• The non-critical or non-rejection region is the range of values of the test
value that indicates that the difference was probably due to chance and
that the null hypothesis should not be rejected.

[Figure: critical and noncritical regions and the critical value for a one-tailed test]

[Figure: critical and noncritical regions and the critical value for a two-tailed test]

Select the appropriate test statistic.
When testing a hypothesis about a population mean, we use either the z-statistic
or the t-statistic, according to the following conditions. If the population
standard deviation, σ, is known and either the population is normally
distributed or the sample size is large (n ≥ 30), we use the normal distribution
(z-statistic). If σ is unknown but the sample size is large, we still use the
z-statistic, with the sample standard deviation S in place of σ. When σ is
unknown, the sample size is small (n < 30) and the population is approximately
normally distributed, we use the t-distribution (t-statistic).
State the decision rules.
The decision rules state the conditions under which the null hypothesis will be
rejected or not rejected. The critical value for the test statistic is
determined by the level of significance. It is the value that divides the
non-rejection region from the rejection region.
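Compute the appropriate test statistic and make the decision.
When we use the z-statistic, we use the formula

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

When we use the t-statistic, we use the formula

$$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

where $\bar{X}$ is the sample mean, $\mu_0$ is the hypothesized value of the
population mean, σ is the population standard deviation, S is the sample
standard deviation and n is the sample size.
Compare the computed test statistic with the critical value. If the computed
value is within the rejection region(s), we reject the null hypothesis;
otherwise, we do not reject the null hypothesis.
Interpret the decision.
Based on the decision in the previous step, we state a conclusion in the
context of the original problem.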
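The z critical values quoted for the 0.05 level of significance can be reproduced from the standard normal distribution. The sketch below uses only the Python standard library; the helper name `z_critical` is an assumption made for illustration and is not part of the original notes.

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed):
    """Positive critical value of the standard normal for level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


# Critical values for the three commonly used significance levels.
for alpha in (0.10, 0.05, 0.01):
    two = round(z_critical(alpha, two_tailed=True), 3)
    one = round(z_critical(alpha, two_tailed=False), 3)
    print(f"alpha = {alpha:.2f}: two-tailed ±{two}, one-tailed {one}")
```

For a left-tailed test the same value is used with a negative sign. Critical values for the t-statistic depend on the degrees of freedom and are read from a t table instead.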
2.3.2 Hypothesis testing about the population means (µ)

Let $\mu_0$ be the assumed or hypothesized value of µ. Then one can formulate
two-sided (1) and one-sided (2 and 3) hypotheses as follows:

1. $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$ (two-tailed; two-sided alternative hypothesis)
2. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$ (one-tailed; one-sided alternative hypothesis)
3. $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$ (one-tailed; one-sided alternative hypothesis)

A one-tailed test indicates that the null hypothesis should be rejected when the
test value is in the critical region on one side of the mean. A one-tailed test is
either a right-tailed test or a left-tailed test, depending on the direction of the
inequality of the alternative hypothesis.
In a two-tailed test, the null hypothesis should be rejected when the test value is
in either of the two critical regions.
The choice of the alternative hypothesis (H1) depends on the prior information on
µ.

Case 1: When n is large or the population is normal

Test statistic:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

After specifying α we have the following regions (critical and acceptance) on the
standard normal distribution corresponding to the above three hypotheses.
Table: Summary of Decision Rules

H1                    Reject H0 if                     Do not reject H0 (Accept H0) if
$\mu \neq \mu_0$      $|Z_{cal}| > Z_{\alpha/2}$       $|Z_{cal}| < Z_{\alpha/2}$
$\mu > \mu_0$         $Z_{cal} > Z_{\alpha}$           $Z_{cal} < Z_{\alpha}$
$\mu < \mu_0$         $Z_{cal} < -Z_{\alpha}$          $Z_{cal} > -Z_{\alpha}$

If the population standard deviation σ is unknown and n is large, the sample
standard deviation S is used in its place and the test statistic will be

$$Z_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

The decision rule is the same as above.
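The Case 1 decision rules can be sketched as a short function. This is an illustrative sketch rather than a prescribed implementation: the function name `z_test_mean` and the labels "two-sided", "greater" and "less" for the three forms of H1 are assumptions. The figures in the usage lines are those of Examples 1 and 2 later in this section.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sd, n, alpha, alternative):
    """Large-sample z-test for a population mean.

    sd is sigma if it is known, otherwise the sample standard deviation S.
    alternative names the form of H1: "two-sided", "greater" or "less".
    Returns (z_cal, reject_h0).
    """
    z_cal = (xbar - mu0) / (sd / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        reject = abs(z_cal) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":
        reject = z_cal > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z_cal < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z_cal, reject


# Example 1: sentence length, H1: mu != 18.7, sigma known
print(z_test_mean(17.2, 18.7, 4.2, 36, 0.05, "two-sided"))
# Example 2: bulb life, H1: mu > 900, sigma unknown but n large (S used)
print(z_test_mean(920, 900, 80, 64, 0.05, "greater"))
```

Both calls lead to rejecting H0, which agrees with the hand calculations for those examples.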
Case 2: When n is small and σ unknown

We use the t-test. Test statistic:

$$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

which follows a t distribution with n − 1 degrees of freedom.
After specifying α we have the following regions (critical and acceptance) on the
t distribution with n − 1 degrees of freedom corresponding to the above three
hypotheses.
Table: Summary of decision rules

H1                    Reject H0 if                          Do not reject H0 (Accept H0) if
$\mu \neq \mu_0$      $|t_{cal}| > t_{\alpha/2,\,n-1}$      $|t_{cal}| < t_{\alpha/2,\,n-1}$
$\mu > \mu_0$         $t_{cal} > t_{\alpha,\,n-1}$          $t_{cal} < t_{\alpha,\,n-1}$
$\mu < \mu_0$         $t_{cal} < -t_{\alpha,\,n-1}$         $t_{cal} > -t_{\alpha,\,n-1}$

For the t distribution to apply strictly we need the following two
assumptions:

• The observations are selected at random from the population.
• The population distribution is normal.

In practice the second assumption need not hold exactly, because the t test is
robust to moderate departures from the normal distribution. That means even when
assumption 2 is not fully satisfied, the probabilities calculated from the t
table are still approximately correct.
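The Case 2 decision rules can be sketched in the same way. Because the Python standard library has no t distribution, this sketch assumes the critical value is read from a t table and passed in as `t_crit`; the function name `t_test_mean` and the labels for the form of H1 are likewise assumptions. The usage lines take their figures from Examples 3 and 4 later in this section.

```python
from math import sqrt


def t_test_mean(xbar, mu0, s, n, t_crit, alternative):
    """Small-sample t-test for a population mean.

    t_crit is the positive critical value from a t table with n - 1
    degrees of freedom: t_{alpha/2} for a two-sided test, t_{alpha}
    for a one-sided test.
    alternative names the form of H1: "two-sided", "greater" or "less".
    Returns (t_cal, reject_h0).
    """
    t_cal = (xbar - mu0) / (s / sqrt(n))
    if alternative == "two-sided":
        reject = abs(t_cal) > t_crit
    elif alternative == "greater":
        reject = t_cal > t_crit
    elif alternative == "less":
        reject = t_cal < -t_crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return t_cal, reject


# Example 3: blood pressure, H1: mu != 114.8, t_{0.025, 15} = 2.131
print(t_test_mean(117.23, 114.8, 5.63, 16, 2.131, "two-sided"))
# Example 4: nurses' salary, H1: mu < 1600, t_{0.05, 15} = 1.753
print(t_test_mean(1570, 1600, 120, 16, 1.753, "less"))
```

In both cases the computed statistic falls outside the rejection region, so H0 is not rejected.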
Examples:

1. Convicted murderers receive an average sentence of 18.7 years in prison.
A criminologist wants to perform a hypothesis test to determine whether the
mean sentence given by one particular judge differs from 18.7 years. A random
sample of 36 cases from the court files of this judge is taken, and the
sample mean is found to be 17.2 years. Assume that the population standard
deviation is 4.2 years. Test whether the mean differs from 18.7 years, using
the 0.05 significance level.
2. Dambi Dollo University uses thousands of fluorescent light bulbs each
year. The brand of bulb it currently uses has a mean life of 900 hours. A
manufacturer claims that its new brand of bulbs, which cost the same as the
brand the university currently uses, has a mean life of more than 900 hours.
The university has decided to purchase the new brand if, when tested, the
evidence supports the manufacturer's claim at the 0.05 significance level.
Suppose 64 bulbs were tested with the following results: $\bar{X}$ = 920 hours
and S = 80 hours. Will Dambi Dollo University purchase the new brand of
fluorescent bulbs?
3. For healthy women aged 18-24, systolic blood pressure readings have a
mean of 114.8. A random sample of 16 women has an average systolic blood
pressure of 117.23 with a standard deviation of 5.63. Test the claim that the
mean systolic blood pressure is different from 114.8. Use the 0.05
significance level.
4. A job placement director claims that the average monthly starting salary for
nurses is less than 1600 birr. A sample of 16 nurses has a mean monthly
starting salary of 1570 birr with a sample standard deviation of 120 birr. At
α=0.05 test the claim that nurses earn less than 1600 birr a month.
5. Researchers are interested in the mean level of an enzyme in a certain
population. They take a sample of 36 individuals, determine the level of
enzyme in each and compute a sample mean of 22. It is known that the variable
of interest is approximately normally distributed with a standard deviation of
10. Suppose they ask the following question: Can we conclude that the mean
enzyme level in this population is different from 25?

Solution:
1. Step 1: State the null and alternative hypotheses.
   $H_0: \mu = 18.7$
   $H_1: \mu \neq 18.7$
   Step 2: $\alpha = 0.05$
   Step 3: σ is known and n is large ⇒ use the Z-statistic.
   Step 4: Critical region: reject $H_0$ if $|Z_{cal}| \geq Z_{\alpha/2} = 1.96$.
   Step 5: Calculation of the test statistic:
   $$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{17.2 - 18.7}{4.2 / \sqrt{36}} = -2.143$$
   Step 6: Decision: Since $|Z_{cal}| = 2.143 > 1.96$, reject $H_0$.
   Step 7: Interpretation: At α = 0.05 the criminologist can conclude that the
   average sentence given by this judge is different from 18.7 years.
2. Step 1: State the null and alternative hypotheses.
   $H_0: \mu = 900$
   $H_1: \mu > 900$
   Step 2: $\alpha = 0.05$
   Step 3: σ is unknown but n is large ⇒ use the Z-statistic.
   Step 4: Critical region: reject $H_0$ if $Z_{cal} > Z_{\alpha} = 1.645$.
   Step 5: Calculation of the test statistic:
   $$Z_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{920 - 900}{80 / \sqrt{64}} = 2$$
   Step 6: Decision: Since $Z_{cal} = 2 > 1.645$, reject $H_0$.
   Step 7: Interpretation: At α = 0.05 there is enough evidence to indicate that
   the new brand of light bulbs has a mean life of more than 900 hours.
3. Step 1: State the null and alternative hypotheses.
   $H_0: \mu = 114.8$
   $H_1: \mu \neq 114.8$
   Step 2: $\alpha = 0.05$
   Step 3: n is small and σ is unknown ⇒ use the t-test.
   Step 4: Critical region: reject $H_0$ if $|t_{cal}| > t_{\alpha/2,\,n-1} = t_{0.025,\,15} = 2.131$.
   Step 5: Calculation of the test statistic:
   $$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{117.23 - 114.8}{5.63 / \sqrt{16}} = 1.726$$
   Step 6: Decision: Since $|t_{cal}| = 1.726 < t_{\alpha/2,\,n-1} = 2.131$, do not reject $H_0$.
   Step 7: Interpretation: At α = 0.05 there is not enough evidence to conclude
   that the mean systolic blood pressure of healthy women aged 18-24 differs
   from 114.8.
4. Step 1: State the null and alternative hypotheses.
   $H_0: \mu = 1600$
   $H_1: \mu < 1600$
   Step 2: $\alpha = 0.05$
   Step 3: n is small and σ is unknown ⇒ use the t-statistic.
   Step 4: Critical region: reject $H_0$ if $t_{cal} < -t_{\alpha,\,n-1} = -t_{0.05,\,15} = -1.753$.
   Step 5: Calculation of the test statistic:
   $$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{1570 - 1600}{120 / \sqrt{16}} = -1$$
   Step 6: Decision: Since $t_{cal} = -1 > -1.753$, do not reject $H_0$.
   Step 7: Interpretation: At α = 0.05 there is not enough evidence to support
   the claim that the mean monthly starting salary of nurses is less than
   1600 birr.
Exercises:

1. State the null and alternative hypotheses for each of the following.

a) A researcher thinks that if expectant mothers use vitamin pills, the birth
weight of the babies will increase. The average birth weight in the
population is 4.6 kilograms.
b) An engineer claims that she can decrease the mean number of defects in a
manufacturing process of compact discs by using robots instead of humans
for certain tasks. The mean number of defective discs is 18.
c) A psychologist feels that if he plays soft music during a test, the results
of the test will change. He is not sure whether the grades will be higher
or lower. In the past, the mean of the scores was 73.

2. The scores on an aptitude test required for entry into a certain job position
are normally distributed with mean 500 and standard deviation 120. If a
random sample of 36 applicants has a mean of 546, is there evidence that
their mean score is different from 500? Use α=0.05.
3. Ten years ago, the mean age of juveniles held in public custody was 16.0
years. The mean age of 250 randomly selected juveniles currently being held
in public custody is 15.86 years. Assuming σ=1.01 years, does it appear that
the mean age of all juveniles being held in public custody this year is less
than it was ten years ago? Use α=0.10.
4. The mean life time of light bulbs produced by a company is known to be
1600 hours. The mean life time of a sample of 16 light bulbs produced by the
factory is computed to be 1570 hours.

a) If the population standard deviation is 120 hours, test whether or not the
mean life time is different from 1600 hours.
b) If the population standard deviation is not known and the sample
standard deviation is 110 hours, is there any evidence to say that the
mean life time of the light bulbs is more than 1600 hours?

5. With standard care, cancer patients are expected to survive a mean
duration of 38.3 months. A clinician claims that a new therapy
will improve survival time. The new therapy is administered to 100 cancer
patients. Their average survival time is 46.9 months. Suppose σ is known to
be 43.3 months. Is this statistically significant evidence of improved
survival time at the 0.05 level of significance?
6. A recent study shows that the average age of murder victims in a small city
is 23.2 years. A random sample of 18 recent victims had a mean of 22.6
years and a standard deviation of 2 years. At α=0.05, is the average
different from 23.2 years? Assume the variable is approximately normally
distributed.
7. Commercial Bank of Ethiopia claims that the mean wait time for a teller
during peak hours is less than 4 minutes. A random sample of 20 wait times
has a mean of 2.6 minutes with a sample standard deviation of 2.1 minutes.
At α=0.05 test the bank's claim.

2.3.3 Hypothesis testing about the population proportion: P

The procedure for testing hypotheses about the population proportion P for
large samples is similar in many respects to that for the population mean. The
procedure includes the same seven steps. Similarly, the test can be two-tailed
or one-tailed.
When the sample size is large, the sample proportion $\hat{P}$ is approximately
normally distributed with its mean equal to P and standard deviation equal to
$\sqrt{P(1-P)/n}$.

Hence we use the normal distribution to perform a test of hypothesis about the
population proportion P for a large sample. The sample size is considered to be
large when $n\hat{P}$ and $n(1-\hat{P})$ are both greater than 5.
Suppose the assumed or hypothesized value of P (the parameter of the binomial
distribution) is denoted by $P_0$. Then one can formulate two-sided (1) and
one-sided (2 and 3) hypotheses as follows:

1. $H_0: P = P_0$ vs $H_1: P \neq P_0$
2. $H_0: P = P_0$ vs $H_1: P > P_0$
3. $H_0: P = P_0$ vs $H_1: P < P_0$

The choice of $H_1$ depends on the prior information we have on the value of P.
Decision Rule:

Null hypothesis     Alternative hypothesis     Reject H0 if
$P = P_0$           $P \neq P_0$               $|Z_{cal}| > Z_{\alpha/2}$
$P = P_0$           $P > P_0$                  $Z_{cal} > Z_{\alpha}$
$P = P_0$           $P < P_0$                  $Z_{cal} < -Z_{\alpha}$

where

$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} \sim N(0, 1)$$
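The decision rule for a proportion can be sketched along the same lines. The function name `z_test_proportion` and the labels for the form of H1 are assumptions made for illustration. The usage lines take their figures from the two worked examples that follow (95 non-defective items out of 100 tested against P0 = 0.9 at the 0.05 level, and 48 unemployed out of 500 tested against P0 = 0.1 at the 1% level).

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(x, n, p0, alpha, alternative):
    """Large-sample z-test for a population proportion.

    x is the number of sample units with the attribute of interest.
    alternative names the form of H1: "two-sided", "greater" or "less".
    Returns (p_hat, z_cal, reject_h0).
    """
    p_hat = x / n
    # The standard error is computed under H0, i.e. with p0 rather than p_hat.
    z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "two-sided":
        reject = abs(z_cal) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":
        reject = z_cal > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z_cal < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return p_hat, z_cal, reject


# Non-defective items: H1: P > 0.9 at alpha = 0.05
print(z_test_proportion(95, 100, 0.9, 0.05, "greater"))
# Unemployment rate: H1: P < 0.1 at alpha = 0.01
print(z_test_proportion(48, 500, 0.1, 0.01, "less"))
```

The first call rejects H0 and the second does not, matching the decisions reached by hand in those examples.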

Example: A manufacturing company has submitted a claim that 90% of
items produced by a certain process are non-defective. An improvement in the
process is being considered that they feel will lower the proportion of
defectives below the current 10%. In an experiment 100 items are produced with
the new process and 5 are defective. Is this evidence sufficient to conclude
that the method has been improved? Use a 0.05 level of significance.
Solution: As usual, we follow the steps. Here P is the proportion of
non-defective items.
1. $H_0: P = 0.9$ (actually $P \leq 0.9$) vs $H_1: P > 0.9$
2. $\alpha = 0.05$
3. Critical region: $Z_{cal} > Z_{0.05} = 1.645$
4. Computation:

   $$\hat{P} = \frac{X}{n} = \frac{95}{100} = 0.95$$

   $$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.95 - 0.90}{\sqrt{0.9 \times 0.1/100}} = 1.67$$

5. Decision: Reject H0, since 1.67 > 1.645.
6. Conclusion: At the 0.05 level we have evidence that the improvement has
reduced the proportion of defectives.

Example: The unemployment rate in a given country at a given period is believed
to be 10%. The government embarked on a series of projects to reduce
unemployment. It was of interest to determine whether unemployment decreased
as a result of the projects. A random sample of 500 people was chosen, and 48 of
them were found to be unemployed. Test at the 1% level of significance whether
the government projects reduced the unemployment rate.
Solution: As usual, we follow the steps:
1. $H_0: P = 0.1$ vs $H_1: P < 0.1$
2. $\alpha = 0.01$
3. Critical region: $Z_{cal} < -Z_{\alpha} = -Z_{0.01} = -2.33$
4. Computation:

   $$\hat{P} = \frac{X}{n} = \frac{48}{500} = 0.096$$

   $$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.096 - 0.1}{\sqrt{0.1 \times 0.9/500}} = -0.30$$

5. Decision: Do not reject H0, since $Z_{cal} = -0.30 > -2.33$.
6. Conclusion: At the 1% level there is not enough evidence that the government
projects reduced the unemployment rate.

Exercise: A large sample of 200 students from a certain high school is
interviewed and 85 of them are found to use the city bus. Can you conclude
that at least 40% of the students use the city bus? Use a 0.05 level of
significance.
Examples:

1. A registrar officer believes that the dropout rate for seniors at Dambi Dollo
University is 15%. He performed a hypothesis test to determine if the
percentage is the same as or different from 15%. Last year, 38 seniors from a
random sample of 200 seniors withdrew. At α=0.05 test the officer's claim.
2. A telephone company representative estimates that more than 25% of its
customers want call waiting service. A sample of 200 customers showed that
63 had the call waiting service. At α=0.05 is his estimate appropriate?

Solutions:
1) Step 1: State the null and alternative hypotheses.
   $H_0: p = 0.15$ vs $H_1: p \neq 0.15$
   Step 2: $\alpha = 0.05$
   Step 3: n is large ⇒ use the Z-statistic.
   Step 4: Critical region: reject $H_0$ if $|Z_{cal}| \geq Z_{\alpha/2} = 1.96$.
   Step 5: Calculation of the test statistic:
   $$\hat{p} = \frac{38}{200} = 0.19, \quad p_0 = 0.15 \Rightarrow 1 - p_0 = 0.85$$
   $$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.19 - 0.15}{\sqrt{0.15 \times 0.85/200}} = 1.58$$
   Step 6: Decision: Since $|Z_{cal}| = 1.58 < 1.96$, do not reject $H_0$.
   Step 7: Interpretation: At α = 0.05 there is not enough evidence to conclude
   that the dropout rate for seniors differs from 15%.

2) Step 1: State the null and alternative hypotheses.
   $H_0: p = 0.25$ vs $H_1: p > 0.25$
   Step 2: $\alpha = 0.05$
   Step 3: n is large ⇒ use the Z-statistic.
   Step 4: Critical region: reject $H_0$ if $Z_{cal} > Z_{\alpha} = 1.645$.
   Step 5: Calculation of the test statistic:
   $$\hat{p} = \frac{63}{200} = 0.315, \quad p_0 = 0.25 \Rightarrow 1 - p_0 = 0.75$$
   $$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.315 - 0.25}{\sqrt{0.25 \times 0.75/200}} = 2.12$$
   Step 6: Decision: Since $Z_{cal} = 2.12 > 1.645$, reject $H_0$.
   Step 7: Interpretation: At α = 0.05 there is enough evidence that more than
   25% of customers have the call waiting service.
Exercises: 1) Candidate Chala is one of the two candidates running for
mayor of Dambi Dollo town. A random poll of 672 registered voters finds that
323 will vote for candidate Chala. At α=0.05 is it reasonable to assume that half
of the population will vote for Chala?
2) Hana believes that 50% of the brides in Dambi Dollo are younger than their
grooms. She performs a hypothesis test to determine if the percentage is the
same as or different from 50%. Hana samples 100 brides and 53 reply that they
are younger than their grooms. At the 1% level of significance test Hana's claim.
2.3.4 Sample size determination
In planning a statistical investigation we should decide the number of units
(Sample size) to be studied in order to answer the study objectives. If the sample
size is too small we may fail to detect important effects, or may estimate effects
too imprecisely. If the sample size is too large then we will waste resources.

Therefore it is recommended to determine the appropriate sample size for our
study.
How many units should be included in our study? The sample size depends on
the maximum error of the estimate (E), the population standard deviation (σ),
and the degree of confidence.

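Recall that

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; E\sqrt{n} = Z_{\alpha/2}\,\sigma \;\Rightarrow\; \sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{E} \;\Rightarrow\; n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$$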
Example: The college president asks the registrar officer to estimate the average
age of the students at their college. From a previous study, the standard deviation
of the ages was found to be σ = 2 years. How large should the sample be if the
officer wishes to be 99% confident that the estimate is accurate within 1 year?
Solution: Given: 99% ⇒ $Z_{\alpha/2} = 2.58$, σ = 2, E = 1

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.58 \times 2}{1}\right)^2 = 26.6256 \approx 27$$
Example: A scientist wishes to estimate the average depth of a river. He wants
to be 99% confident that the estimate is accurate within 2 feet. From a previous
study, the standard deviation of the depths measured was 4.38 feet. How large a
sample is necessary?
Solution: Given: 99% ⇒ $Z_{\alpha/2} = 2.58$, σ = 4.38, E = 2

$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.58 \times 4.38}{2}\right)^2 = 31.92$$

Round the value 31.92 up to 32. Therefore, to be 99% confident that the estimate
is within 2 feet of the true mean depth, the scientist needs a sample of at
least 32 measurements. (Always round up to the next whole number.)

Similarly, for proportions the sample size required is given by:

$$n = \hat{p}\,\hat{q}\left(\frac{Z_{\alpha/2}}{E}\right)^2, \quad \text{where } \hat{q} = 1 - \hat{p}$$

Example: A university administrator wishes to estimate, with 90 percent
confidence, the proportion of students enrolled in M.B.A. programs that also have
undergraduate degrees in business. It was found that in a random sample of 230
students enrolled in M.B.A. programs, 54 have undergraduate degrees in business.
What sample size is required if the researcher wishes to be accurate
within 5% of the true proportion?
Solution:
Given: 90% ⇒ $Z_{\alpha/2} = 1.645$, $\hat{p} = \frac{54}{230} = 0.235 \Rightarrow \hat{q} = 0.765$, and E = 0.05

$$n = \hat{p}\,\hat{q}\left(\frac{Z_{\alpha/2}}{E}\right)^2 = 0.235 \times 0.765 \left(\frac{1.645}{0.05}\right)^2 = 194.59 \approx 195$$
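The two sample size formulas can be checked with a short sketch. The function names `sample_size_mean` and `sample_size_proportion` are assumptions made for illustration; the z values are passed in as read from the normal table (2.58 for 99%, 1.645 for 90%), as in the worked examples above.

```python
from math import ceil


def sample_size_mean(z, sigma, error):
    """Smallest n with z * sigma / sqrt(n) <= error, rounded up."""
    return ceil((z * sigma / error) ** 2)


def sample_size_proportion(z, p_hat, error):
    """Smallest n for estimating a proportion within the given error."""
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z / error) ** 2)


# Ages of students: sigma = 2, E = 1, z = 2.58
print(sample_size_mean(2.58, 2, 1))
# Depth of a river: sigma = 4.38, E = 2, z = 2.58
print(sample_size_mean(2.58, 4.38, 2))
# M.B.A. students: p_hat = 0.235, E = 0.05, z = 1.645
print(sample_size_proportion(1.645, 0.235, 0.05))
```

`ceil` is used rather than ordinary rounding because the sample size must always be rounded up to the next whole number.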
Exercises:
1. A college dean wishes to estimate the average number of hours his part-time
instructors teach per week. The standard deviation from a previous study is 2.6
hours. How large a sample must be selected if he wants to be 99% confident
that the sample mean is within 1 hour of the true mean?
2. A researcher wants to estimate, with 95% confidence, the proportion of people
who own a home computer. A previous study shows that 40% of those
interviewed had a computer at home. The researcher wishes to be accurate
within 2% of the true proportion. Find the minimum sample size necessary.
