Business Statistics PDF
CONTINUOUS PROBABILITY DISTRIBUTIONS
OBJECTIVES
After completing this chapter, you should be able to:
Define a continuous random variable.
Identify the properties of a normal distribution.
State the parameters of the normal distribution.
Find the area under the standard normal distribution, given various z values.
Find probabilities for a normally distributed variable by transforming it into a standard normal variable.
Solve applications using the standard normal distribution table.
INTRODUCTION
This chapter completes our presentation of probability by introducing continuous random
variables and their distributions. In the previous chapters, we introduced discrete probability
distributions, which are employed to calculate the probability associated with discrete random
variables. We also introduced the binomial distribution, which allows us to determine the
probability that the random variable equals a particular value (the number of successes).
In this chapter, we continue our study of probability distributions by examining continuous
probability distributions. A continuous probability distribution usually results from measuring
something, such as the distance from the dormitory to the classroom, the weight of an individual,
or the amount of bonus earned by CEOs. Suppose we select five students and find the distance,
in miles, they travel to attend class as 12.2, 8.9, 6.7, 3.6, and 14.6. When examining a continuous
distribution we are usually interested in information such as the percent of students who travel
less than 10 miles or the percent who travel more than 8 miles. In other words, for a continuous
distribution we may wish to know the percent of observations that occur within a certain range.
It is important to realize that a continuous random variable has an infinite number of values
within a particular range. So we think of the probability that a variable will have a value within a
specified range, rather than the probability of a specific value; for a continuous random variable,
the probability of any single exact value is zero.
Example:
1) Find the area under the standard normal curve
a) Between z=0 and z=1.95
Solution
$P(0 \le z \le 1.95) = 0.4744$
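The table value can be cross-checked numerically. The sketch below assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` provides the normal CDF; the helper name `area_between` is illustrative and not part of these notes. Tail areas follow the same idea: the area to the right of z is `1 - Z.cdf(z)` and the area to the left of z is `Z.cdf(z)`.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

def area_between(z1: float, z2: float) -> float:
    """Area under the standard normal curve between z1 and z2 (z1 <= z2)."""
    return Z.cdf(z2) - Z.cdf(z1)

print(round(area_between(0, 1.95), 4))  # 0.4744
```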
Class work
1. Find the area under the standard normal curve
a. Between z=0 and z=-2.05
b. Between z=0 and z=2.37
c. Between z=0 and z=-1.53
2) Find the area under the standard normal curve
a) To the right of z=1.36
b) To the left of z=-1.93
EXAMPLE:
Find the value of z so that the area under the standard normal curve
a) from 0 to z is 0.4772 and z is positive
Solution
The corresponding z value is z = 2.00.
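Going from an area back to z is the inverse of the cumulative distribution function. A minimal sketch using `NormalDist.inv_cdf` from Python's standard library (3.8+); the helper name is hypothetical. The area to the left of a positive z is 0.5 (the left half of the curve) plus the area from 0 to z. For a left-tail area a, the z value is `Z.inv_cdf(a)`; for a right-tail area a, it is `Z.inv_cdf(1 - a)`.

```python
from statistics import NormalDist

Z = NormalDist()

def z_for_area_from_zero(area: float) -> float:
    """Positive z such that the area between 0 and z equals `area` (0 < area < 0.5)."""
    # Area to the left of z = 0.5 (left half of the curve) + area between 0 and z.
    return Z.inv_cdf(0.5 + area)

z = z_for_area_from_zero(0.4772)
print(round(z, 2))  # 2.0
```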
Class work
1) Find the value of z so that the area under the standard normal curve
a. between 0 and z is (approximately) 0.4784 and z is negative
b. in the left tail is 0.0582
c. in the right tail is 0.0268
2) Determine the value of z so that the area under the standard normal curve
a) In the right tail is .0500
b) in the left tail is .0250
c) in the left tail is .0100
d) in the right tail is .0050
3) Find the value of z so that the area under the standard normal curve
a) from 0 to z is 0.1950 and z is positive
Lecture notes on Social statistics Page 5
b) between 0 and z is (approximately) 0.2733 and z is negative
c) in the right tail is 0.1056
4) Determine the value of z so that the area under the standard normal curve
a) in the right tail is .0250
b) in the left tail is .0500
c) in the left tail is .0010
d) in the right tail is .0100
Applications of the Normal Distribution
The standard normal distribution curve can be used to solve a wide variety of practical problems.
The only requirement is that the variable be normally or approximately normally distributed.
There are several mathematical tests to determine whether a variable is normally distributed. See
the Critical Thinking Challenges on page 352. For all the problems presented in this chapter, you
can assume that the variable is normally or approximately normally distributed. To solve
problems by using the standard normal distribution, transform the original variable to a standard
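normal distribution variable by using the formula
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ and $\sigma$ are the mean and standard deviation of the original variable.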
EXAMPLE:
1) Find the z value for each of the following x values for a normal distribution with $\mu = 30$
and $\sigma = 5$.
a) x = 39 b) x = 19 c) x = 24 d) x = 44
2) Find the following areas under a normal distribution curve with $\mu = 20$ and $\sigma = 4$.
a) Area between x =20 and x =27
b) Area from x =23 to x =26
c) Area between x= 9.5 and x =17
3) The weekly incomes of shift foremen in the glass industry are normally distributed with a
mean of $1,000 and a standard deviation of $100.
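The standardizing formula and the areas asked for in Examples 1 and 2 can be checked with a short Python sketch (standard library, Python 3.8+). `NormalDist(mu, sigma).cdf(x)` gives the same area as looking up z = (x − μ)/σ in the standard normal table. The helper names are illustrative only.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def area_between_x(x1, x2, mu, sigma):
    """Area under a normal curve with mean mu and sd sigma between x1 and x2."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(x2) - dist.cdf(x1)

# Example 1: mu = 30, sigma = 5
z_values = [z_score(x, 30, 5) for x in (39, 19, 24, 44)]
print(z_values)  # [1.8, -2.2, -1.2, 2.8]

# Example 2: mu = 20, sigma = 4
for x1, x2 in [(20, 27), (23, 26), (9.5, 17)]:
    print(x1, x2, round(area_between_x(x1, x2, 20, 4), 4))
```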
The values of the mean and standard deviation calculated for the probability distribution of Table
2.2 give the values of the population parameters $\mu$ and $\sigma$. These values are $\mu = 80.60$ and
$\sigma = 8.09$. The values of $\mu$ and $\sigma$ for the probability distribution of Table 2.2
SAMPLING DISTRIBUTION
The probability distribution of $\bar{x}$ is called its sampling distribution. It lists the various values
that $\bar{x}$ can assume and the probability of each value of $\bar{x}$. In general, the probability distribution
of a sample statistic is called its sampling distribution.
Reconsider the population of midterm scores of five students given in Table 2.1. Consider all
possible samples of three scores each that can be selected, without replacement, from that
population. The total number of possible samples is given by the combinations formula:
$${}_5C_3 = \frac{5!}{3!\,(5-3)!} = 10$$
Suppose we assign the letters A, B, C, D, and E to the scores of the five students, so that
A = 70, B = 78, C = 80, D = 80, E = 95. Then, the 10 possible samples of three scores each are
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE
These 10 samples and their respective means are listed in Table 2.3. Note that the first two
samples have the same three scores. The reason for this is that two of the students (C and D)
have the same score, and, hence, the samples ABC and ABD contain the same values. Note that
the values of the means of samples in Table 2.3 are rounded to two decimal places.
By using the values of $\bar{x}$ given in Table 2.3, we record the frequency distribution of $\bar{x}$ in Table 2.4.
By dividing the frequencies of the various values of $\bar{x}$ by the sum of all frequencies, we obtain the
relative frequencies of classes, which are listed in the third column of Table 2.4.
These relative frequencies are used as probabilities and listed in Table 2.5. This table gives the
sampling distribution of $\bar{x}$.
If we select just one sample of three scores from the population of five scores, we may draw any
of the 10 possible samples. Hence, the sample mean, $\bar{x}$, can assume any of the values listed in Table
2.5 with the corresponding probability. For instance, the probability that the mean of a randomly
selected sample of three scores is 81.67 is .20. This probability can be written as
$$P(\bar{x} = 81.67) = 0.20$$
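The construction of Tables 2.3 to 2.5 (list every sample, compute each mean, turn frequencies into relative frequencies) can be reproduced with a small Python sketch. It assumes only the five scores A to E given above; the variable names are our own.

```python
from collections import Counter
from itertools import combinations

scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}  # population of Table 2.1

# All 5C3 = 10 samples of three students, drawn without replacement.
samples = list(combinations(scores, 3))
means = {"".join(s): round(sum(scores[k] for k in s) / 3, 2) for s in samples}

# Frequencies of each value of x-bar, divided by 10, give its sampling distribution.
freq = Counter(means.values())
sampling_dist = {m: f / len(samples) for m, f in sorted(freq.items())}

for m, p in sampling_dist.items():
    print(f"{m:6.2f}  {p:.2f}")
print(sampling_dist[81.67])  # 0.2
```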
EXERCISE
1) Tartus Industries has seven production employees (considered the population). The
hourly earnings of each employee are given in the table below
Definition
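The mean and standard deviation of the sampling distribution of $\bar{x}$ are called the mean and
standard deviation of $\bar{x}$ and are denoted by $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$, respectively.
If we calculate the mean and standard deviation of the 10 values of $\bar{x}$ listed in Table 2.3, we
obtain the mean, $\mu_{\bar{x}}$, and the standard deviation, $\sigma_{\bar{x}}$, of $\bar{x}$. Alternatively, we can calculate the
mean and standard deviation of the sampling distribution of $\bar{x}$ listed in Table 2.5. These will also
give the same values:
$$\mu_{\bar{x}} = \frac{\sum \bar{x}}{10} = 80.60 \qquad (1)$$
$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x} - \mu_{\bar{x}})^2}{10}} = 3.30 \qquad (2)$$
where both sums run over the 10 sample means.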
The mean of the sampling distribution of $\bar{x}$ is always equal to the mean of the population, and
the standard deviation of the sampling distribution of $\bar{x}$ is smaller than the standard deviation of the
corresponding population distribution (for n > 1).
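a) $\mu_{\bar{x}} = \mu$
b) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ if $\frac{n}{N} \le 0.05$
c) $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$ if $\frac{n}{N} > 0.05$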
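The rule for choosing between formulas b) and c) can be written as a small function. This is a sketch under the notes' own 0.05 cutoff; the function name is hypothetical. Applied to the five midterm scores ($\sigma = 8.09$, N = 5, n = 3), where n/N = 0.6 > 0.05, it uses the finite population correction and reproduces $\sigma_{\bar{x}} \approx 3.30$.

```python
import math

def sd_of_sample_mean(sigma, n, N=None):
    """Standard deviation of x-bar.

    Uses sigma / sqrt(n) when n/N <= 0.05 (or when N is not given);
    otherwise multiplies by the finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(sd_of_sample_mean(8.09, 3, 5), 2))  # 3.3
```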
The shape of the sampling distribution of $\bar{x}$ is inferred from a very important theorem called the
central limit theorem, which states that regardless of the shape of the distribution of the
population, the distribution of the sample means approaches the normal probability
distribution as the sample size increases.
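The theorem can be illustrated (not proved) by simulation. The sketch below assumes a strongly right-skewed population, an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration. It draws 5000 samples of size n = 30 and checks that the sample means centre on 1, have spread close to $\sigma/\sqrt{n}$, and put about 95% of their values within 1.96 standard deviations of the mean, as a normal distribution would.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1 (sd 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(5000)]
se = 1 / math.sqrt(n)  # sigma / sqrt(n) for this population

center = statistics.fmean(means)
spread = statistics.stdev(means)
share = sum(abs(m - 1) <= 1.96 * se for m in means) / len(means)
print(round(center, 3), round(spread, 3), round(share, 3))
```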
1) Consider a large population with $\mu = 60$ and $\sigma = 10$. Assuming $n/N \le 0.05$, find the mean
a) For a sample selected from this population, $\sigma = 125$ and $\sigma_{\bar{x}} = 3.6$. Find the sample size.
b) For a sample selected from this population, $\sigma = 125$ and $\sigma_{\bar{x}} = 2.25$. Find the
sample size.
Assume $n/N \le 0.05$.
a) For a sample selected from this population, $\sigma = 46$ and $\sigma_{\bar{x}} = 2$. Find the sample size.
Assume $n/N \le 0.05$.
12) The Old Farmer’s Almanac reports that the average person uses 123 gallons of water
daily. If the standard deviation is 21 gallons, find the probability that the mean of a
randomly selected sample of 15 people will be between 120 and 126 gallons. Assume the
variable is normally distributed.
13) A. C. Nielsen reported that children between the ages of 2 and 5 watch an average of 25
hours of television per week. Assume the variable is normally distributed and the
standard deviation is 3 hours. If 20 children between the ages of 2 and 5 are randomly
selected, find the probability that the mean of the number of hours they watch television
will be greater than 26.3 hours.
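The relations $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $z = (\bar{x} - \mu)/\sigma_{\bar{x}}$ used in these exercises can be sketched in Python (standard library, Python 3.8+). The function names are hypothetical. The sample-size helper solves $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ for n and is applied to the last exercise above, read as $\sigma = 46$ and $\sigma_{\bar{x}} = 2$; the probability helpers are applied to exercises 12 and 13, assuming $n/N \le 0.05$.

```python
import math
from statistics import NormalDist

def sample_size_for_se(sigma, se):
    """Solve se = sigma / sqrt(n) for n (valid when n/N <= 0.05)."""
    return (sigma / se) ** 2

def prob_mean_between(mu, sigma, n, a, b):
    """P(a <= x-bar <= b) for samples of size n from a normal population."""
    se = sigma / math.sqrt(n)       # standard deviation of x-bar
    sampling = NormalDist(mu, se)   # sampling distribution of x-bar
    return sampling.cdf(b) - sampling.cdf(a)

def prob_mean_above(mu, sigma, n, a):
    """P(x-bar > a) for samples of size n from a normal population."""
    se = sigma / math.sqrt(n)
    return 1 - NormalDist(mu, se).cdf(a)

print(sample_size_for_se(46, 2))                            # 529.0
print(round(prob_mean_between(123, 21, 15, 120, 126), 4))   # exercise 12
print(round(prob_mean_above(25, 3, 20, 26.3), 4))           # exercise 13
```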
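The population and sample proportions, denoted by p and $\hat{p}$, respectively, are calculated as
$$p = \frac{X}{N} \qquad \text{and} \qquad \hat{p} = \frac{x}{n}$$
where N and n are the population and sample sizes, and X and x are the numbers of elements in the
population and the sample, respectively, that possess the characteristic of interest.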
This section discusses the sampling distribution of the sample proportion and the mean, standard
deviation, and shape of this sampling distribution.
Sampling distribution of p̂
Definition: The probability distribution of the sample proportion p̂ , is called its sampling
distribution. It gives the various values that p̂ can assume and their probabilities.
The mean of the sample proportion is denoted by $\mu_{\hat{p}}$ and is equal to the population
proportion, p. Thus, $\mu_{\hat{p}} = p$.
Standard Deviation of the Sample Proportion: The standard deviation of the sample proportion is
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$
where p is the population proportion, $q = 1 - p$, and n is the sample size. This formula is used
when $n/N \le 0.05$, where N is the population size.
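Both results can be verified exactly. When $n/N \le 0.05$, the count x in the sample behaves like a binomial variable, so the mean and standard deviation of $\hat{p} = x/n$ can be computed directly from the binomial probabilities and compared with p and $\sqrt{pq/n}$. A minimal sketch; the values p = 0.3 and n = 20 are illustrative assumptions, not taken from the notes.

```python
import math

def phat_mean_and_sd(p, n):
    """Exact mean and sd of p-hat = x/n when x ~ Binomial(n, p)."""
    pmf = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x / n * w for x, w in enumerate(pmf))
    var = sum((x / n - mean) ** 2 * w for x, w in enumerate(pmf))
    return mean, math.sqrt(var)

p, n = 0.3, 20  # illustrative values
mean, sd = phat_mean_and_sd(p, n)
print(round(mean, 4), round(sd, 4))          # exact moments of p-hat
print(round(math.sqrt(p * (1 - p) / n), 4))  # formula sqrt(pq/n)
```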
Introduction
Having discussed descriptive statistics, probability distributions, and sampling distributions, we
are ready to tackle statistical inference. As we explained in basic statistics, statistical inference is
the process by which we acquire information and draw conclusions about populations from
samples. There are two general procedures for making inferences about populations: estimation
and hypothesis testing. In this chapter, we introduce the concepts and foundations of estimation
and demonstrate them with simple examples. In Chapter 4, we describe the fundamentals of
hypothesis testing. Because most of what we do in the remainder of this book applies the
concepts of estimation and hypothesis testing, understanding Chapters 4 and 5 is vital to your
development as a statistics practitioner.
In other words, inferential statistics uses the sample results to make decisions and draw
conclusions about the population from which the sample is drawn. Estimation is the first topic to
be considered in our discussion of inferential statistics. Estimation and hypothesis testing taken
together are usually referred to as inference making. This chapter explains how to estimate the
population mean and population proportion for a single population.
EXAMPLE
1) A survey of 30 emergency room patients found that the average waiting time for
treatment was 174.3 minutes. Assume that the population standard deviation is 46.5
minutes.
a) Find the best point estimate of the population mean
b) Construct the 99% confidence interval of the population mean.
Solution:
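$n = 30$, $\bar{x} = 174.3$, $\sigma = 46.5$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{46.5}{\sqrt{30}} = 8.5$
a. Point estimate of $\mu$: $\bar{x} = 174.3$ minutes
b. The 99% confidence interval for $\mu$ is:
$\bar{x} \pm z\sigma_{\bar{x}} = 174.3 \pm 2.58(8.5) = 174.3 \pm 21.93$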
Hence, one can be 99% confident that the mean waiting time for emergency room treatment is
between 152.4 and 196.2 minutes.
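The interval $\bar{x} \pm z\sigma_{\bar{x}}$ can be sketched as a small Python function (standard library, 3.8+; the function name is hypothetical). It computes z from the confidence level with `NormalDist().inv_cdf` (about 2.576 for 99%) instead of the rounded table value 2.58, so the limits can differ from the hand calculation in the second decimal place.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical value
    return xbar - z * se, xbar + z * se

low, high = z_interval(174.3, 46.5, 30, 0.99)
print(round(low, 1), round(high, 1))  # 152.4 196.2
```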
Characteristics of t-Distribution:
The t distribution was developed by W. S. Gosset in 1908 and published under the
pseudonym Student. As a result, the t distribution is also called Student’s t distribution.
The t distribution is similar to the standard normal distribution in these ways:
1) It is bell-shaped.
2) It is symmetric about the mean.
3) The mean, median, and mode are equal to 0 and are located at the center of the
distribution.
4) The curve never touches the x axis.
The t distribution differs from the standard normal distribution in the following ways:
1) The variance is greater than 1.
2) The t distribution is actually a family of curves based on the concept of degrees
of freedom, which is related to sample size.
3) As the sample size increases, the t distribution approaches the standard normal
distribution.
4) The number of degrees of freedom is the only parameter of the t distribution.
Example
1) Find the value of t for 16 degrees of freedom and .05 area in the right tail of a t
distribution curve.
2) Find the value of t for the t distribution for each of the following
a) Area in the right tail = 0.05 and df = 12
b) Area in the left tail = 0.05 and df = 49
c) Area in the left tail = 0.025 and n = 66
d) Area in the right tail = 0.005 and n = 24
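Python's standard library has no t distribution (with SciPy installed, `scipy.stats.t.ppf(1 - area, df)` gives these values directly). As a self-contained alternative, the sketch below integrates the t density with Simpson's rule and searches for t by bisection. The function names are our own. For a left-tail area use `-t_value(area, df)`, and when n is given use df = n − 1.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def right_tail_area(t, df, steps=2000):
    """Area to the right of t (t >= 0): 0.5 minus the Simpson's-rule integral over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_value(area, df):
    """Positive t whose right-tail area equals `area`, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if right_tail_area(mid, df) > area:
            lo = mid  # tail still too large, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_value(0.05, 16), 3))  # 1.746
```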
Examples
The American Sugar Producers Association wants to estimate the mean yearly sugar
consumption. A sample of 16 people reveals the mean yearly consumption to be 60 pounds with
a standard deviation of 20 pounds.
a. What is the value of the point estimate?
b. Develop the 95 percent confidence interval for the population mean.
Solution:
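a. The point estimate of $\mu$ is $\bar{x} = 60$ pounds.
b. $n = 16$, $\bar{x} = 60$, $s = 20$, confidence level = 95%
$df = n - 1 = 16 - 1 = 15$, area in each tail = 0.025, $t = 2.131$, $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{20}{\sqrt{16}} = 5$
The 95% confidence interval for $\mu$ is:
$\bar{x} \pm t s_{\bar{x}} = 60 \pm 2.131(5) = 60 \pm 10.66$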
Thus, we can state with 95% confidence that the mean yearly sugar consumption
for all Americans is between 49.34 and 70.66 pounds.
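The t interval $\bar{x} \pm t s_{\bar{x}}$ can be sketched the same way. In this minimal version (the function name is hypothetical) the critical value is passed in after being read from a t table with df = n − 1, exactly as in the hand calculation.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """xbar +/- t * s / sqrt(n); t_crit comes from a t table with df = n - 1."""
    se = s / math.sqrt(n)
    margin = t_crit * se
    return xbar - margin, xbar + margin

low, high = t_interval(60, 20, 16, 2.131)  # df = 15, 0.025 in each tail
print(round(low, 3), round(high, 3))
```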
3) A random sample of 16 airline passengers at the Bay City airport showed that the mean
time spent waiting in line to check in at the ticket counters was 31 minutes with a
standard deviation of 7 minutes.
a) Construct a 99% confidence interval for the mean time spent waiting in line by all
passengers at this airport. Assume that such waiting times for all passengers are
normally distributed.
4) Social Networking Sites: A recent survey of 8 social networking sites has a mean of 13.1
million visitors for a specific month. The standard deviation was 4.1 million.
a. Find the 95% confidence interval of the true mean.
Estimation of a population proportion: Large Samples
This section explains how to estimate the population proportion p using the sample proportion.
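We know that for large samples:
1) The sampling distribution of the sample proportion is (approximately) normal.
2) The standard deviation of the sample proportion is $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$.
The sample is considered large if np and nq are both greater than 5. If p and q are not known,
then $n\hat{p}$ and $n\hat{q}$ should each be greater than 5 for the sample to be large.
When estimating the value of a population proportion, we do not know the values of p and q, so we
estimate $\sigma_{\hat{p}}$ by $s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$. The confidence interval for p is then
$$\hat{p} \pm z s_{\hat{p}}$$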
Margin of error: The quantity that is subtracted from and added to the value of a sample
statistic to obtain a confidence interval for the corresponding population parameter.
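Determining the Sample Size for the Estimation of Proportion
We know that the margin of error is $E = z\sigma_{\hat{p}} = z\sqrt{\frac{pq}{n}}$. Squaring both sides and
solving for n, we finally get
$$n = \frac{z^2 pq}{E^2}$$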
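The sample-size formula can be sketched in Python as follows (the function name is hypothetical). Because n must be a whole number of observations, the result is rounded up. The inputs in the usage line are illustrative assumptions: 95% confidence (z = 1.96), a margin of error E = 0.03, and p = 0.5, the value that makes pq largest and is therefore a common conservative choice when no estimate of p is available.

```python
import math

def sample_size_proportion(z, p, E):
    """n = z^2 * p * q / E^2, rounded up to the next whole observation."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / E ** 2)

print(sample_size_proportion(1.96, 0.5, 0.03))  # 1068
```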
EXAMPLE
A sample of 300 observations taken from a population produced a sample proportion of .63. Make a 95%
confidence interval for p.
SOLUTION
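$\hat{p} = 0.63$, $\hat{q} = 0.37$, $s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.63)(0.37)}{300}} = 0.0279$
95% confidence interval for p is:
$\hat{p} \pm z s_{\hat{p}} = 0.63 \pm 1.96(0.0279) = 0.63 \pm 0.0547$
that is, from 0.575 to 0.685.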
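The same calculation as a short Python sketch (the function name is hypothetical), using $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$ because p is unknown. It confirms that the margin of error is about 0.0547.

```python
import math

def proportion_interval(p_hat, n, z):
    """p-hat +/- z * s_phat, with s_phat = sqrt(p-hat * q-hat / n)."""
    s = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * s, p_hat + z * s, s

low, high, s = proportion_interval(0.63, 300, 1.96)
print(round(s, 4), round(low, 3), round(high, 3))
```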
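It would be wonderful if we could force both $\alpha$ and $\beta$ (the probabilities of Type I and Type II errors)
to equal zero. Unfortunately, for a fixed sample size these quantities have an inverse relationship:
as $\alpha$ increases, $\beta$ decreases, and vice versa.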
                      Two-tailed test           Left-tailed test              Right-tailed test
Rejection region      In both tails             In the left tail              In the right tail
Acceptance region     Between critical values   Right of the critical value   Left of the critical value
EXAMPLE:
1) Write the null and alternative hypotheses for each of the following examples. Determine
if each is a case of a two-tailed, a left-tailed, or a right-tailed test.
a) To test if the mean number of hours spent working per week by college students who
hold jobs is different from 20 hours
b) To test whether or not a bank’s ATM is out of service for an average of more than 10
hours per month
EXERCISE:
1) Write the null and alternative hypotheses for each of the following examples. Determine
if each is a case of a two-tailed, a left-tailed, or a right-tailed test.
a. To test if the mean length of experience of airport security guards is different
from 3 years
b. To test if the mean credit card debt of college seniors is less than $1000
c. To test if the mean time a customer has to wait on the phone to speak to a
representative of a mail-order company about unsatisfactory service is more
than 12 minutes
d. An engineer hypothesizes that the mean number of defects can be decreased
in a manufacturing process of compact disks by using robots instead of
humans for certain tasks. The mean number of defective disks per 1000 is 18.
2) Consider H 0 : 55 versus H1 : 55
a) What type of error would you make if the null hypothesis is actually false and you fail
to reject it?
b) What type of error would you make if the null hypothesis is actually true and you
reject it?
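$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \qquad \text{where} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$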
is called the test statistic. The test statistic can be defined as a rule or criterion that is used to
make the decision on whether or not to reject the null hypothesis.
Five-Step Procedure for Testing a Hypothesis
A test of hypothesis procedure that uses the critical-value approach involves the
following five steps:
1) State the null and alternative hypotheses.
2) Select the distribution to use.
3) Determine the rejection and nonrejection regions.
4) Calculate the value of the test statistic.
5) Make a decision.
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = 5.87$$
We reject $H_0$ and conclude that based on the sample information, it appears that the mean
length of all such calls is not equal to 12.44 minutes.
EXERCISE
2) Make the following hypothesis tests
a) H 0 : 25 , H1 : 25 n 81 x 28.5 3, 0.01 ,
c) H 0 : 40 , H1 : 40 n 100 x 47 7, 0.1 ,
7) A researcher wishes to test the claim that the average cost of tuition and fees at a four-year
public college is greater than $5700. She selects a random sample of 36 four-year
public colleges and finds the mean to be $5950. The population standard deviation is
$659. Is there evidence to support the claim at $\alpha = 0.05$?
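The five-step critical-value procedure for a mean with $\sigma$ known can be condensed into one function. This is a sketch, not a prescribed implementation: the function name and the `tail` argument ('two', 'left', or 'right', naming where the rejection region lies) are our own. It is applied to exercise 7 read as $H_0: \mu = 5700$ against $H_1: \mu > 5700$ (right-tailed) at $\alpha = 0.05$.

```python
import math
from statistics import NormalDist

def z_test_mean(xbar, mu0, sigma, n, alpha, tail):
    """Critical-value z test for mu with sigma known.

    tail is 'two', 'left', or 'right' (where the rejection region lies).
    Returns (z, critical value, reject H0?).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))  # step 4: test statistic
    std = NormalDist()
    if tail == "two":                           # step 3: rejection region
        crit = std.inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif tail == "left":
        crit = std.inv_cdf(alpha)
        reject = z < crit
    elif tail == "right":
        crit = std.inv_cdf(1 - alpha)
        reject = z > crit
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, crit, reject                      # step 5: decision

z, crit, reject = z_test_mean(5950, 5700, 659, 36, 0.05, "right")
print(round(z, 2), round(crit, 3), reject)
```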
We use the normal distribution to perform a test of hypothesis about the population proportion,
p, for a large sample. As was mentioned in Chapters 6 and 7, in the case of a proportion, the
sample size is considered to be large when np and nq are both greater than 5.
The value of the test statistic z for the sample proportion, $\hat{p}$, is computed as
$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}}, \qquad \text{where} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$
The value of p that is used in this formula is the one from the null hypothesis. The value of q
is equal to 1-p. The value of z calculated for pˆ using the above formula is also called the
observed value of z.
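The observed z for a proportion can be sketched in Python (standard library, 3.8+; the function name is hypothetical). Note that p and q come from the null hypothesis, not from the sample. It is applied to the Direct Mailing Company data of Example 1 that follows, which we read as a left-tailed test of $H_0: p \ge 0.90$ against $H_1: p < 0.90$ at a 2.5% significance level; the sample is large because np = 135 and nq = 15 both exceed 5.

```python
import math
from statistics import NormalDist

def z_for_proportion(x, n, p0):
    """Return p-hat and the observed z, with p and q = 1 - p taken from H0."""
    p_hat = x / n
    sigma_phat = math.sqrt(p0 * (1 - p0) / n)
    return p_hat, (p_hat - p0) / sigma_phat

p_hat, z = z_for_proportion(129, 150, 0.90)
z_crit = NormalDist().inv_cdf(0.025)  # left-tailed critical value, about -1.96
print(round(p_hat, 2), round(z, 2), round(z_crit, 2), z < z_crit)
```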
EXAMPLE
1) Direct Mailing Company sells computers and computer parts by mail. The company
claims that at least 90% of all orders are mailed within 72 hours after they are received.
The quality control department at the company often takes samples to check if this claim
is valid. A recently taken sample of 150 orders showed that 129 of them were mailed
within 72 hours.
a) Do you think the company’s claim is true? Use a 2.5% significance level.
2) A telephone company representative estimates that 40% of its customers have call-
waiting service. To test this hypothesis, she selected a sample of 100 customers and
found that 37% had call waiting. At $\alpha = 0.01$, is there enough evidence to reject the
claim?
3) A researcher claims that 54% of fatal car/truck accidents are caused by driver error. A
researcher studies 30 randomly selected accidents and finds that 14 were caused by driver
error. Using $\alpha = 0.05$, can the researcher’s claim be refuted?
4) A food company is planning to market a new type of frozen yogurt. However, before
marketing this yogurt, the company wants to find what percentage of the people like it.
The company’s management has decided that it will market this yogurt only if at least
TWO POPULATIONS
OBJECTIVES
After completing this chapter, you should be able to
Test the difference between two population means, using the z test.
Test the difference between two means for independent samples, using the t
test.
Test the difference between two means for dependent samples.
Test the difference between two proportions.
INTRODUCTION
Chapters 3 and 4 discussed the estimation and hypothesis-testing procedures for $\mu$ and p
involving a single population. This chapter extends the discussion of estimation and hypothesis-testing
procedures to the difference between two population means and the difference between
two population proportions. For example, we may want to make a confidence interval for the
difference between the mean prices of houses in Hargeisa and in Erigavo, or we may want to test the hypothesis
that the mean price of houses in Hargeisa is different from that in Erigavo. As another example,
we may want to make a confidence interval for the difference between the proportions of all
male and female adults who abstain from drinking, or we may want to test the hypothesis that the
proportion of all adult men who abstain from drinking is different from the proportion of all adult
women who abstain from drinking. Constructing confidence intervals and testing hypotheses
about population parameters are referred to as making inferences.
Let $\mu_1$ be the mean of the first population and $\mu_2$ be the mean of the second population. Suppose
we want to make a confidence interval and test a hypothesis about the difference between these
two population means, that is, $\mu_1 - \mu_2$. Let $\bar{x}_1$ be the mean of a sample taken from the first
population and $\bar{x}_2$ be the mean of a sample taken from the second population. Then, $\bar{x}_1 - \bar{x}_2$ is the
sample statistic that is used to make an interval estimate and to test a hypothesis about $\mu_1 - \mu_2$.
This section discusses how to make confidence intervals and test hypotheses about
$\mu_1 - \mu_2$ when certain conditions (to be explained later in this section) are satisfied.
First we explain the concepts of independent and dependent samples.
INDEPENDENT VS DEPENDENT SAMPLES
Two samples are independent if they are drawn from two different populations and the elements
of one sample have no relationship to the elements of the second sample. If the elements of the
two samples are somehow related, then the samples are said to be dependent. Thus, in two
independent samples, the selection of one sample has no effect on the selection of the second
sample.
EXAMPLE 1
Suppose we want to estimate the difference between the mean salaries of all male and all female
executives. To do so, we draw two samples, one from the population of male executives and
another from the population of female executives. These two samples are independent because
they are drawn from two different populations, and the samples have no effect on each other.
EXAMPLE 2
Suppose we want to estimate the difference between the mean weights of all participants before
and after a weight loss program. To accomplish this, suppose we take a sample of 40 participants
and measure their weights before and after the completion of this program. Note that these two
samples include the same 40 participants. This is an example of two dependent samples. Such
samples are also called paired or matched samples.
By constructing a confidence interval for $\mu_1 - \mu_2$, we estimate the difference between the means
of two populations. For example, we may want to find the difference between the mean heights
of male and female adults. The difference between the two sample means, $\bar{x}_1 - \bar{x}_2$, is the point
estimator of the difference between the two population means, $\mu_1 - \mu_2$. When the conditions
mentioned earlier in this section hold true, we use the normal distribution to make a confidence
interval for the difference between the two population means. The following formula gives the
interval estimation for $\mu_1 - \mu_2$.
[Figure: Sample 1 (size $n_1$, sample statistics $\bar{x}_1$ and $s_1$) and Sample 2 (size $n_2$, sample statistics $\bar{x}_2$ and $s_2$)]
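When using the normal distribution, the $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is:
$$(\bar{x}_1 - \bar{x}_2) \pm z\sigma_{\bar{x}_1 - \bar{x}_2}, \qquad \text{where} \quad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$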
$\mu_1 - \mu_2 > 0$.
3) Testing an alternative hypothesis that the mean of the first population is less than the
mean of the second population is equivalent to $\mu_1 < \mu_2$, which is the same as
$\mu_1 - \mu_2 < 0$.
The procedure followed to perform a test of hypothesis about the difference between two
population means is similar to the one used to test hypotheses about single-population
parameters in Chapter 5.
If the following conditions are satisfied, we will use the normal distribution to make a test
of hypothesis about $\mu_1 - \mu_2$:
1) The two samples are independent.
2) The standard deviations $\sigma_1$ and $\sigma_2$ of the two populations are known.
3) At least one of the following two conditions is fulfilled:
a) Both samples are large (i.e., $n_1 \ge 30$ and $n_2 \ge 30$).
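The value of the test statistic z for $\bar{x}_1 - \bar{x}_2$ is computed as
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$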
EXAMPLE:
1) Gasoline prices reached record high levels in 16 states during 2003 (The Wall Street
Journal, March 7, 2003). Two of the affected states were California and Florida. The
American Automobile Association reported a sample mean price of $2.04 per gallon in
California and a sample mean price of $1.72 per gallon in Florida. Use a sample size of
40 for the California data and a sample size of 35 for the Florida data. Assume that prior
studies indicate a population standard deviation of .10 in California and .08 in Florida.
a) What is a point estimate of the difference between the population mean
prices per gallon in California and Florida?
b) Construct a 99% confidence interval for $\mu_1 - \mu_2$.
c) Test at the 1% significance level if the two population means are different.
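Solution
a. The point estimate of $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2 = 2.04 - 1.72 = 0.32$.
b. $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(0.10)^2}{40} + \frac{(0.08)^2}{35}} = 0.0208$, and for 99% confidence z = 2.58, so
$(\bar{x}_1 - \bar{x}_2) \pm z\sigma_{\bar{x}_1 - \bar{x}_2} = 0.32 \pm 2.58(0.0208) = 0.32 \pm 0.054$
that is, from 0.266 to 0.374.
c. $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$ (two-tailed); with $\alpha = 0.01$ the critical values are z = ±2.58.
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{0.32 - 0}{0.0208} = 15.38$$
We reject the null hypothesis because the test statistic falls in the rejection region.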
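The two-sample z calculations can be checked with a short sketch (standard library, Python 3.8+; the function name is hypothetical). It computes $\sigma_{\bar{x}_1 - \bar{x}_2}$ from the known population standard deviations, the 99% interval with z from `inv_cdf` (about 2.576 rather than the table's 2.58), and the test statistic for $H_0: \mu_1 - \mu_2 = 0$.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, confidence):
    """CI for mu1 - mu2 and z for H0: mu1 - mu2 = 0 when sigma1, sigma2 are known."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # sigma of x1bar - x2bar
    diff = x1 - x2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    z = (diff - 0) / se  # hypothesized difference is 0
    return diff, se, ci, z

diff, se, ci, z = two_sample_z(2.04, 1.72, 0.10, 0.08, 40, 35, 0.99)
print(round(diff, 2), round(se, 4), round(ci[0], 3), round(ci[1], 3), round(z, 2))
```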
EXERCISE
1. The following information is obtained from two independent samples selected from two
normally distributed populations
$n_1 = 18$, $\bar{x}_1 = 7.82$, $\sigma_1 = 2.35$ and $n_2 = 15$, $\bar{x}_2 = 5.99$, $\sigma_2 = 3.17$
a) What is the point estimate of $\mu_1 - \mu_2$?
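When the standard deviations of the two populations are equal, we can use $\sigma$ for both $\sigma_1$
and $\sigma_2$. Because $\sigma$ is unknown, we replace it by its point estimator $s_p$, which is called the
pooled sample standard deviation (hence, the subscript p). The value of $s_p$ is computed by
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
and the estimator of the standard deviation of $\bar{x}_1 - \bar{x}_2$ is:
$$s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$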
Now we are ready to discuss the procedures that are used to make confidence intervals and
test hypotheses about 1 2 for small and independent samples selected from two populations
with unknown but equal standard deviations.
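INTERVAL ESTIMATION OF $\mu_1 - \mu_2$:
As was mentioned earlier in this chapter, the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$,
is the point estimator of the difference between the two population means, $\mu_1 - \mu_2$.
The following formula gives the confidence interval for $\mu_1 - \mu_2$ when the t distribution is used
and the conditions mentioned earlier in this section are fulfilled.
Confidence Interval for $\mu_1 - \mu_2$: The $(1 - \alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm t s_{\bar{x}_1 - \bar{x}_2}, \qquad \text{where} \quad s_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
and t is obtained from the t distribution table with $n_1 + n_2 - 2$ degrees of freedom.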
HYPOTHESIS TESTING ABOUT $\mu_1 - \mu_2$
When the conditions mentioned in the beginning of the above Section are satisfied, the t
distribution is applied to make a hypothesis test about the difference between two population
means. The test statistic in this case is t, which is calculated as follows.
Test Statistic t for $\bar{x}_1 - \bar{x}_2$: The value of the test statistic t for $\bar{x}_1 - \bar{x}_2$ is computed as
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
1) A consumer agency wanted to estimate the difference in the mean amounts of caffeine in
two brands of coffee. The agency took a sample of 15 one-pound jars of Brand I coffee
that showed the mean amount of caffeine in these jars to be 80 milligrams per jar with a
standard deviation of 5 milligrams. Another sample of 12 one-pound jars of Brand II
coffee gave a mean amount of caffeine equal to 77 milligrams per jar with a standard
deviation of 6 milligrams.
a) Construct a 95% confidence interval for the difference between the mean
amounts of caffeine in one-pound jars of these two brands of coffee.
Assume that the two populations are normally distributed and that the
standard deviations of the two populations are equal.
b) At the 1% significance level, can you conclude that the mean amounts of
caffeine in Brand I and Brand II are different?
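Solution
a. $\bar{x}_1 - \bar{x}_2 = 80 - 77 = 3$, and
$s_p = \sqrt{\frac{(15 - 1)(5)^2 + (12 - 1)(6)^2}{15 + 12 - 2}} = \sqrt{29.84} = 5.4626$, $s_{\bar{x}_1 - \bar{x}_2} = 5.4626\sqrt{\frac{1}{15} + \frac{1}{12}} = 2.11565593$
With df = 25 and 0.025 in each tail, t = 2.060, so
$(\bar{x}_1 - \bar{x}_2) \pm t s_{\bar{x}_1 - \bar{x}_2} = 3 \pm 2.060(2.11565593) = 3 \pm 4.36$
that is, from -1.36 to 7.36 milligrams.
b. $\alpha = 0.01$, $\alpha/2 = 0.005$
Area in each tail = 0.005, df = 25, $t_c = \pm 2.787$
Step 4: Calculate the test statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{3 - 0}{2.11565593} = 1.418$$
We do not reject the null hypothesis because the test statistic falls in the nonrejection region.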
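The pooled-variance calculations for the caffeine example can be reproduced with a minimal sketch (the function name is hypothetical). It assumes, as the example does, equal population standard deviations; the critical value 2.060 is passed in from a t table (df = 25, 0.025 in each tail) rather than computed.

```python
import math

def pooled_t(n1, x1, s1, n2, x2, s2):
    """Pooled-variance quantities for two independent samples (equal sigmas assumed)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)  # s of x1bar - x2bar
    t = (x1 - x2) / se                    # test statistic for H0: mu1 - mu2 = 0
    return df, sp, se, t

df, sp, se, t = pooled_t(15, 80, 5, 12, 77, 6)
t_table = 2.060  # df = 25, 0.025 in each tail, read from a t table
low, high = (80 - 77) - t_table * se, (80 - 77) + t_table * se
print(df, round(se, 4), round(low, 2), round(high, 2), round(t, 3))
```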
1. A sample of 14 cans of Brand I diet soda gave the mean number of calories of 23 per can
with a standard deviation of 3 calories. Another sample of 16 cans of Brand II diet soda
gave the mean number of calories of 25 per can with a standard deviation of 4 calories.
a) At the 1% significance level, can you conclude that the mean numbers of calories
per can are different for these two brands of diet soda? Assume that the calories per
can of diet soda are normally distributed for each of the two brands and that the
standard deviations for the two populations are equal.
2. A sample of 40 children from New York State showed that the mean time they spend
watching television is 28.50 hours per week with a standard deviation of 4 hours. Another
sample of 35 children from California showed that the mean time spent by them watching
television is 23.25 hours per week with a standard deviation of 5 hours.
a) Using a 2.5% significance level, can you conclude that the mean time spent
watching television by children in New York State is greater than that for
children in California? Assume that the standard deviations for the two
populations are equal.
3. The following information was obtained from two independent samples selected from
two normally distributed populations with unknown but equal standard deviations.
$n_1 = 21$, $\bar{x}_1 = 13.97$, $s_1 = 3.78$ and $n_2 = 20$, $\bar{x}_2 = 15.55$, $s_2 = 3.26$
a) What is the point estimate of $\mu_1 - \mu_2$?
At the .01 significance level, is the mean grade of the women higher than that of the
men?
6. Ms. Lisa Monnin is the budget director for Nexus Media, Inc. She would like to compare
the daily travel expenses for the sales staff and the audit staff. She collected the following
sample information.
At the .10 significance level, can she conclude that the mean daily expenses are greater
for the sales staff than the audit staff?
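Mean and Standard Deviation of the Paired Differences for Two Samples: The values of the
mean and standard deviation, $\bar{d}$ and $s_d$, respectively, of paired differences for two samples are
calculated as:
$$\bar{d} = \frac{\sum d}{n}, \qquad s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$$
In paired samples, instead of using $\bar{x}_1 - \bar{x}_2$ as the sample statistic to make inferences about
$\mu_1 - \mu_2$, we use the sample statistic $\bar{d}$ to make inferences about $\mu_d$. Actually, the value of $\bar{d}$ is
always equal to $\bar{x}_1 - \bar{x}_2$.
The confidence interval for $\mu_d$ is
$$\bar{d} \pm t s_{\bar{d}}, \qquad \text{where} \quad s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$$
and t is obtained from the t distribution table with n - 1 degrees of freedom.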
Hypothesis Testing About $\mu_d$
A hypothesis about $\mu_d$ is tested by using the sample statistic $\bar{d}$. This section illustrates the case
of the t distribution only. Earlier in this section we learned what conditions should hold true to
use the t distribution to test a hypothesis about $\mu_d$. The following formula is used to calculate
the value of the test statistic t when testing a hypothesis about $\mu_d$.
Test Statistic t for $\bar{d}$: The value of the test statistic t for $\bar{d}$ is computed as follows:
$$t = \frac{\bar{d} - \mu_d}{s_{\bar{d}}}$$
EXAMPLES:
1) A company wanted to know if attending a course on “how to be a successful salesperson”
can increase the average sales of its employees. The company sent six of its salespersons
to attend this course. The following table gives the 1-week sales of these salespersons
before and after they attended this course.
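Before  12  18  25   9  14  16
After   18  24  24  14  19  20
d       -6  -6   1  -5  -5  -4     Σd = -25
d²      36  36   1  25  25  16     Σd² = 139
(d = before - after)
a. $\bar{d} = \frac{\sum d}{n} = \frac{-25}{6} = -4.17$
$s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} = \sqrt{\frac{139 - 104.17}{5}} = 2.64$
$s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{2.64}{\sqrt{6}} = 1.08$
Area in each tail = 0.025, df = 5, t = 2.5706
Therefore, the 95% confidence interval for $\mu_d$ is
$\bar{d} \pm t s_{\bar{d}} = -4.17 \pm 2.5706(1.08) = -4.17 \pm 2.78$
that is, from -6.95 to -1.39.
b. Step 1: $H_0: \mu_d = 0$; $H_1: \mu_d < 0$ (because d = before - after, an increase in sales makes $\mu_d$ negative)
Step 2: Use the t distribution.
Step 3: Determine the rejection and nonrejection regions: this is a left-tailed test with df = 5, so the
critical value is t = -2.015 at $\alpha = 0.05$ (t = -3.365 at $\alpha = 0.01$).
Step 4: Calculate the test statistic:
$$t = \frac{\bar{d} - \mu_d}{s_{\bar{d}}} = \frac{-4.17 - 0}{1.08} = -3.86$$
Step 5: Make a decision: the test statistic falls in the rejection region at either significance level, so we
reject the null hypothesis and conclude that attending the course increases the mean weekly sales.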
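The paired-sample calculation can be checked with Python's `statistics` module (3.8+), whose `stdev` uses the n − 1 divisor, matching $s_d$. A minimal sketch: the critical value 2.5706 is taken from a t table (df = 5, 0.025 in each tail), and because nothing is rounded along the way, the last digit can differ slightly from the hand calculation (for example t ≈ -3.87 instead of -3.86).

```python
import math
import statistics

before = [12, 18, 25, 9, 14, 16]
after = [18, 24, 24, 14, 19, 20]

d = [b - a for b, a in zip(before, after)]  # paired differences, before - after
n = len(d)
d_bar = statistics.fmean(d)
s_d = statistics.stdev(d)      # sample standard deviation, divisor n - 1
se = s_d / math.sqrt(n)        # s of d-bar
t = (d_bar - 0) / se           # test statistic for H0: mu_d = 0
t_table = 2.5706               # df = 5, 0.025 in each tail (95% interval)

print(d)  # [-6, -6, 1, -5, -5, -4]
print(round(d_bar, 2), round(s_d, 2), round(se, 2), round(t, 2))
print(round(d_bar - t_table * se, 2), round(d_bar + t_table * se, 2))
```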
EXERCISE
1. Find the following confidence interval for $\mu_d$, assuming that the population of
paired differences is normally distributed.
Before 8 5 4 9 6 9 5
After 10 8 5 11 6 7 9
a) Construct a 95% confidence interval for the mean $\mu_d$ of the population paired differences,
where a paired difference is equal to the score of an employee before attending the
course minus the score of the same employee after attending the course.
b) Test at the 1% significance level whether attending this course increases the mean score
of employees. Assume that the population of paired differences has a normal distribution.
4) A researcher wanted to find the effect of a special diet on systolic blood pressure. She
selected a sample of seven adults and put them on this dietary plan for 3 months. The
following table gives the systolic blood pressures (in mm Hg) of these seven adults
before and after the completion of this plan.
Before 210 180 195 220 231 199 224
After 193 186 186 223 220 183 233
Let $\mu_d$ be the mean reduction in the systolic blood pressures due to this special dietary
plan for the population of all adults.
a) Construct a 95% confidence interval for $\mu_d$. Assume that the population of paired
differences is (approximately) normally distributed.
b) Using the 5% significance level, can you conclude that the mean of the paired differences
At the .01 significance level is it reasonable to conclude that the modification reduced the
number of traffic accidents?
the alternative hypothesis $H_1$: not all three population means are equal.
We use the analysis of variance procedure to perform such a test of hypothesis.
Note that the analysis of variance procedure can be used to compare two population means.
However, the procedures learned in Chapter 6 are more efficient for performing tests of
hypothesis about the difference between two population means; the analysis of variance
procedure, to be discussed in this chapter, is used to compare three or more population means.
Lecture notes on Social statistics Page 66
An analysis of variance test is performed using the F distribution. First, the F distribution is
described in Section 7.1 of this chapter. Then, Section 7.2 discusses the application of the one-
way analysis of variance procedure to perform tests of hypothesis.
The F Distribution
Like the chi-square distribution, the shape of a particular F distribution curve depends on the
number of degrees of freedom. However, the F distribution has two numbers of degrees of
freedom: degrees of freedom for the numerator and degrees of freedom for the denominator.
These two numbers representing two types of degrees of freedom are the parameters of the F
distribution. Each combination of degrees of freedom for the numerator and for the denominator
gives a different F distribution curve. The units of an F distribution are denoted by F, which
assumes only nonnegative values. Like the normal, t, and chi-square distributions, the F
distribution is a continuous distribution. The shape of an F distribution curve is skewed to the
right, but the skewness decreases as the number of degrees of freedom increases.
Characteristics of the F Distribution
1) The F distribution is continuous and skewed to the right.
2) The F distribution has two numbers of degrees of freedom: df for the numerator and df
for the denominator.
3) The units of an F distribution, denoted by F, are nonnegative.
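For an F distribution, degrees of freedom for the numerator and degrees of freedom for the denominator are written together as df = (numerator, denominator); for example, df = (3, 10) means 3 degrees of freedom for the numerator and 10 for the denominator.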
EXAMPLE
1) Find the F value for 8 degrees of freedom for the numerator, 14 degrees of freedom for
the denominator, and .05 area in the right tail of the F distribution curve
2) Find the critical value of F for the following.
a) df (3, 3) and area in the right tail 0.05
b) df (3, 10) and area in the right tail 0.05
c) df (3, 30) and area in the right tail 0.05
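As a cross-check on table lookups, here is a sketch that finds the F value with a given right-tail area using only the Python standard library. It relies on a standard fact: if F has df = (d1, d2), then $P(F \le f) = I_x(d1/2, d2/2)$ with $x = d1 f/(d1 f + d2)$, where $I_x$ is the regularized incomplete beta function. $I_x$ is evaluated here by a continued fraction, and the critical value is found by bisection. All function names are hypothetical. If SciPy is installed, `scipy.stats.f.ppf(1 - alpha, dfn, dfd)` should give the same values.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df_num, df_den):
    """Area to the right of f under the F curve with df = (df_num, df_den)."""
    if f <= 0.0:
        return 1.0
    x = df_num * f / (df_num * f + df_den)
    return 1.0 - reg_incomplete_beta(df_num / 2.0, df_den / 2.0, x)


def f_critical(df_num, df_den, alpha):
    """F value with area alpha in the right tail, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df_num, df_den) > alpha:  # widen until hi is past the cutoff
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df_num, df_den) > alpha:
            lo = mid  # too much area to the right: the cutoff is larger
        else:
            hi = mid
    return (lo + hi) / 2.0


for df_num, df_den, alpha in [(8, 14, 0.05), (3, 3, 0.05), (3, 10, 0.05), (3, 30, 0.05)]:
    print(f"df = ({df_num}, {df_den}), right tail {alpha}: F = {f_critical(df_num, df_den, alpha):.2f}")
```

Rounded to two decimals, the results should match the values read from a standard F table.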
EXERCISE
3) Find the critical value of F for the following.
d) df (2, 6) and area in the right tail 0.25
e) df (6, 6) and area in the right tail 0.25
f) df (15, 6) and area in the right tail 0.25
and $H_0: \mu_2 = \mu_3$ separately using the procedure discussed in Chapter 6. Besides being time
consuming, such a procedure has other disadvantages. First, if we reject even one of these three
hypotheses, then we must reject the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$. Second, combining the
Type I error probabilities of the three separate tests gives a much larger overall Type I error
probability for the test of $H_0: \mu_1 = \mu_2 = \mu_3$. Hence, we should prefer a procedure that can test
the equality of three means in one test. The ANOVA, short for analysis of variance, provides
such a procedure. It is used to compare three or more population means in a single test.
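To see how quickly the overall Type I error probability grows, assume, only for illustration, that each of the three separate tests is run at α = .05. Assume also that the tests are independent; they are not exactly independent, since each sample appears in two of the tests. Under these assumptions, the probability of making at least one Type I error is:

```latex
P(\text{at least one Type I error}) = 1 - (1 - 0.05)^3 = 1 - 0.857375 \approx 0.143
```

Without the independence assumption, this probability is still at most $3(0.05) = 0.15$ by the Bonferroni inequality. Either way, it is roughly three times the nominal .05.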
DEFINITION
ANOVA: A procedure used to test the null hypothesis that the means of three or more
populations are all equal.
This section discusses the one-way ANOVA procedure to make tests by comparing the means
of several populations. By using a one-way ANOVA test, we analyze only one factor or variable.
For instance, in the example of testing for the equality of mean arithmetic scores of students
taught by each of the three different methods, we are considering only one factor, which is the
effect of different teaching methods on the scores of students. Sometimes we may analyze the
effects of two factors. For example, if different teachers teach arithmetic using these three
methods, we can analyze the effects of teachers and teaching methods on the scores of students.
This is done by using a two-way ANOVA. The procedure under discussion in this chapter is called the one-way ANOVA.
To calculate MSB and MSW, we first compute the between-samples sum of squares, denoted
by SSB, and the within-samples sum of squares, denoted by SSW. The sum of SSB
and SSW is called the total sum of squares and is denoted by SST; that is,
SST=SSB+SSW
The values of SSB and SSW are calculated using the following formulas.
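$$SSB = \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} + \cdots\right) - \frac{(\sum x)^2}{n}$$
$$SSW = \sum x^2 - \left(\frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} + \cdots\right)$$
where k is the number of samples, $T_i$ is the sum of the values in sample i, $n_i$ is the size of sample i, $n = n_1 + n_2 + n_3 + \cdots$ is the total number of values in all samples, $\sum x$ is the sum of all values, and $\sum x^2$ is the sum of the squares of all values.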
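A sketch of these shortcut formulas in Python, checked against the definitional forms. In those forms, SSB is the sum of $n_i(\bar{x}_i - \bar{x})^2$ and SSW is the sum of the squared deviations of each value from its own sample mean. The two samples from Exercise 3 later in this chapter are used only as input, and the function names are hypothetical.

```python
import math


def sums_of_squares(samples):
    """Return (SSB, SSW, SST) using the shortcut formulas with T_i, n_i, sum x, sum x^2."""
    all_x = [x for s in samples for x in s]
    n = len(all_x)
    sum_x = sum(all_x)
    sum_x2 = sum(x * x for x in all_x)
    between_term = sum(sum(s) ** 2 / len(s) for s in samples)  # T_1^2/n_1 + T_2^2/n_2 + ...
    ssb = between_term - sum_x ** 2 / n
    ssw = sum_x2 - between_term
    sst = sum_x2 - sum_x ** 2 / n
    return ssb, ssw, sst


def sums_of_squares_definitional(samples):
    """Same quantities computed from deviations about the sample means and grand mean."""
    all_x = [x for s in samples for x in s]
    grand_mean = sum(all_x) / len(all_x)
    means = [sum(s) / len(s) for s in samples]
    ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    sst = sum((x - grand_mean) ** 2 for x in all_x)
    return ssb, ssw, sst


samples = [
    [32, 26, 31, 20, 27, 34],  # Sample I from Exercise 3
    [27, 35, 33, 40, 38, 31],  # Sample II from Exercise 3
]
ssb, ssw, sst = sums_of_squares(samples)
check = sums_of_squares_definitional(samples)
# The shortcut and definitional formulas agree, and SST = SSB + SSW.
assert all(math.isclose(a, b) for a, b in zip((ssb, ssw, sst), check))
assert math.isclose(ssb + ssw, sst)
```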
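$T_1 = 324$, $T_2 = 369$, $T_3 = 388$
$n_1 = 5$, $n_2 = 5$, $n_3 = 5$, $n = 15$
$\sum x = 1081$, $\sum x^2 = 80709$
$$SSB = \left(\frac{324^2}{5} + \frac{369^2}{5} + \frac{388^2}{5}\right) - \frac{1081^2}{15} = 432.1333$$
$$SSW = 80709 - \left(\frac{324^2}{5} + \frac{369^2}{5} + \frac{388^2}{5}\right) = 2372.8000$$
$$SST = SSB + SSW = 2804.9333$$
$$MSB = \frac{SSB}{k - 1} = \frac{432.1333}{3 - 1} = 216.0667$$
where k - 1 is the df for the numerator, and
$$MSW = \frac{SSW}{n - k} = \frac{2372.8000}{15 - 3} = 197.7333$$
where n - k is the df for the denominator.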
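$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all three means are equal
Use the F distribution with df for the numerator = k - 1 = 2 and df for the denominator = n - k = 12. The critical value from the F table is 6.93 (area .01 in the right tail).
$$F = \frac{MSB}{MSW} = \frac{216.0667}{197.7333} = 1.09$$
Since F = 1.09 is less than the critical value 6.93, we do not reject the null hypothesis.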
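A sketch that reproduces this one-way ANOVA from the summary values alone: the sample totals $T_i$, the sizes $n_i$ and $\sum x^2$. `one_way_anova_from_totals` is a hypothetical helper. The critical value 6.93 is hard-coded as the F-table value for df = (2, 12) with .01 in the right tail.

```python
def one_way_anova_from_totals(totals, sizes, sum_x2):
    """One-way ANOVA from sample totals T_i, sample sizes n_i and the sum of x^2.

    Returns (SSB, SSW, MSB, MSW, F, (df_numerator, df_denominator)).
    """
    k = len(totals)
    n = sum(sizes)
    sum_x = sum(totals)
    between_term = sum(t * t / n_i for t, n_i in zip(totals, sizes))
    ssb = between_term - sum_x ** 2 / n
    ssw = sum_x2 - between_term
    msb = ssb / (k - 1)  # df for numerator = k - 1
    msw = ssw / (n - k)  # df for denominator = n - k
    return ssb, ssw, msb, msw, msb / msw, (k - 1, n - k)


ssb, ssw, msb, msw, f_stat, dfs = one_way_anova_from_totals(
    totals=[324, 369, 388], sizes=[5, 5, 5], sum_x2=80709
)
f_table_value = 6.93  # F-table value for df = (2, 12), area .01 in the right tail
print(round(ssb, 4), round(ssw, 4), round(msb, 4), round(msw, 4), round(f_stat, 2), dfs)
print("reject H0" if f_stat > f_table_value else "do not reject H0")
```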
EXERCISE:
1) From time to time, unknown to its employees, the research department at Post Bank observes
various employees for their work productivity. Recently this department wanted to check
whether the four tellers at a branch of this bank serve, on average, the same number of customers
per hour. The research manager observed each of the four tellers for a certain number of hours.
The following table gives the number of customers served by the four tellers during each of the
observed hours.
Teller A Teller B Teller C Teller D
19 14 11 24
21 16 14 19
26 14 21 21
24 13 13 26
18 17 16 20
13 18
a) At the 5% significance level, test the null hypothesis that the mean number of
customers served per hour by each of these four tellers is the same. Assume that all the
assumptions required to apply the one-way ANOVA procedure hold true.
3) Consider the following data obtained for two samples selected at random from two
populations that are independent and normally distributed with equal variances.
Sample I 32 26 31 20 27 34
Sample II 27 35 33 40 38 31
a) Calculate the means and standard deviations for these samples using the formulas from
Chapter 3.
b) Using the one-way ANOVA procedure, test at the 1% significance level whether the
means of the populations from which these samples are drawn are equal.
4) The following ANOVA table, based on information obtained for three samples selected
from three independent populations that are normally distributed with equal variances, has a
few missing values.
Source of variation | Degrees of freedom | Sum of squares | Mean squares | Value of test statistic
REFERENCES