Chapter 8
Suppose you wanted to determine whether the mean level of a driver’s blood alcohol exceeds the legal limit after two drinks,
or whether the majority of registered voters approve of the president’s performance. In both cases, you are interested in
making an inference about how the value of a parameter relates to a specific numerical value. Is it less than, equal to, or
greater than the specified number? This type of inference, called a test of hypothesis, is the subject of this chapter.
Suppose we are interested in making an inference about the mean µ of a population. However, we are less interested in
estimating the value of µ than we are in testing a hypothesis about its value—that is, we want to decide whether the mean
of the population is less than, equal to, or greater than the specified number.
1- The null hypothesis, denoted H0 , represents the hypothesis that will be assumed to be true unless the data provide
convincing evidence that it is false. This usually represents the “status quo” or some statement about the population
2- The alternative (research) hypothesis, denoted Ha (H1 ), represents the hypothesis that will be accepted only if the
data provide convincing evidence of its truth. This usually represents the values of a population parameter for which
For instance, the null and alternative hypotheses can be stated as follows:
H0 : µ ≤ 2400
Ha : µ > 2400
Because the hypotheses concern the value of the population mean µ, it is reasonable to use the sample mean x̄ to make the
inference, just as we did when we formed confidence intervals for µ in Sections 7.2 and 7.3. In the example above, “convincing”
evidence in favor of the alternative hypothesis will exist when the value of x̄ exceeds 2,400 by an amount that cannot be
readily attributed to sampling variability. To decide, we compute a test statistic, i.e., a numerical value computed from the
sample.
CHAPTER 8. INFERENCES BASED ON A SINGLE SAMPLE: TESTS OF HYPOTHESIS 3
The test statistic is a sample statistic, computed from information provided in the sample, that the researcher uses to
A Type I error occurs if the researcher rejects the null hypothesis in favor of the alternative hypothesis when, in fact, H0
α = P (Type I error) = P (Rejecting the null hypothesis when in fact the null hypothesis is true)
In our example,
H0 : µ ≤ 2400
Ha : µ > 2400
Test statistic: $z = \dfrac{\bar{x} - 2400}{\sigma_{\bar{x}}}$
The rejection region of a statistical test is the set of possible values of the test statistic for which the researcher will reject
H0 in favor of Ha .
Suppose x̄ = 2,460, n = 50, and s = 200. As in the case of estimation, we can use s to approximate σ when s is calculated
Because this value of z exceeds 1.645, it falls into the rejection region. That is, we reject the null hypothesis that µ = 2,400
The answer is α = .05—that is, we selected the level of risk, α, of making a Type I error when we constructed the test.
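The arithmetic behind this example can be checked with a short calculation. The sketch below assumes an upper-tailed test at α = .05 and uses s in place of σ, as the text describes. The variable names are illustrative only. With the sample values given above, z comes out near 2.12, which is beyond the cutoff of about 1.645.

```python
from math import sqrt
from statistics import NormalDist

# Sample values and hypothesized mean taken from the running example.
x_bar, mu_0, s, n = 2460, 2400, 200, 50
alpha = 0.05

std_error = s / sqrt(n)                    # s stands in for the unknown sigma
z = (x_bar - mu_0) / std_error             # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tailed cutoff for alpha

reject = z > z_crit
print(f"z = {z:.2f}, cutoff = {z_crit:.3f}, reject H0: {reject}")
```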
A Type II error occurs if the researcher accepts the null hypothesis when, in fact, H0 is false. The probability of committing
Figure 8.2: Location of the test statistic for a test of the hypothesis H0 : µ = 2,400
It is often difficult to determine β precisely. Rather than make a decision (accept H0 ) for which the probability of error β
is unknown, we avoid the potential Type II error by avoiding the conclusion that the null hypothesis is true. Instead, we will
simply state that the sample evidence is insufficient to reject H0 at α = .05. Because the null hypothesis is the “status-quo”
hypothesis, the effect of not rejecting H0 is to maintain the status quo. Therefore:
Because α is usually specified by the analyst, we will generally be able to reject H0 (accept Ha ) when the sample evidence
supports that decision. However, because β is usually not specified, we will generally avoid the decision to accept H0 ,
preferring instead to state that the sample evidence is insufficient to reject H0 when the test statistic is not in the rejection
region.
The following table summarizes the four possible outcomes (i.e., conclusions) of a test of hypothesis. The “true state of
nature” columns refer to the fact that either the null hypothesis H0 is true or the alternative hypothesis Ha is true.
Note
Note that a Type I error can be made only when the null hypothesis is rejected in favor of the alternative hypothesis,
and a Type II error can be made only when the null hypothesis is accepted.
Note
The elements of a test of hypothesis are summarized as follows. Note that the first four elements are all specified before
the sampling experiment is performed. In no case will the results of the sample be used to determine the hypotheses; the
data are collected to test the predetermined hypotheses, not to formulate them.
1- Null hypothesis (H0 ): A theory about the specific values of one or more population parameters. The theory generally
represents the status quo, which we adopt until it is proven false. The theory is always stated as
H0 : parameter = value.
2- Alternative (research) hypothesis (Ha ): A theory that contradicts the null hypothesis. The theory generally represents
that which we will adopt only when sufficient evidence exists to establish its truth.
3- Test statistic: A sample statistic used to decide whether to reject the null hypothesis.
4- Rejection region: The numerical values of the test statistic for which the null hypothesis will be rejected. The rejection
region is chosen so that the probability is α that it will contain the test statistic when the null hypothesis is true,
thereby leading to a Type I error. The value of α is usually chosen to be small (e.g., .01, .05, or .10) and is referred to
5- Assumptions: Clear statement(s) of any assumptions made about the population(s) being sampled.
6- Experiment and calculation of test statistic: Performance of the sampling experiment and determination of the numerical
7- Conclusion:
a. If the numerical value of the test statistic falls in the rejection region, we reject the null hypothesis and conclude that
the alternative hypothesis is true. We know that the hypothesis-testing process will lead to this conclusion incorrectly
b. If the test statistic does not fall in the rejection region, we do not reject H0 . Thus, we reserve judgment about
which hypothesis is true. We do not conclude that the null hypothesis is true because we do not (in general) know the
probability β that our test procedure will lead to an incorrect acceptance of H0 (Type II error).
1- Select the alternative hypothesis as that which the sampling experiment is intended to establish. The alternative
c. Two-tailed (Ha : µ ≠ µ0 )
2- Select the null hypothesis as the status quo, that which will be presumed true unless the sampling experiment
conclusively establishes the alternative hypothesis. The null hypothesis will be specified as that parameter value closest to the
alternative in one-tailed tests and as the complementary (or only unspecified) value in two-tailed tests. (H0 : µ = µ0 )
A one-tailed test of hypothesis is one in which the alternative hypothesis is directional and includes the symbol “<” or “>”.
A two-tailed test of hypothesis is one in which the alternative hypothesis does not specify departure from H0 in a particular
Example 8.2.1. it is producing machine bearings with a mean diameter of .5 inch. If the mean diameter of the bearings is
larger or smaller than .5 inch, then the process is out of control and must be adjusted. Formulate the null and alternative
hypotheses for a test to determine whether the bearing production process is out of control.
Solution
The hypotheses must be stated in terms of a population parameter. Here, we define µ as the true mean diameter (in
inches) of all bearings produced by the metal lathe. If either µ > .5 or µ < .5, then the lathe’s production process is out of
control.
Note
Whenever a claim is made about the value of a particular population parameter and the researcher wants to test the
claim, believing that it is false, the claimed value will represent the null hypothesis.
Rejection Region
The rejection region for a two-tailed test differs from that for a one-tailed test. When we are trying to detect departure
from the null hypothesis in either direction, we must establish a rejection region in both tails of the sampling distribution of
Note that the smaller α you select, the more evidence (the larger z) you will need before you can reject H0 .
                     Alternative Hypotheses
            Lower-Tailed    Upper-Tailed    Two-Tailed
α = 0.10    z < −1.28       z > 1.28        z < −1.645 or z > 1.645
α = 0.05    z < −1.645      z > 1.645       z < −1.96 or z > 1.96
α = 0.01    z < −2.33       z > 2.33        z < −2.575 or z > 2.575
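The cutoffs in this table are quantiles of the standard normal distribution, so they can be reproduced rather than looked up. The sketch below uses Python's standard-library `statistics.NormalDist`. The helper name `rejection_cutoffs` is hypothetical. The printed values agree with the table up to rounding (for example, about 2.326 for the tabled 2.33).

```python
from statistics import NormalDist

def rejection_cutoffs(alpha):
    """Return (one-tailed cutoff, two-tailed cutoff) for significance level alpha."""
    std_normal = NormalDist()                       # mean 0, standard deviation 1
    one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    return one_tailed, two_tailed

for alpha in (0.10, 0.05, 0.01):
    one, two = rejection_cutoffs(alpha)
    print(f"alpha = {alpha:.2f}: lower z < {-one:.3f}, upper z > {one:.3f}, "
          f"two-tailed |z| > {two:.3f}")
```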
Example 8.2.2. The effect of drugs and alcohol on the nervous system has been the subject of considerable research.
Suppose a research neurologist is testing the effect of a drug on response time by injecting 100 rats with a unit dose of the
drug, subjecting each rat to a neurological stimulus, and recording its response time. The neurologist knows that the mean
response time for rats not injected with the drug (the “control” mean) is 1.2 seconds. She wishes to test whether the mean
response time for drug-injected rats differs from 1.2 seconds. Set up the test of hypothesis for this experiment, using α = .01.
Solution The key word mean in the statement of the problem implies that the target parameter is µ, the mean response
Ha : µ ≠ 1.2 (Mean response time is less than 1.2 or greater than 1.2 seconds)
Test statistic: $z = \dfrac{\bar{x} - 1.2}{\sigma_{\bar{x}}}$
Assumptions: Since the sample size of the experiment is large enough (n > 30), the Central Limit Theorem will apply,
and no assumptions need be made about the population of response time measurements. The sampling distribution of the
sample mean response of 100 rats will be approximately normal, regardless of the distribution of the individual rats’ response
times.
Note that the test is set up before the sampling experiment is conducted.
According to the statistical test procedure described in Section 8.2, the rejection region and, correspondingly, the value of
α are selected prior to conducting the test, and the conclusions are stated in terms of rejecting or not rejecting the null
hypothesis. A second method of presenting the results of a statistical test is based on the observed significance level (or
The observed significance level, or p-value, for a specific statistical test is the probability (assuming H0 is true) of observing
a value of the test statistic that is at least as contradictory to the null hypothesis, and supportive of the alternative hypothesis,
1. Determine the value of the test statistic z corresponding to the result of the sampling experiment.
2. a. If the test is one-tailed, the p-value is equal to the tail area beyond z in the same direction as the alternative
hypothesis. Thus, if the alternative hypothesis is of the form >, the p-value is the area to the right of, or above, the
observed z-value. Conversely, if the alternative is of the form <, the p-value is the area to the left of, or below, the
observed z-value.
b. If the test is two-tailed, the p-value is equal to twice the tail area beyond the observed z-value in the direction
of the sign of z—that is, if z is positive, the p-value is twice the area to the right of, or above, the observed z-value.
Conversely, if z is negative, the p-value is twice the area to the left of, or below, the observed z-value. See the following
figures:
Figure 8.6: Finding the p-value for a two-tailed test: p-value = 2(p/2)
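The rules for computing a p-value from an observed z can be sketched as one small function. This is only an illustration of the steps above. The function name `z_p_value` and the labels "less", "greater", and "two-sided" are assumptions introduced here, not notation from the text.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """p-value for an observed z under a 'less', 'greater', or 'two-sided' alternative."""
    cdf = NormalDist().cdf
    if alternative == "greater":       # Ha of the form >: area to the right of z
        return 1 - cdf(z)
    if alternative == "less":          # Ha of the form <: area to the left of z
        return cdf(z)
    if alternative == "two-sided":     # twice the tail area beyond z, in the direction of its sign
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

print(round(z_p_value(1.96, "two-sided"), 3))
print(round(z_p_value(-1.645, "less"), 3))
```

Using the absolute value of z in the two-sided branch is equivalent to doubling the right-tail area when z is positive and the left-tail area when z is negative.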
2. If the observed significance level (p-value) of the test is less than the chosen value of α, reject the null hypothesis.
Note: Some statistical software packages (e.g., SPSS) will conduct only two-tailed tests of hypothesis. For these packages,
you obtain the p-value for a one-tailed test as shown in the box:
a. $p = \dfrac{\text{Reported } p\text{-value}}{2}$ if Ha is of form > and z is positive, or Ha is of form < and z is negative (8.1)
b. $p = 1 - \dfrac{\text{Reported } p\text{-value}}{2}$ if Ha is of form > and z is negative, or Ha is of form < and z is positive (8.2)
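The conversion in the box can be written as a short helper. The sketch below is an assumption-laden illustration: the function name `one_tailed_p` and the labels "greater" and "less" are hypothetical, and the reported two-tailed p-value of 0.15 is a made-up input chosen only to show both cases.

```python
def one_tailed_p(reported_p, z, alternative):
    """Convert a reported two-tailed p-value into a one-tailed p-value.

    alternative is 'greater' (Ha of form >) or 'less' (Ha of form <).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Case (8.1): the sign of z agrees with the direction of Ha.
    agrees = (alternative == "greater" and z > 0) or (alternative == "less" and z < 0)
    half = reported_p / 2
    # Case (8.2): z points away from Ha, so the p-value is the complement.
    return half if agrees else 1 - half

# Hypothetical software output: two-tailed p = 0.15 with z = 1.44.
print(one_tailed_p(0.15, 1.44, "greater"))
print(one_tailed_p(0.15, 1.44, "less"))
```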
Example 8.3.1. Consider the one-tailed test of hypothesis, H0 : µ = 100 versus Ha : µ > 100.
a. Suppose the test statistic is z = 1.44. Find the p-value of the test and the rejection region for the test when α = .05.
Then show that the conclusion using the rejection region approach will be identical to the conclusion based on the
p-value.
b. Now suppose the test statistic is z = 3.01; find the p-value and rejection region for the test when α = .05. Again, show
that the conclusion using the rejection region approach will be identical to the conclusion based on the p-value.
Solution
This p-value is shown on Figure 8.7. Since α = .05 and the test is upper-tailed, the rejection region for the test is
z > 1.645. This rejection region is also shown in Figure 8.7. Observe that the test statistic (z = 1.44) falls outside the
rejection region, implying that we fail to reject H0 . Also, α = .05 is less than p-value = .075. This result also implies that
we should fail to reject H0 . Consequently, both decision rules agree—there is insufficient evidence to reject H0 .
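Both parts of Example 8.3.1 can be checked numerically. The sketch below assumes the upper-tailed test stated in the example and applies the rejection-region rule and the p-value rule side by side for z = 1.44 and z = 3.01. The variable names are illustrative. For z = 1.44 the computed p-value is close to the .075 quoted above.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)   # upper-tailed cutoff, about 1.645

results = {}
for z in (1.44, 3.01):
    p_value = 1 - std_normal.cdf(z)      # area to the right of the observed z
    by_region = z > z_crit               # rejection-region decision
    by_p_value = p_value < alpha         # p-value decision
    results[z] = (p_value, by_region, by_p_value)
    print(f"z = {z}: p-value = {p_value:.4f}, "
          f"reject by region: {by_region}, reject by p-value: {by_p_value}")
```

The two decision rules always agree here because the event z > z_crit is the same event as the right-tail area beyond z being smaller than α.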
The test statistic (z = 3.01) falls within the rejection region, leading us to reject H0 . And, α = .05 now exceeds the
When testing a hypothesis about a population mean µ, the test statistic we use will depend on whether the sample size n
is large (say, n > 30) or small and whether we know the value of the population standard deviation, σ. In this section, we
consider the large-sample case. Therefore, according to Chapter 7, the test statistic for a test based on large samples will
be based on the normal z-statistic, and the sample standard deviation s provides a good approximation to σ in case it is
unknown.
$z_c = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
$z_c = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Lower-Tailed Tests
H0 : µ = µ0
Ha : µ < µ0
p-value: P (z < zc )
Upper-Tailed Tests
H0 : µ = µ0
Ha : µ > µ0
p-value: P (z > zc )
Two-Tailed Tests
H0 : µ = µ0
Ha : µ ≠ µ0
Decision
Reject H0 if the p-value < α or if the test statistic (zc ) falls in the rejection region (the result is “statistically significant”), where:
$P(z > z_{\alpha}) = \alpha$, $P(z > z_{\alpha/2}) = \dfrac{\alpha}{2}$
1. If the calculated test statistic falls in the rejection region, reject H0 and conclude that the alternative hypothesis Ha
is true. State that you are rejecting H0 at the α level of significance. Remember that the confidence is in the testing
2. If the test statistic does not fall in the rejection region, conclude that the sampling experiment does not provide
sufficient evidence to reject H0 at the α level of significance. [Generally, we will not “accept” the null hypothesis unless
Example 8.4.1. Refer to the neurological response-time test setup in Example 8.2.2. The sample of 100 drug-injected rats
yielded the results (in seconds) in Data Set: DRUGRAT. At α = .01, use these data to conduct the test of hypothesis,
Ha : µ ≠ 1.2 (Mean response time is less than 1.2 or greater than 1.2 seconds)
Solution
Based on the data set: x̄ = 1.0517, s = .4982. Now we substitute these sample statistics into the test statistic and obtain
$z_c = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{1.0517 - 1.2}{0.4982 / \sqrt{100}} = -2.98$
The figure above shows that z = −2.98 falls in the lower-tail rejection region, which consists of all values of z < −2.575.
Therefore, this sampling experiment provides sufficient evidence to reject H0 and conclude, at the α = 0.01 level of significance,
that the mean response time for drug-injected rats differs from the control mean of 1.2 seconds. It appears that the rats
receiving an injection of the drug have a mean response time that is less than 1.2 seconds.
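The calculation in Example 8.4.1 can be reproduced from the summary statistics alone. The sketch below assumes the two-tailed setup of the example and uses only x̄, s, n, and µ0 as given in the text, not the DRUGRAT data themselves. The two-tailed p-value it prints is a by-product that the text does not quote.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics quoted in the solution of Example 8.4.1.
x_bar, s, n = 1.0517, 0.4982, 100
mu_0, alpha = 1.2, 0.01

z_c = (x_bar - mu_0) / (s / sqrt(n))           # large-sample test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff, about 2.575
p_value = 2 * NormalDist().cdf(-abs(z_c))      # twice the tail area beyond z_c

reject = abs(z_c) > z_crit
print(f"z_c = {z_c:.2f}, cutoff = ±{z_crit:.3f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```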
Known standard deviation: 0.4982 (calculate StDev prior to this step using Basic Statistic)
→ options
confidence level: 99
→ ok → ok
Note
The level of α is determined before the sampling experiment is performed. If we decide that we are willing to tolerate a
1% Type I error rate, the result of the sampling experiment should have no effect on that decision. In general, the same data
Question
Recall from Section 7.3 that two problems emerge when we make inferences about a population mean from the information in
a small sample. To address them, we defined and used the t-statistic, which follows a t-distribution.
Accordingly, we use the t-statistic as the test statistic of a small-sample test of a population mean.
Test statistic: $t_c = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Lower-Tailed Tests
H0 : µ = µ0
Ha : µ < µ0
p-value: P (t < tc )
Upper-Tailed Tests
H0 : µ = µ0
Ha : µ > µ0
p-value: P (t > tc )
Two-Tailed Tests
H0 : µ = µ0
Ha : µ ≠ µ0
Decision
Reject H0 if the p-value < α or if the test statistic (tc ) falls in the rejection region (the result is “statistically significant”), where:
$P(t > t_{\alpha}) = \alpha$, $P(t > t_{\alpha/2}) = \dfrac{\alpha}{2}$
2. The population from which the sample is selected has a distribution that is approximately normal.
Example 8.5.1. A major car manufacturer wants to test a new engine to determine whether it meets new air pollution
standards. The mean emission µ of all engines of this type must be less than 20 parts per million of carbon. Ten engines are
manufactured for testing purposes, and the emission level of each is determined. The data (in parts per million) are listed in
Data Set: ENGINE. Do the data supply sufficient evidence to allow the manufacturer to conclude that this type of engine
meets the pollution standard? Assume that the manufacturer is willing to risk a Type I error with probability α = .01.
Solution
Rejection region: For α = .01 and df = n - 1 = 9, the one-tailed rejection region (see Figure 8.10) is t < −t0.01 = −2.821.
Figure 8.11: A t-distribution with 9 df and the rejection region for Example 8.5.1.
Since the calculated t falls outside the rejection region (see Figure 8.10), the manufacturer cannot reject H0 . There is
insufficient evidence to conclude that µ < 20 parts per million and that the new type of engine meets the pollution standard.
The p-value (0.0143691) is greater than α = 0.01. Thus, the two methods agree and we cannot reject H0 : µ = 20 in favor
of Ha : µ < 20.
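The cutoff and the decision in Example 8.5.1 can be sanity-checked without statistical software. Python's standard library has no t-distribution, so the sketch below integrates the t density numerically with Simpson's rule. The helpers `t_pdf` and `t_lower_tail` are hypothetical names introduced here. The ENGINE data are not reproduced in this section, so only the quoted cutoff (−2.821) and the quoted p-value (0.0143691) are used.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_lower_tail(t, df, steps=2000):
    """P(T < t), using Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3             # area between 0 and |t|
    return 0.5 - central if t < 0 else 0.5 + central

df, alpha = 9, 0.01
tail_at_cutoff = t_lower_tail(-2.821, df)   # should be close to alpha
reported_p = 0.0143691                      # p-value quoted in the text
reject = reported_p < alpha

print(f"P(t < -2.821) with {df} df = {tail_at_cutoff:.4f}")
print(f"reject H0 at alpha = {alpha}: {reject}")
```

In practice a statistics package would supply the t-distribution directly. The numerical integration here only keeps the example self-contained.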
Hypothesized mean: 20
→ options
confidence level: 99
→ ok → ok
Question
Inferences about population proportions (or percentages) are often made in the context of the probability p of “success” for
a binomial distribution. We saw how to use large samples from binomial distributions to form confidence intervals for p in
Section 7.4. We now consider tests of hypotheses about p. Recall that the sample proportion p̂ is really just the sample
mean of the outcomes of the individual binomial trials and, as such, is approximately normally distributed (for large samples)
according to the Central Limit Theorem. Thus, for large samples we can use the standard normal z as the test statistic:
Test statistic: $z_c = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$, where $q_0 = 1 - p_0$
Lower-Tailed Tests
H0 : p = p0
Ha : p < p0
p-value: P (z < zc )
Upper-Tailed Tests
H0 : p = p0
Ha : p > p0
p-value: P (z > zc )
Two-Tailed Tests
H0 : p = p0
Ha : p ≠ p0
Decision
Reject H0 if the p-value < α or if the test statistic (zc ) falls in the rejection region (the result is “statistically significant”), where:
$P(z > z_{\alpha}) = \alpha$, $P(z > z_{\alpha/2}) = \dfrac{\alpha}{2}$
2. The sample size n is large. (This condition will be satisfied if both np0 ≥ 15 and nq0 ≥ 15.)
Example 8.6.1. Consider a method currently used by doctors to screen women for breast cancer. The method fails to detect
cancer in 20% of the women who actually have the disease. Suppose a new method has been developed that researchers hope
will detect cancer more accurately. This new method was used to screen a random sample of 140 women known to have
breast cancer. Of these, the new method failed to detect cancer in 12 women. Does this sample provide evidence that the
failure rate of the new method differs from the one currently in use? (α = 0.05)
Considering the results above and that the sample was selected at random, we can conclude the conditions are met.
Test statistic: $z_c = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{.086 - .2}{\sqrt{0.2 \times 0.8 / 140}} = -3.35$
You can see that the test statistic falls in the rejection region. Therefore, we reject the null hypothesis, concluding at the
.05 level of significance that the true failure rate of the new method for detecting breast cancer differs from .20. The same
The p-value (0.001) is less than α = 0.05. Thus, the two methods agree and we reject H0 : p = p0 in favor of Ha : p ≠ p0 .
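Example 8.6.1 can be reproduced from the counts given in the problem. The sketch below assumes the two-tailed test with p0 = .20 and first checks the sample-size condition np0 ≥ 15 and nq0 ≥ 15. Carrying full precision gives a test statistic of roughly −3.38. The −3.35 quoted above presumably reflects rounding of intermediate values. Either way, the p-value rounds to .001 and the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist

# Counts and hypothesized failure rate from Example 8.6.1.
n, failures = 140, 12
p_0, alpha = 0.20, 0.05
q_0 = 1 - p_0

# Large-sample condition for using the normal approximation.
large_enough = n * p_0 >= 15 and n * q_0 >= 15

p_hat = failures / n                              # sample proportion, about .086
z_c = (p_hat - p_0) / sqrt(p_0 * q_0 / n)         # test statistic
p_value = 2 * NormalDist().cdf(-abs(z_c))         # two-tailed p-value

reject = p_value < alpha
print(f"condition met: {large_enough}, z_c = {z_c:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject}")
```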
Summarized data
Number of events: 12
→ options
confidence level: 95
→ ok → ok
Small samples
Since most surveys and studies employ large samples, the large-sample testing procedure based on the normal (z) statistic
presented here will be appropriate for making inferences about a population proportion. However, in the case of small
samples (where either np0 or nq0 is less than 15), tests for a population proportion based on the z-statistic may not be
valid—especially when conducting one-tailed tests. A test of proportions that can be applied to small samples utilizes the
binomial, rather than the normal, distribution. These are called exact binomial tests due to the fact that the exact (rather
than approximate) p-value for the test is computed based on the binomial distribution.
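A one-tailed exact binomial p-value is just a sum of binomial probabilities computed under H0. The sketch below is a minimal illustration rather than a full implementation. The function names are hypothetical, and the inputs (n = 20 trials, x = 1 success, p0 = .2) are made up so that np0 = 4 falls below 15. Two-tailed exact tests are omitted because conventions for combining the two tails vary.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def exact_binomial_p_value(x, n, p_0, alternative):
    """One-tailed exact p-value; alternative is 'less' (Ha: p < p0) or 'greater' (Ha: p > p0)."""
    if alternative == "less":
        ks = range(0, x + 1)       # outcomes at least as small as x
    elif alternative == "greater":
        ks = range(x, n + 1)       # outcomes at least as large as x
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return sum(binom_pmf(k, n, p_0) for k in ks)

# Hypothetical small sample: 1 success in 20 trials, testing Ha: p < 0.2.
p_value = exact_binomial_p_value(1, 20, 0.2, "less")
print(f"exact p-value = {p_value:.4f}")
```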
The core content of the slides is from the textbook of this course;
by