Hypothesis Testing
Upper-, Lower-, and Two-Tailed Tests
The procedure for hypothesis testing is based on the ideas described above. Specifically, we set
up competing hypotheses, select a random sample from the population of interest and compute
summary statistics. We then determine whether the sample data support the null or the alternative
hypothesis. The procedure can be broken down into the following five steps.
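Step 2. Select the appropriate test statistic.
The test statistic is a single number that summarizes the sample information. An example of a
test statistic is the Z statistic computed as follows:
$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
where $\bar{X}$ is the sample mean, $\mu_0$ is the mean under the null hypothesis, $s$ is the sample
standard deviation and $n$ is the sample size.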
When the sample size is small, we will use t statistics (just as we did when constructing
confidence intervals for small samples). As we present each scenario, alternative test statistics
are provided along with conditions for their appropriate use.
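The statistic described in this step can be sketched in Python. This is a minimal sketch assuming a one-sample test of a mean against a hypothesized value mu0, using the sample standard deviation s; the function names are hypothetical. The same arithmetic gives the t statistic (with n - 1 degrees of freedom) for small samples, so only the reference distribution changes.

```python
import math
import statistics


def one_sample_statistic(sample, mu0):
    """(sample mean - mu0) / (s / sqrt(n)) computed from raw data.

    Interpreted as Z for large samples and as t (n - 1 degrees of freedom)
    for small samples; only the reference distribution differs.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, n - 1 in the denominator
    return (xbar - mu0) / (s / math.sqrt(n))


def statistic_from_summary(xbar, s, n, mu0):
    """Same statistic computed from summary statistics instead of raw data."""
    return (xbar - mu0) / (s / math.sqrt(n))


def reference_distribution(n):
    """Hypothetical helper mirroring the rule of thumb used in the text (n > 30 -> Z)."""
    return "Z" if n > 30 else "t"


# Hypothetical summary values: mean 105, SD 10, n = 25, tested against mu0 = 100.
print(statistic_from_summary(105, 10, 25, 100), reference_distribution(25))
```

Working from summary values is often all that is needed, since the statistic depends on the data only through the sample mean, the sample standard deviation and n.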
Step 3. Set up decision rule.
The decision rule is a statement that specifies the circumstances under which the null hypothesis
is rejected. The decision rule is based on specific values of the test statistic (e.g., reject H0 if
Z > 1.645). The decision rule for a specific test depends on three factors: the research or
alternative hypothesis, the test statistic, and the level of significance. Each is discussed below.
1. The decision rule depends on whether an upper-tailed, lower-tailed, or two-tailed test is
proposed. In an upper-tailed test the decision rule has investigators reject H0 if the test
statistic is larger than the critical value. In a lower-tailed test the decision rule has
investigators reject H0 if the test statistic is smaller than the critical value. In a two-tailed
test the decision rule has investigators reject H0 if the test statistic is extreme, either larger
than an upper critical value or smaller than a lower critical value.
2. The exact form of the test statistic is also important in determining the decision rule. If
the test statistic follows the standard normal distribution (Z), then the decision rule will
be based on the standard normal distribution. If the test statistic follows the t distribution,
then the decision rule will be based on the t distribution. The appropriate critical value
will then be selected from the t distribution, again depending on the specific alternative
hypothesis and the level of significance, and also on the degrees of freedom.
3. The third factor is the level of significance. The level of significance selected in
Step 1 (e.g., α=0.05) dictates the critical value. For example, in an upper-tailed Z test, if
α=0.05 then the critical value is Z=1.645.
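The three factors can be combined into a small decision-rule function. This is a minimal sketch assuming a Z test; it uses the standard-library statistics.NormalDist for standard normal quantiles, and the function names and the tail labels "upper", "lower" and "two" are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_critical(alpha, tail):
    """Critical value(s) of Z for the level of significance and the tail of the test."""
    if tail == "upper":
        return STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "lower":
        return STD_NORMAL.inv_cdf(alpha)
    if tail == "two":
        c = STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha is split between the two tails
        return (-c, c)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


def reject_h0(z, alpha, tail):
    """Apply the decision rule; True means reject H0."""
    crit = z_critical(alpha, tail)
    if tail == "upper":
        return z > crit
    if tail == "lower":
        return z < crit
    low, high = crit
    return z < low or z > high


print(round(z_critical(0.05, "upper"), 3))  # 1.645
print(reject_h0(2.38, 0.05, "upper"))       # True
```

A t-based version would follow the same structure but needs t quantiles (for example from scipy.stats.t.ppf with the appropriate degrees of freedom), which the standard library does not provide.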
The following figures illustrate the rejection regions defined by the decision rule for upper-,
lower- and two-tailed Z tests with α=0.05. Notice that the rejection regions are in the upper,
lower and both tails of the curves, respectively. The decision rules are written below each figure.
[Figure: Rejection Region for Upper-Tailed Z Test (H1: μ > μ0) with α=0.05]
The decision rule is: Reject H0 if Z > 1.645.
[Figure: Rejection Region for Lower-Tailed Z Test (H1: μ < μ0) with α=0.05]
The decision rule is: Reject H0 if Z < -1.645.
[Figure: Rejection Region for Two-Tailed Z Test (H1: μ ≠ μ0) with α=0.05]
The decision rule is: Reject H0 if Z < -1.960 or if Z > 1.960.

Critical values of Z for upper- and lower-tailed tests:
α         Upper-Tailed Z    Lower-Tailed Z
0.10      1.282             -1.282
0.05      1.645             -1.645
0.025     1.960             -1.960
0.010     2.326             -2.326
0.005     2.576             -2.576
0.001     3.090             -3.090
0.0001    3.719             -3.719

Critical values of Z for two-tailed tests (reject H0 if Z is below the negative value or above
the positive value):
α         Two-Tailed Z
0.20      ±1.282
0.10      ±1.645
0.05      ±1.960
0.010     ±2.576
0.001     ±3.291
0.0001    ±3.891
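The tabulated critical values of Z can be regenerated from the standard normal quantile function. This is a minimal sketch using the standard-library statistics.NormalDist; the table and function names are hypothetical. For a two-tailed test, α is split equally between the two tails, which is why the two-tailed value for α=0.10 equals the upper-tailed value for α=0.05.

```python
from statistics import NormalDist

inv_cdf = NormalDist().inv_cdf  # quantile function of the standard normal

ONE_TAILED_ALPHAS = [0.10, 0.05, 0.025, 0.010, 0.005, 0.001, 0.0001]
TWO_TAILED_ALPHAS = [0.20, 0.10, 0.05, 0.010, 0.001, 0.0001]


def one_tailed_table():
    """alpha -> (upper-tailed Z, lower-tailed Z), rounded to 3 decimals."""
    return {a: (round(inv_cdf(1 - a), 3), round(inv_cdf(a), 3)) for a in ONE_TAILED_ALPHAS}


def two_tailed_table():
    """alpha -> c such that H0 is rejected if Z < -c or Z > c."""
    return {a: round(inv_cdf(1 - a / 2), 3) for a in TWO_TAILED_ALPHAS}


for a, (upper, lower) in one_tailed_table().items():
    print(f"alpha={a:<7} upper Z={upper:6.3f}  lower Z={lower:6.3f}")
for a, c in two_tailed_table().items():
    print(f"alpha={a:<7} two-tailed Z=±{c:.3f}")
```

With this calculation the two-tailed value for α=0.0001 comes out as 3.891.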
The complete table of critical values of Z for upper-, lower- and two-tailed tests can be found in
the table of Z values in "Other Resources."
Critical values of t for upper-, lower- and two-tailed tests can be found in the table of t values in
"Other Resources."
Step 4. Compute the test statistic.
Here we compute the test statistic by substituting the observed sample data into the formula for
the test statistic identified in Step 2.
Step 5. Conclusion.
The final conclusion is made by comparing the test statistic (which is a summary of the
information observed in the sample) to the decision rule. The final conclusion will be either to
reject the null hypothesis (because the sample data are very unlikely if the null hypothesis is true)
or not to reject the null hypothesis (because the sample data are not sufficiently unlikely under
the null hypothesis).
If the null hypothesis is rejected, then an exact significance level is computed to describe the
probability of observing sample data at least as extreme as those observed, assuming that the null
hypothesis is true. The exact level of significance is called the p-value, and it will be less than
the chosen level of significance if we reject H0.
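For a Z test, the exact p-value is a tail area of the standard normal distribution. This is a minimal sketch using the standard-library statistics.NormalDist; the function name and tail labels are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function


def z_p_value(z, tail):
    """Exact p-value for an observed Z statistic, assuming H0 is true."""
    if tail == "upper":
        return 1 - PHI(z)  # P(Z >= z)
    if tail == "lower":
        return PHI(z)  # P(Z <= z)
    if tail == "two":
        return 2 * (1 - PHI(abs(z)))  # both tails, by symmetry
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


# Z = 2.38 is the value observed in the weight example later in this section.
print(z_p_value(2.38, "upper"))
```

The two-tailed p-value doubles the single tail area because a result this extreme in either direction counts as evidence against H0.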
Statistical computing packages provide exact p-values as part of their standard output for
hypothesis tests. In fact, when using a statistical computing package, the steps outlined above can
be abbreviated. The hypotheses (Step 1) should always be set up in advance of any analysis, and
the significance criterion should also be determined (e.g., α=0.05). Statistical computing
packages will produce the test statistic (usually reporting the test statistic as t) and a p-value. The
investigator can then determine statistical significance using the following rule: if p < α, then
reject H0.
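For an upper-tailed Z test, the p-value rule (reject H0 if p < α) and the critical-value rule (reject H0 if Z exceeds the critical value) lead to the same decision. This is a minimal sketch that checks the agreement on a grid of Z values, using the standard-library statistics.NormalDist; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def decide_by_p(z, alpha):
    """Upper-tailed test: reject H0 when the p-value is below alpha."""
    p = 1 - STD_NORMAL.cdf(z)
    return p < alpha


def decide_by_critical_value(z, alpha):
    """Upper-tailed test: reject H0 when Z exceeds the critical value."""
    return z > STD_NORMAL.inv_cdf(1 - alpha)


# Z from -3.00 to 4.00 in steps of 0.01; no grid point sits exactly on 1.6449.
grid = [i / 100 for i in range(-300, 401)]
agree = all(decide_by_p(z, 0.05) == decide_by_critical_value(z, 0.05) for z in grid)
print(agree)  # True
```

The agreement follows because the p-value falls below α exactly when Z lies beyond the critical value that cuts off an upper tail of area α.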
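Step 1. Set up hypotheses and determine the level of significance.
The research hypothesis is that weights have increased, and therefore an upper-tailed test is used:
H0: μ = 191, H1: μ > 191, α=0.05.
Step 2. Select the appropriate test statistic.
Because the sample size is large (n>30), the appropriate test statistic is
$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$, with $\mu_0 = 191$.
Step 3. Set up decision rule.
In this example, we are performing an upper-tailed test (H1: μ > 191) with a Z test statistic and
selected α=0.05. Reject H0 if Z > 1.645.
Step 4. Compute the test statistic.
We now substitute the sample data into the formula for the test statistic identified in Step 2,
which gives Z=2.38.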
Step 5. Conclusion.
We reject H0 because 2.38 > 1.645. We have statistically significant evidence at α=0.05 to show
that the mean weight of men in 2006 is more than 191 pounds. Because we rejected the null
hypothesis, we now approximate the p-value, which is the probability of observing a test statistic
at least as extreme as the one observed if the null hypothesis is true. An alternative definition of
the p-value is the smallest level of significance at which we can still reject H0. In this example,
we observed Z=2.38, and for α=0.05 the critical value was 1.645. Because 2.38 exceeded 1.645, we
rejected H0. In our conclusion we reported a statistically significant increase in mean weight at a
5% level of significance. Using the table of critical values for upper-tailed tests, we can
approximate the p-value. If we select α=0.025, the critical value is 1.960, and we still reject H0
because 2.38 > 1.960. If we select α=0.010, the critical value is 2.326, and we still reject H0
because 2.38 > 2.326. However, if we select α=0.005, the critical value is 2.576, and we cannot
reject H0 because 2.38 < 2.576. Therefore, the smallest tabulated α at which we still reject H0 is
0.010, so the p-value lies between 0.005 and 0.010. A statistical computing package would produce
a more precise p-value. Here we are approximating the p-value and would report p < 0.010.
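The table-scanning approximation in this example can be mechanized, and the exact p-value computed alongside it. This is a minimal sketch using the observed Z=2.38, the tabulated upper-tailed levels of significance and the standard-library statistics.NormalDist; the variable names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
z_observed = 2.38  # test statistic from the example
tabulated_alphas = [0.10, 0.05, 0.025, 0.010, 0.005, 0.001, 0.0001]

# Upper-tailed test: keep each tabulated alpha whose critical value Z still exceeds.
still_reject = [a for a in tabulated_alphas if z_observed > STD_NORMAL.inv_cdf(1 - a)]
smallest_alpha = min(still_reject)

# Exact upper-tail p-value, as a statistical package would report it.
exact_p = 1 - STD_NORMAL.cdf(z_observed)

print(smallest_alpha)  # 0.01
print(f"{exact_p:.4f}")
```

The scan only brackets the p-value (here between 0.005 and 0.010), while the tail-area calculation pins it down.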
In the first step of the hypothesis test, we select a level of significance, α, and
α = P(Type I error) = P(Reject H0 | H0 is true). Because we purposely select a small value for α,
we control the probability of committing a Type I error. For example, if we select α=0.05, then
whenever H0 is actually true there is only a 5% probability that the test will lead us to reject it.
Most investigators are comfortable with this level of protection and, when rejecting H0, are
confident that the research hypothesis is true.
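The meaning of α as a long-run error rate can be checked by simulation: when H0 is true, roughly a fraction α of tests should reject it. This is a minimal sketch assuming a normally distributed population; μ0 = 191 echoes the weight example, while σ = 25, n = 50, the number of repetitions and the seed are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def simulated_type_i_rate(mu0=191.0, sigma=25.0, n=50, alpha=0.05, reps=4000, seed=1):
    """Fraction of upper-tailed Z tests that reject H0 when H0 is true.

    Samples come from a normal population whose mean equals mu0,
    so every rejection counted here is a Type I error.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha)  # upper-tailed critical value
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        z = (xbar - mu0) / (s / math.sqrt(n))
        if z > crit:
            rejections += 1
    return rejections / reps


print(simulated_type_i_rate())  # expected to be close to alpha = 0.05
```

The simulated rate will not equal α exactly: it varies with the seed, and using s in place of σ with a moderate n pushes it slightly above α.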
When we run a test of hypothesis and decide not to reject H0 (e.g., because the test statistic is
below the critical value in an upper-tailed test), then either we make a correct decision because
the null hypothesis is true or we commit a Type II error. Beta (β) represents the probability of a
Type II error and is defined as follows: β = P(Type II error) = P(Do not reject H0 | H0 is false).
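β can only be computed once a specific alternative value of the mean is assumed. This is a minimal sketch for an upper-tailed Z test, assuming σ is known (or n is large enough to treat s as σ) and a hypothetical true mean mu1. It uses the standard result β = Φ(z(1-α) - (μ1 - μ0)/(σ/√n)), where Φ is the standard normal CDF and z(1-α) is the upper-tailed critical value; μ1 = 196 and σ = 25 are illustrative values, not data from the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_upper_tailed(mu0, mu1, sigma, n, alpha):
    """P(do not reject H0 | true mean is mu1) for an upper-tailed Z test.

    Under mu1, Z = (xbar - mu0) / (sigma / sqrt(n)) is normal with mean
    `shift` and SD 1, so beta = P(Z <= critical value) = Phi(crit - shift).
    """
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    return STD_NORMAL.cdf(crit - shift)


# Hypothetical scenario: H0 mean 191, true mean 196, sigma 25.
for n in (25, 100, 400):
    print(n, round(beta_upper_tailed(191, 196, 25, n, 0.05), 3))
```

In this sketch β shrinks as n grows and also shrinks when α is made larger, and it depends on how far the true mean is from μ0.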
Unfortunately, we cannot choose β to be small (e.g., 0.05) to control the probability of
committing a Type II error, because β depends on several factors including the sample size, α, and
the research hypothesis. When we do not reject H0, it may be very likely that we are committing
a Type II error (i.e., failing to reject H0 when in fact it is false). Therefore, when tests are run and
the null hypothesis is not rejected, we often make a weak concluding statement allowing for the
possibility that we might be committing a Type II error. If we do not reject H0, we conclude that
we do not have significant evidence to show that H1 is true. We do not conclude that H0 is true