Lecture 28 30
Overview
In this lesson, we'll continue our investigation of hypothesis testing. In this case, we'll focus our
attention on a hypothesis test for a population mean $\mu$ for three situations:
• a hypothesis test based on the normal distribution for the mean $\mu$ for the completely unrealistic situation that the population variance $\sigma^2$ is known
• a hypothesis test based on the $t$-distribution for the mean $\mu$ for the (much more) realistic situation that the population variance $\sigma^2$ is unknown
• a hypothesis test based on the $t$-distribution for $\mu_D$, the mean difference in the responses of two dependent populations
Let's start by acknowledging that it is completely unrealistic to think that we'd find ourselves in the
situation of knowing the population variance, but not the population mean. Therefore, the
hypothesis testing method that we learn on this page has limited practical use. We study it only
because we'll use it later to learn about the "power" of a hypothesis test (by learning how to
calculate Type II error rates). As usual, let's start with an example.
Example 10-1
Boys of a certain age are known to have a mean weight of $\mu = 85$ pounds. A complaint is made that
the boys living in a municipal children's home are underfed. As one bit of evidence, $n = 25$ boys (of
the same age) are weighed and found to have a mean weight of $\bar{x} = 80.94$ pounds. It is known that
the population standard deviation $\sigma$ is 11.6 pounds (the unrealistic part of this example!). Based on
the available data, what should be concluded concerning the complaint?
https://online.stat.psu.edu/stat415/book/export/html/827 1/12
7/22/23, 1:36 PM Lesson 10: Tests About One Mean
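Answer
The null hypothesis is $H_0: \mu = 85$, and the alternative hypothesis is $H_A: \mu < 85$. In general, we
know that if the weights are normally distributed, then:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
follows the standard normal distribution. It is actually a bit irrelevant here whether or not
the weights are normally distributed, because the sample size $n = 25$ is large enough for the Central
Limit Theorem to apply. In that case, we know that $Z$, as defined above, follows at least
approximately the standard normal distribution. At any rate, it seems reasonable to use the test
statistic:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
For the example in hand, the value of the test statistic is:
$$Z = \frac{80.94 - 85}{11.6/\sqrt{25}} = -1.75$$
The critical region approach tells us to reject the null hypothesis at the $\alpha = 0.05$ level if
$Z \le -z_{0.05} = -1.645$. Therefore, we reject the null hypothesis because $Z = -1.75 < -1.645$, and therefore
falls in the rejection region:
[Figure: standard normal density with the rejection region $Z \le -1.645$ shaded and the observed $Z = -1.75$ marked inside it.]
As always, we draw the same conclusion by using the $P$-value approach. Recall that the $P$-value
approach tells us to reject the null hypothesis at the $\alpha = 0.05$ level if the $P$-value $\le \alpha = 0.05$. In
this case, the $P$-value is $P(Z < -1.75) = 0.0401$: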
[Figure: standard normal density with the area 0.0401 to the left of $Z = -1.75$ shaded.]
By the way, we'll learn how to ask Minitab to conduct the $Z$-test for a mean in a bit, but this is
what the Minitab output for this example looks like:
Test of mu = 85 vs < 85
The assumed standard deviation = 11.6
95% Upper
N Mean SE Mean Bound Z P
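The same lower-tailed $Z$-test can be sketched in a few lines of Python. This is only an illustration, not the Minitab procedure: the function name `one_sample_z_lower` is made up for this sketch, and it assumes $n = 25$, the sample size implied by the reported $Z = -1.75$.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_lower(xbar, mu0, sigma, n):
    """Lower-tailed Z-test of H0: mu = mu0 against HA: mu < mu0 (sigma known).

    Returns the test statistic and its P-value, P(Z <= z).
    """
    standard_error = sigma / sqrt(n)
    z = (xbar - mu0) / standard_error
    p_value = NormalDist().cdf(z)  # lower-tail area under the standard normal
    return z, p_value


# Example 10-1 values; n = 25 is an assumption implied by Z = -1.75.
z, p_value = one_sample_z_lower(xbar=80.94, mu0=85, sigma=11.6, n=25)

# Critical region approach: reject when z <= -z_0.05 (about -1.645).
z_critical = NormalDist().inv_cdf(0.05)
reject_by_critical_region = z <= z_critical

# P-value approach: reject when the P-value is at most alpha.
reject_by_p_value = p_value <= 0.05

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("reject H0:", reject_by_critical_region, reject_by_p_value)
```

Both decision rules are computed so that their agreement can be checked directly.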
Now that, for purely pedagogical reasons, we have the unrealistic situation (of a known population
variance) behind us, let's turn our attention to the realistic situation in which both the population
mean and population variance are unknown.
Example 10-2
It is assumed that the mean systolic blood pressure is $\mu$ = 120 mm Hg. In the Honolulu Heart Study,
a sample of $n = 100$ people had an average systolic blood pressure of 130.1 mm Hg with a
standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic
blood pressure!) from the regular population?
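Answer
The null hypothesis is $H_0: \mu = 120$, and because there is no specific direction implied, the
alternative hypothesis is $H_A: \mu \neq 120$. In general, we know that if the data are normally distributed,
then:
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
follows a $t$-distribution with $n - 1$ degrees of freedom. Therefore, it seems reasonable to use the
test statistic:
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
for testing the null hypothesis $H_0: \mu = \mu_0$ against any of the possible alternative hypotheses
$H_A: \mu \neq \mu_0$, $H_A: \mu < \mu_0$, and $H_A: \mu > \mu_0$. For the example in hand, the value of the test
statistic is:
$$t = \frac{130.1 - 120}{21.21/\sqrt{100}} = 4.762$$
The critical region approach tells us to reject the null hypothesis at the $\alpha = 0.05$ level if
$t \ge t_{0.025, 99} = 1.9842$ or if $t \le -t_{0.025, 99} = -1.9842$. Therefore, we reject the null hypothesis
because $t = 4.762 > 1.9842$, and therefore falls in the rejection region:
[Figure: $t_{99}$ density with rejection regions beyond $\pm 1.9842$ shaded and the observed $t = 4.762$ marked.]
Again, as always, we draw the same conclusion by using the $P$-value approach. The $P$-value approach
tells us to reject the null hypothesis at the $\alpha = 0.05$ level if the $P$-value $\le \alpha = 0.05$. In this case, the
$P$-value is $2 \times P(T_{99} > 4.762) < 2 \times P(T_{99} > 1.9842) = 2(0.025) = 0.05$: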
[Figure: $t_{99}$ density with the two tail areas beyond $\pm 4.762$ shaded.]
Again, we'll learn how to ask Minitab to conduct the t-test for a mean in a bit, but this is what the
Minitab output for this example looks like:
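By the way, the decision to reject the null hypothesis is consistent with the one you would make
using a 95% confidence interval. Using the data, a 95% confidence interval for the mean $\mu$ is:
$$\bar{x} \pm t_{0.025, 99}\left(\frac{s}{\sqrt{n}}\right) = 130.1 \pm 1.9842\left(\frac{21.21}{\sqrt{100}}\right)$$
which simplifies to $130.1 \pm 4.21$. That is, we can be 95% confident that the mean systolic blood
pressure of the Honolulu population is between 125.89 and 134.31 mm Hg. How can a population
living in a climate with consistently sunny 80 degree days have elevated blood pressure?!
Anyway, the critical region approach for the $\alpha = 0.05$ hypothesis test tells us to reject the null
hypothesis that $\mu = 120$:
if $\frac{\bar{x} - \mu}{s/\sqrt{n}} \ge 1.9842$ or if $\frac{\bar{x} - \mu}{s/\sqrt{n}} \le -1.9842$
which is equivalent to rejecting:
if $\bar{x} - \mu \ge 1.9842\left(\frac{s}{\sqrt{n}}\right)$ or if $\bar{x} - \mu \le -1.9842\left(\frac{s}{\sqrt{n}}\right)$
which is equivalent to rejecting:
if $\mu \le \bar{x} - 1.9842\left(\frac{s}{\sqrt{n}}\right)$ or if $\mu \ge \bar{x} + 1.9842\left(\frac{s}{\sqrt{n}}\right)$
which, upon inserting the data for this particular example, is equivalent to rejecting:
if $\mu \le 125.89$ or if $\mu \ge 134.31$
which just happen to be (!) the endpoints of the 95% confidence interval for the mean. Because the
hypothesized value $\mu = 120$ is below 125.89, we reject the null hypothesis. Indeed, the
results are consistent!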
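The agreement between the two-sided $t$-test and the 95% confidence interval can be checked numerically. The sketch below is illustrative only: the function name `t_test_and_ci` is hypothetical, it assumes $n = 100$ (the sample size consistent with 99 degrees of freedom), and it takes the critical value 1.9842 from the text rather than computing it.

```python
from math import sqrt


def t_test_and_ci(xbar, s, n, mu0, t_critical):
    """Two-sided one-sample t-test from summary statistics.

    t_critical is the upper alpha/2 point of the t distribution with
    n - 1 degrees of freedom, supplied by the caller (e.g. from a table).
    """
    standard_error = s / sqrt(n)
    t = (xbar - mu0) / standard_error
    margin = t_critical * standard_error
    ci = (xbar - margin, xbar + margin)
    reject_by_t = abs(t) >= t_critical
    # Equivalent rule: reject when mu0 lies outside the confidence interval.
    reject_by_ci = not (ci[0] < mu0 < ci[1])
    return t, ci, reject_by_t, reject_by_ci


# Example 10-2 values; n = 100 is an assumption implied by the 99 degrees of freedom.
t, ci, reject_by_t, reject_by_ci = t_test_and_ci(
    xbar=130.1, s=21.21, n=100, mu0=120, t_critical=1.9842
)

print(f"t = {t:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0:", reject_by_t, reject_by_ci)
```

The two boolean results always agree, which is the equivalence the algebra above establishes.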
In the next lesson, we'll learn how to compare the means of two independent populations, but there
may be occasions in which we are interested in comparing the means of two dependent populations.
For example, suppose a researcher is interested in determining whether the mean IQ of the
population of first-born twins differs from the mean IQ of the population of second-born twins. She
identifies a random sample of $n$ pairs of twins, and measures $X$, the IQ of the first-born twin, and $Y$,
the IQ of the second-born twin. In that case, she's interested in determining whether:
$$\mu_X = \mu_Y$$
or equivalently if:
$$\mu_X - \mu_Y = 0$$
Now, the population of first-born twins is not independent of the population of second-born twins.
Since all of our distributional theory requires the independence of measurements, we're rather stuck.
There's a way out though... we can "remove" the dependence between $X$ and $Y$ by subtracting the
two measurements $X_i$ and $Y_i$ for each pair of twins $i$, that is, by considering the independent
measurements
$$D_i = X_i - Y_i$$
Then, our null hypothesis involves just a single mean, which we'll denote $\mu_D$, the mean of the
differences:
$$H_0: \mu_D = \mu_X - \mu_Y = 0$$
and then our hard work is done! We can just use the $t$-test for a mean for conducting the hypothesis
test... it's just that, in this situation, our measurements are differences $d_i$ whose mean is $\bar{d}$ and
standard deviation is $s_D$. That is, when testing the null hypothesis $H_0: \mu_D = \mu_0$ against any of the
alternative hypotheses $H_A: \mu_D \neq \mu_0$, $H_A: \mu_D < \mu_0$, and $H_A: \mu_D > \mu_0$, we compare the test
statistic:
$$t = \frac{\bar{d} - \mu_0}{s_D/\sqrt{n}}$$
to a $t$-distribution with $n - 1$ degrees of freedom.
Example 10-3
Blood samples from $n$ = 10 people were sent to each of two laboratories (Lab 1 and Lab 2) for
cholesterol determinations. The resulting data are summarized here:
[Table: paired cholesterol determinations from Lab 1 and Lab 2 for the ten people, with their differences.]
Is there a statistically significant difference at the $\alpha$ level, say, in the (population) mean
cholesterol levels reported by Lab 1 and Lab 2?
Answer
The null hypothesis is $H_0: \mu_D = 0$, and the alternative hypothesis is $H_A: \mu_D \neq 0$. The value of the
test statistic is:
$$t = \frac{-14.4 - 0}{6.77/\sqrt{10}} = -6.73$$
The critical region approach tells us to reject the null hypothesis at the $\alpha$ level if
$t \ge t_{\alpha/2, 9}$ or if $t \le -t_{\alpha/2, 9}$. Therefore, we reject the null hypothesis because
$t = -6.73$ lies below even $-t_{0.005, 9} = -3.250$, and therefore falls in the rejection region.
Again, we draw the same conclusion when using the $P$-value approach. In this case, the $P$-value is:
$$P = 2 \times P(T_9 < -6.73)$$
which Minitab reports as 0.000.
And, the Minitab output for this example looks like this:
Test of mu = 0 vs not = 0
N Mean StDev SE Mean 95% CI T P
10 -14.4000 6.7700 2.1409 (-19.2430, -9.5570) -6.73 0.000
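The paired $t$ statistic is just a one-sample $t$ statistic computed on the differences. Here is a minimal sketch under that reading; the function names are made up for illustration, and the three raw pairs are hypothetical values used only to show the mechanics (the summary call uses the figures from the Minitab output above).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second, mu0=0.0):
    """Paired t statistic computed from two equal-length lists of measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    t = (d_bar - mu0) / (s_d / sqrt(n))
    return d_bar, s_d, t


def paired_t_from_summary(d_bar, s_d, n, mu0=0.0):
    """Paired t statistic from the summarized differences."""
    return (d_bar - mu0) / (s_d / sqrt(n))


# Summary statistics reported in the Minitab output for Example 10-3.
t_summary = paired_t_from_summary(d_bar=-14.4, s_d=6.77, n=10)
print(f"t = {t_summary:.2f}")

# Hypothetical raw pairs, only to illustrate the differencing step.
lab1 = [296, 268, 244]
lab2 = [318, 287, 260]
d_bar, s_d, t_raw = paired_t(lab1, lab2)
print(d_bar, s_d, round(t_raw, 2))
```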
1. Under the Stat menu, select Basic Statistics, and then 1-Sample Z...:
2. In the pop-up window that appears, click on the radio button labeled Summarized data. In
the box labeled Sample size, type in the sample size n, and in the box labeled Mean, type in
the sample mean. In the box labeled Standard deviation, type in the value of the known (or
rather assumed!) population standard deviation. Click on the box labeled Perform hypothesis
test, and in the box labeled Hypothesized mean, type in the value of the mean assumed in
the null hypothesis:
3. Click on the button labeled Options... In the pop-up window that appears, for the box
labeled Alternative, select either less than, greater than, or not equal depending on the
direction of the alternative hypothesis:
4. Then, upon clicking OK on the main pop-up window, the output should appear in the
Session window:
Test of mu = 85 vs < 85
The assumed standard deviation = 11.6
95% Upper
N Mean SE Mean Bound Z P
1. Under the Stat menu, select Basic Statistics, and then 1-Sample t...:
2. In the pop-up window that appears, click on the radio button labeled Summarized data. In
the box labeled Sample size, type in the sample size n; in the box labeled Mean, type in the
sample mean; and in the box labeled Standard deviation, type in the sample standard
deviation. Click on the box labeled Perform hypothesis test, and in the box labeled
Hypothesized mean, type in the value of the mean assumed in the null hypothesis:
3. Click on the button labeled Options... In the pop-up window that appears, for the box
labeled Alternative, select either less than, greater than, or not equal depending on the
direction of the alternative hypothesis:
4. Then, upon clicking OK on the main pop-up window, the output should appear in the
Session window:
5. Note that a paired t-test can be performed in the same way. The summarized sample data
would simply be the summarized differences. The extra step of calculating the differences
would be required, however, if your data are the raw measurements from the two dependent
samples. That is, if you have two columns containing, say, Before and After measurements for
which you want to analyze Diff, their differences, you can use Minitab's calculator (under the
Calc menu, select Calculator) to calculate the differences:
Upon clicking OK, the differences (Diff) should appear in your worksheet:
When performing the t-test, you'll then need to tell Minitab (in the Samples in columns box)
that the differences are contained in the Diff column:
Here's what the paired t-test output would look like for this example:
Test of mu = 0 vs not = 0
Variable N Mean StDev SE Mean 95% CI T P
Diff 7 2.000 1.414 0.535 (0.692, 3.308) 3.74 0.010
Source: https://online.stat.psu.edu/stat415/lesson/10
7/22/23, 1:36 PM Lesson 11: Tests of the Equality of Two Means
Overview
In this lesson, we'll continue our investigation of hypothesis testing. In this case, we'll focus our
attention on a hypothesis test for the difference in two population means $\mu_X - \mu_Y$ for two
situations:
• a hypothesis test based on the $t$-distribution, known as the pooled two-sample $t$-test, for $\mu_X - \mu_Y$ when the (unknown) population variances $\sigma_X^2$ and $\sigma_Y^2$ are equal
• a hypothesis test based on the $t$-distribution, known as Welch's $t$-test, for $\mu_X - \mu_Y$ when the (unknown) population variances $\sigma_X^2$ and $\sigma_Y^2$ are not equal
Of course, because population variances are generally not known, there is no way of being 100%
sure that the population variances are equal or not equal. In order to be able to determine,
therefore, which of the two hypothesis tests we should use, we'll need to make some assumptions
about the equality of the variances based on our previous knowledge of the populations we're
studying.
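Let's start with the good news, namely that we've already done the dirty theoretical work in
developing a hypothesis test for the difference in two population means $\mu_X - \mu_Y$ when we
developed a confidence interval for the difference in two population means. Recall
that if you have two independent samples from two normal distributions with equal variances
$\sigma_X^2 = \sigma_Y^2 = \sigma^2$, then:
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
follows a $t$-distribution with $n + m - 2$ degrees of freedom, where $S_p^2$, the pooled sample variance:
$$S_p^2 = \frac{(n - 1)S_X^2 + (m - 1)S_Y^2}{n + m - 2}$$
is an unbiased estimator of the common variance $\sigma^2$. Therefore, if we're interested in testing the null
hypothesis:
$H_0: \mu_X - \mu_Y = 0$ (or equivalently $H_0: \mu_X = \mu_Y$)
against any of the alternative hypotheses $H_A: \mu_X - \mu_Y \neq 0$, $H_A: \mu_X - \mu_Y < 0$, or
$H_A: \mu_X - \mu_Y > 0$, we can use the test statistic:
$$T = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$$
and follow the standard hypothesis testing procedures. Let's take a look at an example.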
Example 11-1
A psychologist was interested in exploring whether or not male and female college students have
different driving behaviors. There were several ways that she could quantify driving behaviors. She
opted to focus on the fastest speed ever driven by an individual. Therefore, the particular statistical
question she framed was as follows:
Is the mean fastest speed driven by male college students different than the mean fastest speed
driven by female college students?
She conducted a survey of a random $n = 34$ male college students and a random $m = 29$ female
college students. Here is a descriptive summary of the results of her survey:
[Figure: dotplots of fastest speed driven, by gender, on a horizontal axis running from 56 to 140.]
Is there sufficient evidence at the $\alpha = 0.05$ level to conclude that the mean fastest speed driven by
male college students differs from the mean fastest speed driven by female college students?
Answer
Because the observed standard deviations of the two samples are of similar magnitude, we'll assume
that the population variances are equal. Let's also assume that the two populations of fastest speed
driven for males and females are normally distributed. (We can confirm, or deny, such an assumption
using a normal probability plot, but let's simplify our analysis for now.) The randomness of the two
samples allows us to assume independence of the measurements as well.
The null hypothesis is $H_0: \mu_M - \mu_F = 0$, and the alternative hypothesis is $H_A: \mu_M - \mu_F \neq 0$.
The value of the test statistic is:
$$t = \frac{\bar{x}_M - \bar{x}_F}{s_p\sqrt{\frac{1}{34} + \frac{1}{29}}} = 3.42$$
because, among other things, the pooled sample standard deviation is:
$$s_p = \sqrt{\frac{33(20.1^2) + 28(12.2^2)}{61}} = 16.9$$
The critical value approach tells us to reject the null hypothesis in favor of the alternative
hypothesis if:
$$|t| \ge t_{0.025, 61} = 1.9996$$
We reject the null hypothesis because the test statistic ($t = 3.42$) falls in the rejection region:
[Figure: $t_{61}$ density with rejection regions beyond $\pm 1.9996$ shaded and the observed $t = 3.42$ marked.]
There is sufficient evidence at the $\alpha = 0.05$ level to conclude that the average fastest speed driven
by the population of male college students differs from the average fastest speed driven by the
population of female college students.
Not surprisingly, the decision is the same using the $P$-value approach. The $P$-value is 0.0012:
$$P = 2 \times P(T_{61} > 3.42) = 0.0012$$
By the way, we'll see how to tell Minitab to conduct a two-sample t-test in a bit here, but in the
meantime, this is what the output would look like:
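Let's again start with the good news that we've already done the dirty theoretical work here. Recall
that if you have two independent samples from two normal distributions with unequal variances
$\sigma_X^2 \neq \sigma_Y^2$, then:
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}}}$$
follows, at least approximately, a $t$-distribution with $r$ degrees of freedom, where the adjusted
degrees of freedom $r$ is:
$$r = \frac{\left(\frac{s_X^2}{n} + \frac{s_Y^2}{m}\right)^2}{\frac{(s_X^2/n)^2}{n - 1} + \frac{(s_Y^2/m)^2}{m - 1}}$$
If $r$ doesn't equal an integer, as it usually doesn't, then we take the integer portion of $r$. That is, we
use $\lfloor r \rfloor$ if necessary.
With that now being recalled, if we're interested in testing the null hypothesis:
$H_0: \mu_X - \mu_Y = 0$ (or equivalently $H_0: \mu_X = \mu_Y$)
against any of the alternative hypotheses $H_A: \mu_X - \mu_Y \neq 0$, $H_A: \mu_X - \mu_Y < 0$, or
$H_A: \mu_X - \mu_Y > 0$, we can use the test statistic:
$$T = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}}}$$
and follow the standard hypothesis testing procedures. Let's return to our fastest speed driven
example.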
A psychologist was interested in exploring whether or not male and female college students have
different driving behaviors. There were a number of ways that she could quantify driving behaviors.
She opted to focus on the fastest speed ever driven by an individual. Therefore, the particular
statistical question she framed was as follows:
Is the mean fastest speed driven by male college students different than the mean fastest speed
driven by female college students?
She conducted a survey of a random $n = 34$ male college students and a random $m = 29$ female
college students. Here is a descriptive summary of the results of her survey:
Is there sufficient evidence at the $\alpha = 0.05$ level to conclude that the mean fastest speed driven by
male college students differs from the mean fastest speed driven by female college students?
Answer
This time let's not assume that the population variances are equal. Then, we'll see if we arrive at a
different conclusion. Let's still assume though that the two populations of fastest speed driven for
males and females are normally distributed. And, we'll again permit the randomness of the two
samples to allow us to assume independence of the measurements as well.
The value of the test statistic is:
$$t = \frac{\bar{x}_M - \bar{x}_F}{\sqrt{\frac{20.1^2}{34} + \frac{12.2^2}{29}}} = 3.54$$
with adjusted degrees of freedom:
$$r = \frac{\left(\frac{20.1^2}{34} + \frac{12.2^2}{29}\right)^2}{\frac{(20.1^2/34)^2}{33} + \frac{(12.2^2/29)^2}{28}} = 55.5$$
Oops... that's not an integer, so we're going to need to take the greatest integer portion of that $r$.
That is, we take the degrees of freedom to be $\lfloor 55.5 \rfloor = 55$.
Then, the critical value approach tells us to reject the null hypothesis in favor of the alternative
hypothesis if:
$$|t| \ge t_{0.025, 55} = 2.004$$
We reject the null hypothesis because the test statistic ($t = 3.54$) falls in the rejection region:
[Figure: $t_{55}$ density with rejection regions beyond $\pm 2.004$ shaded and the observed $t = 3.54$ marked.]
There is (again!) sufficient evidence at the $\alpha = 0.05$ level to conclude that the average fastest speed
driven by the population of male college students differs from the average fastest speed driven by
the population of female college students.
And again, the decision is the same using the $P$-value approach. The $P$-value is 0.0008:
$$P = 2 \times P(T_{55} > 3.54) = 0.0008$$
At any rate, we see that in this case, our conclusion is the same regardless of whether or not we
assume equality of the population variances.
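The pooled standard deviation and Welch's adjusted degrees of freedom depend only on the two sample sizes and sample standard deviations, so they are easy to recompute. The sketch below is an illustration with made-up function names; it assumes the summary values 34 and 20.1 for one sample and 29 and 12.2 for the other, as they appear in the Minitab summary for these data in Lesson 12.

```python
from math import floor, sqrt


def pooled_sd(s_x, n, s_y, m):
    """Pooled sample standard deviation for two samples with equal variances."""
    pooled_variance = ((n - 1) * s_x**2 + (m - 1) * s_y**2) / (n + m - 2)
    return sqrt(pooled_variance)


def welch_df(s_x, n, s_y, m):
    """Welch's adjusted degrees of freedom r (before truncating to an integer)."""
    vx = s_x**2 / n
    vy = s_y**2 / m
    return (vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1))


# Summary values assumed from the Lesson 12 Minitab output for these data.
s_p = pooled_sd(20.1, 34, 12.2, 29)
r = welch_df(20.1, 34, 12.2, 29)
df = floor(r)  # take the integer portion of r

print(f"pooled sd = {s_p:.1f}")
print(f"r = {r:.1f}, degrees of freedom used = {df}")
```

Truncating $r$ rather than rounding it gives a slightly larger critical value, which errs on the conservative side.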
And, just in case you're interested... we'll see how to tell Minitab to conduct a Welch's $t$-test very
soon, but in the meantime, this is what the output would look like for this example:
Just as is the case for asking Minitab to calculate pooled t-intervals and Welch's t-intervals for
$\mu_X - \mu_Y$, the commands necessary for asking Minitab to perform a two-sample t-test or a Welch's
t-test depend on whether the data are entered in two columns, or the data are entered in one column
with a grouping variable in a second column.
Let's recall the spider and prey example, in which the feeding habits of two species of net-casting
spiders were studied. The two species, the deinopis and the menneus, coexist in eastern Australia. The
following data were obtained on the size, in millimeters, of the prey of random samples of the two
species:
Size of Random Prey Samples of the Deinopis Spider in Millimeters
sample   1     2     3     4     5     6     7     8     9     10
size     12.9  10.2  7.4   7.0   10.5  11.9  7.1   9.9   14.4  11.3
Size of Random Prey Samples of the Menneus Spider in Millimeters
sample   1     2     3     4     5     6     7     8     9     10
size     10.2  6.9   10.9  11.0  10.1  5.3   7.5   10.3  9.2   8.8
Let's use the data and Minitab to test whether the mean prey size of the populations of the two
types of spiders differs.
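Before turning to Minitab, the two test statistics can be computed directly from the prey sizes. This is a minimal sketch using only the Python standard library, not a reproduction of Minitab's output; the function name `two_sample_t` is hypothetical.

```python
from math import floor, sqrt
from statistics import mean, variance

deinopis = [12.9, 10.2, 7.4, 7.0, 10.5, 11.9, 7.1, 9.9, 14.4, 11.3]
menneus = [10.2, 6.9, 10.9, 11.0, 10.1, 5.3, 7.5, 10.3, 9.2, 8.8]


def two_sample_t(x, y, equal_var=True):
    """Return (t, df) for the pooled t-test or, if equal_var is False, Welch's t-test."""
    n, m = len(x), len(y)
    vx, vy = variance(x), variance(y)  # sample variances
    diff = mean(x) - mean(y)
    if equal_var:
        pooled_variance = ((n - 1) * vx + (m - 1) * vy) / (n + m - 2)
        standard_error = sqrt(pooled_variance * (1 / n + 1 / m))
        df = n + m - 2
    else:
        standard_error = sqrt(vx / n + vy / m)
        r = (vx / n + vy / m) ** 2 / ((vx / n) ** 2 / (n - 1) + (vy / m) ** 2 / (m - 1))
        df = floor(r)  # integer portion of the adjusted degrees of freedom
    return diff / standard_error, df


t_pooled, df_pooled = two_sample_t(deinopis, menneus, equal_var=True)
t_welch, df_welch = two_sample_t(deinopis, menneus, equal_var=False)

print(f"pooled: t = {t_pooled:.2f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.2f}, df = {df_welch}")
```

Because the two samples have the same size, the pooled and Welch statistics coincide here and only the degrees of freedom differ. With $|t|$ near 1.25, well below the table value $t_{0.025, 18} = 2.101$, a two-sided test at the 0.05 level would not reject the hypothesis of equal mean prey sizes.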
2. Under the Stat menu, select Basic Statistics, and then select 2-Sample t...:
3. In the pop-up window that appears, select Samples in different columns. Specify the name of
the First variable, and specify the name of the Second variable. For the two-sample (pooled) t-
test, click on the box labeled Assume equal variances. (For Welch's t-test, leave the box
labeled Assume equal variances unchecked.):
4. Click on the button labeled Options... In the pop-up window that appears, for the box
labeled Alternative, select either less than, greater than, or not equal depending on the
direction of the alternative hypothesis:
5. Then, upon clicking OK on the main pop-up window, the output should appear in the
Session window:
2. Under the Stat menu, select Basic Statistics, and then select 2-Sample t...:
3. In the pop-up window that appears, select Samples in one column. Specify the name of the
Samples variable (Prey, for us) and specify the name of the Subscripts (grouping) variable
(Group, for us). For the two-sample (pooled) t-test, click on the box labeled Assume equal
variances. (For Welch's t-test, leave the box labeled Assume equal variances unchecked.):
4. Click on the button labeled Options... In the pop-up window that appears, for the box
labeled Alternative, select either less than, greater than, or not equal depending on the
direction of the alternative hypothesis:
5. Then, upon clicking OK on the main pop-up window, the output should appear in the
Session window:
Source: https://online.stat.psu.edu/stat415/lesson/11
7/22/23, 1:38 PM Lesson 12: Tests for Variances
Continuing our development of hypothesis tests for various population parameters, in this lesson, we'll
focus on hypothesis tests for population variances. Specifically, we'll develop:
a hypothesis test for testing whether a single population variance equals a particular value
a hypothesis test for testing whether two population variances are equal
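Yeehah again! The theoretical work for developing a hypothesis test for a population variance $\sigma^2$ is
already behind us. Recall that if you have a random sample of size n from a normal population with
(unknown) mean $\mu$ and variance $\sigma^2$, then:
$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2}$$
follows a chi-square distribution with n−1 degrees of freedom. Therefore, if we're interested in testing
the null hypothesis:
$$H_0: \sigma^2 = \sigma_0^2$$
against any of the alternative hypotheses $H_A: \sigma^2 \neq \sigma_0^2$, $H_A: \sigma^2 < \sigma_0^2$, or
$H_A: \sigma^2 > \sigma_0^2$, we can use the test statistic:
$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$$
and follow the standard hypothesis testing procedures. Let's take a look at an example.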
Example 12-1
A manufacturer of hard safety hats for construction workers is concerned about the mean and the
variation of the forces its helmets transmit to wearers when subjected to an external force. The
manufacturer has designed the helmets so that the mean force transmitted by the helmets to the
workers is 800 pounds (or less) with a standard deviation of less than 40 pounds. Tests were run on a
random sample of n = 40 helmets, and the sample mean and sample standard deviation were found to
be 825 pounds and 48.5 pounds, respectively.
Do the data provide sufficient evidence, at the $\alpha = 0.05$ level, to conclude that the population standard
deviation exceeds 40 pounds?
Answer
We're interested in testing the null hypothesis $H_0: \sigma^2 = 40^2 = 1600$ against the alternative
hypothesis $H_A: \sigma^2 > 1600$. The value of the test statistic is:
$$\chi^2 = \frac{(40 - 1)48.5^2}{40^2} = 57.336$$
Is the test statistic too large for the null hypothesis to be true? Well, the critical value approach would
have us finding the threshold value such that the probability of rejecting the null hypothesis if it were
true, that is, of committing a Type I error, is small... 0.05, in this case. Using Minitab (or a chi-square
probability table), we see that the cutoff value is 54.572:
[Figure: chi-square(39) density with the upper-tail area 0.05 beyond 54.572 shaded.]
That is, we reject the null hypothesis in favor of the alternative hypothesis if the test statistic $\chi^2$ is
greater than 54.572. It is. That is, the test statistic falls in the rejection region:
[Figure: chi-square(39) density with the observed 57.336 marked in the rejection region beyond 54.572.]
Therefore, there is sufficient evidence, at the 0.05 level, to conclude that the
population standard deviation exceeds 40.
Of course, the P-value approach yields the same conclusion. In this case, the P-value is the probability
that we would observe a chi-square(39) random variable more extreme than 57.336:
[Figure: chi-square(39) density with the area 0.029 to the right of 57.336 shaded.]
As the drawing illustrates, the P-value is 0.029 (as determined using the chi-square probability
calculator in Minitab). Because $P = 0.029 \le 0.05$, we reject the null hypothesis in favor of the
alternative hypothesis.
Do the data provide sufficient evidence, at the $\alpha = 0.05$ level, to conclude that the population standard
deviation differs from 40 pounds?
Answer
In this case, we're interested in testing the null hypothesis $H_0: \sigma^2 = 40^2 = 1600$ against the
alternative hypothesis $H_A: \sigma^2 \neq 1600$. The value of the test statistic is again $\chi^2 = 57.336$.
Now, is the test statistic either too large or too small for the null hypothesis to be true? Well, the critical
value approach would have us dividing the significance level $\alpha = 0.05$ into 2, to get 0.025, and putting
one of the halves in the left tail, and the other half in the other tail. Doing so (and using Minitab to get
the cutoff values), we get that the lower cutoff value is 23.654 and the upper cutoff value is 58.120:
[Figure: chi-square(39) density with the area 0.025 shaded below 23.654 and above 58.120.]
That is, we reject the null hypothesis in favor of the two-sided alternative hypothesis if the test statistic
is either smaller than 23.654 or greater than 58.120. It is not. That is, the test statistic does not fall in
the rejection region:
[Figure: chi-square(39) density with the observed 57.336 marked between the cutoffs 23.654 and 58.120.]
Therefore, we fail to reject the null hypothesis. There is insufficient evidence, at the 0.05 level, to
conclude that the population standard deviation differs from 40.
Of course, the P-value approach again yields the same conclusion. In this case, we simply double the
P-value we obtained for the one-tailed test, yielding a P-value of 0.058:
$$P = 2 \times P(\chi^2_{39} > 57.336) = 2 \times 0.029 = 0.058$$
Because $P = 0.058 > 0.05$, we fail to reject the null hypothesis in favor of the two-sided alternative
hypothesis.
The above example illustrates an important fact, namely, that the conclusion for the one-sided test does
not always agree with the conclusion for the two-sided test. If you have reason to believe that the
parameter will differ from the null value in a particular direction, then you should conduct the one-
sided test.
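The contrast between the one-sided and two-sided conclusions can be reproduced with a few lines of Python. This is a sketch only: the function name is hypothetical, and the chi-square cutoffs are taken from the text above rather than computed.

```python
def chi_square_statistic(s, n, sigma0):
    """Test statistic (n - 1) * s^2 / sigma0^2 for H0: sigma^2 = sigma0^2."""
    return (n - 1) * s**2 / sigma0**2


# Example 12-1: n = 40 helmets, sample sd 48.5, hypothesized sd 40.
chi2 = chi_square_statistic(s=48.5, n=40, sigma0=40)

# Cutoffs for a chi-square(39) distribution, as quoted in the text.
upper_005 = 54.572            # one-sided test, alpha = 0.05
lower_0025, upper_0025 = 23.654, 58.120  # two-sided test, alpha/2 in each tail

reject_one_sided = chi2 >= upper_005
reject_two_sided = chi2 <= lower_0025 or chi2 >= upper_0025

print(f"chi-square = {chi2:.3f}")
print("one-sided test rejects H0:", reject_one_sided)
print("two-sided test rejects H0:", reject_two_sided)
```

The statistic lands between the one-sided cutoff and the two-sided upper cutoff, which is exactly why the two tests disagree.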
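Let's now recall the theory necessary for developing a hypothesis test for testing the equality of two
population variances. Suppose $X_1, X_2, \ldots, X_n$ is a random sample of size n from a normal population
with mean $\mu_X$ and variance $\sigma_X^2$. And, suppose, independent of the first sample, $Y_1, Y_2, \ldots, Y_m$ is
another random sample of size m from a normal population with mean $\mu_Y$ and variance $\sigma_Y^2$. Recall then, in
this situation, that:
$$\frac{(n - 1)S_X^2}{\sigma_X^2} \quad \text{and} \quad \frac{(m - 1)S_Y^2}{\sigma_Y^2}$$
have independent chi-square distributions with n−1 and m−1 degrees of freedom, respectively.
Therefore:
$$F = \frac{\left[\frac{(n - 1)S_X^2}{\sigma_X^2} \Big/ (n - 1)\right]}{\left[\frac{(m - 1)S_Y^2}{\sigma_Y^2} \Big/ (m - 1)\right]} = \frac{S_X^2}{S_Y^2} \cdot \frac{\sigma_Y^2}{\sigma_X^2}$$
follows an F distribution with n−1 numerator degrees of freedom and m−1 denominator degrees of
freedom. Therefore, if we're interested in testing the null hypothesis:
$H_0: \sigma_X^2 = \sigma_Y^2$ (or equivalently $H_0: \sigma_Y^2 / \sigma_X^2 = 1$)
against any of the alternative hypotheses $H_A: \sigma_X^2 \neq \sigma_Y^2$, $H_A: \sigma_X^2 > \sigma_Y^2$, or
$H_A: \sigma_X^2 < \sigma_Y^2$, we can use the test statistic:
$$F = \frac{S_X^2}{S_Y^2}$$
and follow the standard hypothesis testing procedures. When doing so, we might also want to recall
this important fact about the F-distribution:
$$F_{1 - \alpha/2}(n - 1, m - 1) = \frac{1}{F_{\alpha/2}(m - 1, n - 1)}$$
so that when we use the critical value approach for a two-sided alternative, we reject the null
hypothesis if the test statistic is too large:
$$F \ge F_{\alpha/2}(n - 1, m - 1)$$
or too small:
$$F \le F_{1 - \alpha/2}(n - 1, m - 1) = \frac{1}{F_{\alpha/2}(m - 1, n - 1)}$$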
Okay, let's take a look at an example. In the last lesson, we performed a two-sample t-test (as well as
Welch's test) to test whether the mean fastest speed driven by the population of male college students
differs from the mean fastest speed driven by the population of female college students. When we
performed the two-sample t-test, we just assumed the population variances were equal. Let's revisit
that example again to see if our assumption of equal variances is valid.
Example 12-2
A psychologist was interested in exploring whether or not male and female college students have
different driving behaviors. The particular statistical question she framed was as follows:
Is the mean fastest speed driven by male college students different than the mean fastest speed driven
by female college students?
The psychologist conducted a survey of a random $n = 34$ male college students and a random $m = 29$
female college students. Here is a descriptive summary of the results of her survey:
Is there sufficient evidence at the $\alpha = 0.05$ level to conclude that the variance of the fastest speed
driven by male college students differs from the variance of the fastest speed driven by female college
students?
Answer
The null hypothesis is $H_0: \sigma_X^2 = \sigma_Y^2$, and the alternative hypothesis is $H_A: \sigma_X^2 \neq \sigma_Y^2$.
The value of the test statistic is:
$$F = \frac{12.2^2}{20.1^2} = 0.368$$
(Note that I intentionally put the variance of what we're calling the Y sample in the numerator and the
variance of what we're calling the X sample in the denominator. I did this only so that my results match
the Minitab output we'll obtain on the next page. In doing so, we just need to make sure that we keep
track of the correct numerator and denominator degrees of freedom, here 28 and 33.) Using the critical value
approach, we divide the significance level $\alpha = 0.05$ into 2, to get 0.025, and put one of the halves in
the left tail, and the other half in the other tail. Doing so, we get that the lower cutoff value is 0.478 and
the upper cutoff value is 2.0441:
[Figure: $F(28, 33)$ density with the area 0.025 shaded below 0.478 and above 2.0441.]
Because the test statistic falls in the rejection region, that is, because $F = 0.368 \le 0.478$, we reject the
null hypothesis in favor of the alternative hypothesis. There is sufficient evidence at the $\alpha = 0.05$ level
to conclude that the population variances are not equal. Therefore, the assumption of equal variances
that we made when performing the two-sample t-test on these data in the previous lesson does not
appear to be valid. It would behoove us to use Welch's t-test instead.
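The F statistic and the two-sided decision can be checked with a short Python sketch. The function name is made up for illustration, and the cutoffs 0.478 and 2.0441 are taken from the text rather than computed. The second half illustrates the reciprocal fact about the F-distribution: swapping numerator and denominator inverts both the statistic and the cutoffs, so the decision is unchanged.

```python
def f_statistic(s_numerator, s_denominator):
    """Ratio of two sample variances, given the sample standard deviations."""
    return s_numerator**2 / s_denominator**2


# Y sample (sd 12.2, 28 df) in the numerator, X sample (sd 20.1, 33 df) in the denominator.
f = f_statistic(12.2, 20.1)
lower, upper = 0.478, 2.0441  # cutoffs for F(28, 33), as quoted in the text
reject = f <= lower or f >= upper

# Swapping the roles of the samples: the statistic and the cutoffs are all inverted.
f_swapped = f_statistic(20.1, 12.2)
lower_swapped, upper_swapped = 1 / upper, 1 / lower
reject_swapped = f_swapped <= lower_swapped or f_swapped >= upper_swapped

print(f"F = {f:.3f}, reject H0: {reject}")
print(f"swapped F = {f_swapped:.3f}, reject H0: {reject_swapped}")
```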
In each case, we'll illustrate how to perform the hypothesis tests of this lesson using summarized data.
2. In the pop-up window that appears, in the box labeled Data, select Sample standard deviation
(or alternatively Sample variance). In the box labeled Sample size, type in the size n of the
sample. In the box labeled Sample standard deviation, type in the sample standard deviation.
Click on the box labeled Perform hypothesis test, and in the box labeled Value, type in the
Hypothesized standard deviation (or alternatively the Hypothesized variance):
3. Click on the button labeled Options... In the pop-up window that appears, for the box labeled
Alternative, select either less than, greater than, or not equal depending on the direction of the
alternative hypothesis:
4. Then, upon clicking OK on the main pop-up window, the output should appear in the Session
window:
2. In the pop-up window that appears, in the box labeled Data, select Sample standard deviations
(or alternatively Sample variances). In the box labeled Sample size, type in the size n of the First
sample and m of the Second sample. In the box labeled Standard deviation, type in the sample
standard deviations for the First and Second samples:
3. Click on the button labeled Options... In the pop-up window that appears, in the box labeled
Value, type in the Hypothesized ratio of the standard deviations (or the Hypothesized ratio of
the variances). For the box labeled Alternative, select either less than, greater than, or not
equal depending on the direction of the alternative hypothesis:
4. Then, upon clicking OK on the main pop-up window, the output should appear in the
Session window:
Statistics
Sample   N    StDev    Variance
1        29   12.200   148.840
2        34   20.100   404.010
Tests
Method            DF1   DF2   Test Statistic   P-Value
F Test (normal)   28    33    0.37             0.009
Source: https://online.stat.psu.edu/stat415/lesson/12
7/22/23, 1:38 PM Lesson 9: Tests About Proportions
We'll start our exploration of hypothesis tests by focusing on population proportions. Specifically,
we'll derive the methods used for testing:
Thereby allowing us to test whether two populations' proportions are equal. Along the way, we'll
learn two different approaches to hypothesis testing, one being the critical value approach and one
being the P-value approach.
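Every time we perform a hypothesis test, this is the basic procedure that we will follow:
1. We'll make an initial assumption about a population parameter.
2. We'll collect evidence (data) for or against the initial assumption.
3. Based on the available evidence, we'll decide whether or not to reject the initial assumption.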
Let's try to make this outlined procedure more concrete by taking a look at the following example.
Example 9-1
A four-sided (tetrahedral) die is tossed 1000 times, and 290 fours are observed. Is there evidence to
conclude that the die is biased, that is, say, that more fours than expected are observed?
https://online.stat.psu.edu/stat415/book/export/html/826 1/25
Answer
As the basic hypothesis testing procedure outlines above, the first step involves stating an initial
assumption. It is:
Assume the die is unbiased. If the die is unbiased, then each side (1, 2, 3, and 4) is equally likely. So,
we'll assume that p, the probability of getting a 4, is 0.25.
In general, the initial assumption is called the null hypothesis, and is denoted $H_0$. (That's a zero in
the subscript for "null"). In statistical notation, we write the initial assumption as:
$$H_0: p = 0.25$$
That is, the initial assumption involves making a statement about a population proportion.
Now, the second step tells us that we need to collect evidence (data) for or against our initial
assumption. In this case, that's already been done for us. We were told that the die was tossed
$n = 1000$ times, and $y = 290$ fours were observed. Using statistical notation again, we write the
collected evidence as a sample proportion:
$$\hat{p} = \frac{y}{n} = \frac{290}{1000} = 0.29$$
Now we just need to complete the third step of making the decision about whether or not to reject
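our initial assumption that the population proportion is 0.25. Recall that the Central Limit Theorem
tells us that, if the null hypothesis $p = p_0$ is true, the standardized sample proportion:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
follows (at least approximately) a standard normal distribution. So, we can "translate" our observed sample
proportion of 0.290 onto the $Z$ scale:
$$Z = \frac{0.29 - 0.25}{\sqrt{\frac{0.25(0.75)}{1000}}} = 2.92$$
Here's a picture that summarizes the situation: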
[Figure: standard normal curve marking the hypothesized proportion 0.25 ($Z = 0$) and the observed sample proportion 0.290 ($Z = 2.92$).]
So, we are assuming that the population proportion is 0.25 (in blue), but we've observed a sample
proportion 0.290 (in red) that falls way out in the right tail of the normal distribution. It certainly
doesn't appear impossible to obtain a sample proportion of 0.29. But, that's what we're left with
deciding. That is, we have to decide if a sample proportion of 0.290 is more extreme than we'd
expect if the population proportion does indeed equal 0.25.
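As a quick check of the translation onto the Z scale, here is a minimal Python sketch. The function name `one_proportion_z` is made up for this illustration; the numbers are the ones from the die example.

```python
from math import sqrt

def one_proportion_z(y, n, p0):
    """Z statistic for H0: p = p0, using the standard error under H0."""
    p_hat = y / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# 290 fours in 1000 tosses, hypothesized probability 0.25.
z = one_proportion_z(290, 1000, 0.25)
print(round(z, 2))
```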
There are two approaches to making the decision:
1. one is called the "critical value" (or "critical region" or "rejection region") approach
2. and the other is called the "P-value" approach
Until we get to the page in this lesson titled The P-value Approach, we'll use the critical value
approach.
Example (continued)
A four-sided (tetrahedral) die is tossed 1000 times, and 290 fours are observed. Is there evidence to
conclude that the die is biased, that is, say, that more fours than expected are observed?
Answer
Okay, so now let's think about it. We probably wouldn't reject our initial assumption that the
population proportion $p = 0.25$ if our observed sample proportion were 0.255. And, we might still
not be inclined to reject our initial assumption that the population proportion $p = 0.25$ if our
observed sample proportion were 0.27. On the other hand, we would almost certainly want to reject
our initial assumption that the population proportion $p = 0.25$ if our observed sample proportion
were 0.35. That suggests, then, that there is some "threshold" value such that, once we "cross" it,
we are inclined to reject our initial assumption. That is the critical value approach in
a nutshell. That is, the critical value approach tells us to define a threshold value, called a "critical
value," so that if our "test statistic" is more extreme than the critical value, then we reject the null
hypothesis.
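Let's suppose that we decide to reject the null hypothesis $H_0: p = 0.25$ in favor of the "alternative
hypothesis" $H_A: p > 0.25$ if:
$$\hat{p} \ge 0.273$$
or equivalently if:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \ge 1.645$$
[Figure: standard normal curve with the critical region of size 0.05 shaded in the right tail, beyond $\hat{p} = 0.273$ ($Z = 1.645$).]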
Note, by the way, that the "size" of the critical region is 0.05. This will become apparent in a bit when
we talk below about the possible errors that we can make whenever we conduct a hypothesis test.
At any rate, let's get back to deciding whether our particular sample proportion appears to be too
extreme. Well, it looks like we should reject the null hypothesis (our initial assumption $p = 0.25$)
because:
$$\hat{p} = 0.29 > 0.273$$
or equivalently, because $Z = 2.92 > 1.645$.
Our conclusion: we say there is sufficient evidence to conclude $H_A: p > 0.25$, that is, that the die is
biased.
By the way, this example involves what is called a one-tailed test, or more specifically, a right-tailed
test, because the critical region falls in only one of the two tails of the normal distribution, namely
the right tail.
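The right-tailed critical value approach for the die example can be sketched as follows. This is an illustration using only Python's standard library; the variable names are chosen here and are not part of the lesson.

```python
from math import sqrt
from statistics import NormalDist

p0, n, alpha = 0.25, 1000, 0.05
se0 = sqrt(p0 * (1 - p0) / n)            # standard error under H0

# Critical value on the Z scale (upper tail of size alpha).
z_crit = NormalDist().inv_cdf(1 - alpha)

# The same threshold expressed on the sample-proportion scale.
p_hat_crit = p0 + z_crit * se0

# Observed sample proportion from the example.
z_obs = (0.29 - p0) / se0
reject = z_obs >= z_crit
print(round(z_crit, 3), round(p_hat_crit, 3), reject)
```

Expressing the threshold on both scales shows why rejecting when the sample proportion is at least 0.273 is the same rule as rejecting when Z is at least 1.645.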
Before we continue on the next page at looking at two more examples, let's revisit the basic
hypothesis testing procedure that we outlined above. This time, though, let's state the procedure in
terms of performing a hypothesis test for a population proportion using the critical value
approach. The basic procedure is:
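1. State the null hypothesis $H_0$ and the alternative hypothesis $H_A$. (By the way, some
textbooks, including ours, use the notation $H_1$ instead of $H_A$ to denote the alternative
hypothesis.)
2. Calculate the test statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
3. Determine the critical region.
4. Make a decision. Determine if the test statistic falls in the critical region. If it does, reject the
null hypothesis. If it does not, do not reject the null hypothesis.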
Now, back to those possible errors we can make when conducting such a hypothesis test.
Possible Errors
So, argh! Every time we conduct a hypothesis test, we have a chance of making an error. (Oh dear,
why couldn't I have chosen a different profession?!)
1. If we reject the null hypothesis $H_0$ (in favor of the alternative hypothesis $H_A$) when the null
hypothesis is in fact true, we say we've committed a Type I error. For our example above, we
set P(Type I error) equal to 0.05:
[Figure: standard normal curve with the critical region of size 0.05 shaded in the right tail, beyond $\hat{p} = 0.273$ ($Z = 1.645$).]
Aha! That's why the 0.05! We wanted to minimize our chance of making a Type I error! In
general, we denote $\alpha = P(\text{Type I error})$, the "significance level of the test." Obviously,
we want to minimize $\alpha$. Therefore, typical values are 0.01, 0.05, and 0.10.
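2. If we fail to reject the null hypothesis when the null hypothesis is false, we say we've
committed a Type II error. For our example, suppose (unknown to us) that the population
proportion is actually 0.27. Then, the probability of a Type II error, in this case, is:
$$\beta = P(\hat{p} < 0.273 \text{ if } p = 0.27) = P\left(Z < \frac{0.273 - 0.27}{\sqrt{\frac{0.27(0.73)}{1000}}}\right) = P(Z < 0.214) \approx 0.585$$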
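A short Python sketch of that Type II error calculation, assuming the rejection rule from the example (reject when the sample proportion reaches 0.273) and the normal approximation to the sample proportion:

```python
from math import sqrt
from statistics import NormalDist

n = 1000
p_hat_crit = 0.273   # rejection threshold on the sample-proportion scale
p_true = 0.27        # the (unknown to us) true proportion in this scenario

# Standard error of the sample proportion when p really is 0.27.
se_true = sqrt(p_true * (1 - p_true) / n)

# Type II error: the sample proportion stays below the threshold.
beta = NormalDist().cdf((p_hat_crit - p_true) / se_true)
print(round(beta, 3))
```

Note that the standard error here uses the true proportion 0.27, not the hypothesized 0.25, because the probability is computed under the alternative.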
Let's take a look at two more examples of a hypothesis test for a single proportion while recalling
the hypothesis testing procedure we outlined on the previous page:
1. State the null hypothesis $H_0$ and the alternative hypothesis $H_A$.
2. Calculate the test statistic.
3. Determine the critical region.
4. Make a decision. Determine if the test statistic falls in the critical region. If it does, reject the
null hypothesis. If it does not, do not reject the null hypothesis.
The first example involves a hypothesis test for the proportion in which the alternative hypothesis is
a "greater than hypothesis," that is, the alternative hypothesis is of the form $H_A: p > p_0$. And, the
second example involves a hypothesis test for the proportion in which the alternative hypothesis is a
"less than hypothesis," that is, the alternative hypothesis is of the form $H_A: p < p_0$.
Example 9-2
Let p equal the proportion of drivers who use a seat belt in a state that does not have a mandatory
seat belt law. It was claimed that . An advertising campaign was conducted to increase this
proportion. Two months after the campaign, out of a random sample of drivers
were wearing seat belts. Was the campaign successful?
Answer
Because we're interested in seeing if the advertising campaign was successful, that is, that a greater
proportion of people wear seat belts, the alternative hypothesis is:
[Figure: standard normal curve with the critical region of size $\alpha = 0.01$ shaded in the right tail, beyond $Z = 2.326$.]
That is, we reject the null hypothesis if the test statistic $Z \ge 2.326$. Because the test statistic falls in
the critical region, we can reject the null hypothesis in favor of
the alternative hypothesis. There is sufficient evidence at the $\alpha = 0.01$ level to conclude the
campaign was successful.
Again, note that this is an example of a right-tailed hypothesis test because the action falls in the
right tail of the normal distribution.
Example 9-3
A Gallup poll released on October 13, 2000, found that 47% of the 1052 U.S. adults surveyed
classified themselves as "very happy" when given the choices of:
"very happy"
"fairly happy"
"not too happy"
Suppose that a journalist who is a pessimist took advantage of this poll to write a headline titled
"Poll finds that U.S. adults who are very happy are in the minority." Is the pessimistic journalist's
headline warranted?
Answer
Because we're interested in the majority/minority boundary line, the null hypothesis is:
$$H_0: p = 0.50$$
Because the journalist claims that the proportion of very happy U.S. adults is a minority, that is, less
than 0.50, the alternative hypothesis is:
$$H_A: p < 0.50$$
With $\hat{p} = 0.47$ and $n = 1052$, the test statistic is:
$$Z = \frac{0.47 - 0.50}{\sqrt{\frac{0.50(0.50)}{1052}}} = -1.946$$
Now, this time, we need to put our critical region in the left tail of the normal distribution. If we use a
significance level of $\alpha = 0.05$, then the critical region is:
[Figure: standard normal curve with the critical region of size $\alpha = 0.05$ shaded in the left tail, beyond $Z = -1.645$.]
That is, we reject the null hypothesis if the test statistic $Z \le -1.645$. Because the test statistic falls in
the critical region, that is, because $Z = -1.946 < -1.645$, we can reject the null hypothesis in favor
of the alternative hypothesis. There is sufficient evidence at the $\alpha = 0.05$ level to conclude that
$p < 0.50$, that is, U.S. adults who are very happy are in the minority. The journalist's pessimism
appears to be indeed warranted.
Note that this is an example of a left-tailed hypothesis test because the action falls in the left tail of
the normal distribution.
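The left-tailed test for the Gallup example can be sketched in Python as follows. Variable names are chosen for this illustration; the data (47% of 1052 adults) and the 0.05 significance level come from the example.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, p0, alpha = 0.47, 1052, 0.50, 0.05

# Test statistic under H0: p = 0.50.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Left-tailed critical value: the alpha quantile of the standard normal.
z_crit = NormalDist().inv_cdf(alpha)

reject = z <= z_crit
print(round(z, 3), round(z_crit, 3), reject)
```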
Example 9-4
Up until now, we have used the critical region approach in conducting our hypothesis tests. Now,
let's take a look at an example in which we use what is called the P-value approach.
Among patients with lung cancer, usually, 90% or more die within three years. As a result of new
forms of treatment, it is felt that this rate has been reduced. In a recent study of n = 150 lung cancer
patients, y = 128 died within three years. Is there sufficient evidence at the $\alpha = 0.05$ level, say, to
conclude that the death rate due to lung cancer has been reduced?
Answer
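The sample proportion is:
$$\hat{p} = \frac{128}{150} = 0.853$$
The null and alternative hypotheses are:
$$H_0: p = 0.90$$
and
$$H_A: p < 0.90$$
The test statistic is, therefore:
$$Z = \frac{0.853 - 0.90}{\sqrt{\frac{0.90(0.10)}{150}}} = -1.92$$
[Figure: standard normal curve centered at the hypothesized proportion 0.90 ($Z = 0$), with the rejection region of size $\alpha = 0.05$ shaded in the left tail, beyond $Z = -1.645$.]
Since the test statistic Z = −1.92 < −1.645, we reject the null hypothesis. There is sufficient evidence
at the $\alpha = 0.05$ level to conclude that the rate has been reduced.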
What if we set the significance level $\alpha$ = P(Type I Error) to 0.01? Is there still sufficient evidence to
conclude that the death rate due to lung cancer has been reduced?
Answer
In this case, with $\alpha = 0.01$, the rejection region is Z ≤ −2.33. That is, we reject if the test statistic falls
in the rejection region defined by Z ≤ −2.33:
[Figure: standard normal curve centered at the hypothesized proportion 0.90 ($Z = 0$), with the rejection region of size $\alpha = 0.01$ shaded in the left tail, beyond $Z = -2.33$.]
Because the test statistic Z = −1.92 > −2.33, we do not reject the null hypothesis. There is insufficient
evidence at the $\alpha = 0.01$ level to conclude that the rate has been reduced.
In the first part of this example, we rejected the null hypothesis when $\alpha = 0.05$. And, in the second
part of this example, we failed to reject the null hypothesis when $\alpha = 0.01$. There must be some
level of $\alpha$, then, at which we cross the threshold from rejecting to not rejecting the null hypothesis.
What is the smallest $\alpha$ that would still cause us to reject the null hypothesis?
Answer
We would, of course, reject any time the critical value was no more extreme than our test statistic −1.92:
[Figure: standard normal curve marking the critical values −2.33 and −1.645 and the test statistic −1.92.]
That is, we would reject if the critical value were −1.645, −1.83, or −1.92. But, we wouldn't reject if
the critical value were −1.93. The $\alpha$ associated with the test statistic −1.92 is called the P-value.
It is the smallest $\alpha$ that would lead to rejection. In this case, the P-value is:
$$P(Z < -1.92) = 0.0274$$
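Here is a minimal Python sketch of the one-tailed P-value. It evaluates the standard normal c.d.f. at the rounded statistic −1.92 used in the text, and also at the unrounded statistic computed from y = 128 and n = 150, which is presumably why the Minitab output later in the lesson shows slightly different values.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# P-value using the rounded test statistic from the text.
p_value_text = std_normal.cdf(-1.92)

# The same calculation without rounding the sample proportion to 0.853.
y, n, p0 = 128, 150, 0.90
z_exact = (y / n - p0) / sqrt(p0 * (1 - p0) / n)
p_value_exact = std_normal.cdf(z_exact)

print(round(p_value_text, 4), round(z_exact, 2), round(p_value_exact, 3))
```

Either way, the P-value falls between 0.01 and 0.05, which matches rejecting at the 0.05 level but not at the 0.01 level.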
So far, all of the examples we've considered have involved a one-tailed hypothesis test in which the
alternative hypothesis involved either a less than (<) or a greater than (>) sign. What happens if we
weren't sure of the direction in which the proportion could deviate from the hypothesized null
value? That is, what if the alternative hypothesis involved a not-equal sign (≠)? Let's take a look at an
example.
What if we wanted to perform a "two-tailed" test? That is, what if we wanted to test:
$$H_0: p = 0.90$$
versus
$$H_A: p \ne 0.90$$
at the $\alpha = 0.05$ level?
Answer
Let's first consider the critical value approach. If we allow for the possibility that the sample
proportion could either prove to be too large or too small, then we need to specify a threshold
value, that is, a critical value, in each tail of the distribution. In this case, we divide the "significance
level" $\alpha$ by 2 to get $\alpha/2 = 0.025$:
[Figure: standard normal curve with rejection regions of size 0.025 in each tail, beyond $Z = -1.96$ and $Z = 1.96$.]
That is, our rejection rule is that we should reject the null hypothesis if $Z \ge 1.96$ or we should
reject the null hypothesis if $Z \le -1.96$. Alternatively, we can write that we should reject the null
hypothesis if $|Z| \ge 1.96$. Because our test statistic is −1.92, we just barely fail to reject the null
hypothesis, because 1.92 < 1.96. In this case, we would say that there is insufficient evidence at the
$\alpha = 0.05$ level to conclude that the sample proportion differs significantly from 0.90.
Now for the P-value approach. Again, needing to allow for the possibility that the sample
proportion is either too large or too small, we multiply the P-value we obtain for the one-tailed test
by 2:
$$P = 2 \times P(Z < -1.92) = 2(0.0274) = 0.055$$
[Figure: standard normal curve with an area of 0.0274 shaded in each tail, beyond $Z = -1.92$ and $Z = 1.92$.]
Because the P-value 0.055 is (just barely) greater than the significance level $\alpha = 0.05$, we barely fail
to reject the null hypothesis. Again, we would say that there is insufficient evidence at the $\alpha = 0.05$
level to conclude that the sample proportion differs significantly from 0.90.
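Both two-tailed decisions can be reproduced with a short Python sketch. This is an illustration only, using the rounded test statistic −1.92 from the example.

```python
from statistics import NormalDist

std_normal = NormalDist()
z, alpha = -1.92, 0.05

# Critical value approach: put alpha/2 in each tail.
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z) >= z_crit

# P-value approach: double the one-tailed area beyond |z|.
p_two_sided = 2 * std_normal.cdf(-abs(z))
reject_by_p_value = p_two_sided <= alpha

print(round(z_crit, 2), round(p_two_sided, 3))
print(reject_by_critical_value, reject_by_p_value)
```

The two booleans always agree, because comparing |z| with the critical value and comparing the P-value with the significance level are the same comparison on different scales.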
Let's close this example by formalizing the definition of a P-value, as well as summarizing the P-
value approach to conducting a hypothesis test.
P-Value
The P-value is the smallest significance level that leads us to reject the null hypothesis.
Alternatively (and the way I prefer to think of P-values), the P-value is the probability that we'd
observe a more extreme statistic than we did if the null hypothesis were true.
If the P-value is small, that is, if $P \le \alpha$, then we reject the null hypothesis $H_0$.
Note!
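By the way, to test $H_0: p = p_0$, some statisticians will use the test statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}$$
rather than the one we've been using, which has $p_0$ in the denominator.
One advantage of doing so is that the interpretation of the confidence interval (does it contain $p_0$?)
is always consistent with the hypothesis test decision, as illustrated here: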
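Answer
Two-tailed test. In this case, the critical region approach tells us to reject the null hypothesis
$H_0: p = p_0$ against the alternative hypothesis $H_A: p \ne p_0$:
if $Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} \ge z_{\alpha/2}$ or if $Z \le -z_{\alpha/2}$
if $\hat{p} - p_0 \ge z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ or if $\hat{p} - p_0 \le -z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
if $p_0 \le \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ or if $p_0 \ge \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
That's the same as saying that we should reject the null hypothesis if $p_0$ is not in the $(1-\alpha)100\%$
confidence interval!
Left-tailed test. In this case, the critical region approach tells us to reject the null hypothesis
$H_0: p = p_0$ against the alternative hypothesis $H_A: p < p_0$:
if $Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} \le -z_{\alpha}$
if $\hat{p} - p_0 \le -z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$
if $p_0 \ge \hat{p} + z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$
That's the same as saying that we should reject the null hypothesis if $p_0$ is not in the upper $(1-\alpha)100\%$
confidence interval:
$$\left(0,\ \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$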
So far, all of our examples involved testing whether a single population proportion p equals some
value $p_0$. Now, let's turn our attention for a bit towards testing whether one population proportion
$p_1$ equals a second population proportion $p_2$. Additionally, most of our examples thus far have
involved left-tailed tests in which the alternative hypothesis involved $H_A: p < p_0$ or right-tailed tests
in which the alternative hypothesis involved $H_A: p > p_0$. Here, let's consider an example that tests
the equality of two proportions against the alternative that they are not equal. Using statistical
notation, we'll test:
$$H_0: p_1 = p_2$$
versus
$$H_A: p_1 \ne p_2$$
Example 9-5
Time magazine reported the result of a telephone poll of 800 adult Americans. The question posed
of the Americans who were surveyed was: "Should the federal tax on cigarettes be raised to pay for
health care reform?" The results of the survey were:
                    Non-smokers   Smokers
Sample size         605           195
Replied "yes"       351           41
Sample proportion   0.58          0.21
Is there sufficient evidence at the $\alpha = 0.05$ level, say, to conclude that the two populations (smokers
and non-smokers) differ significantly with respect to their opinions?
Answer
If $p_1$ = the proportion of the non-smoker population who reply "yes" and $p_2$ = the proportion of the
smoker population who reply "yes," then we are interested in testing the null hypothesis:
$$H_0: p_1 = p_2$$
against the alternative hypothesis:
$$H_A: p_1 \ne p_2$$
Before we can actually conduct the hypothesis test, we'll have to derive the appropriate test statistic.
Theorem
The test statistic for testing the difference in two population proportions, that is, for testing the null
hypothesis $H_0: p_1 - p_2 = 0$ is:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where:
$$\hat{p} = \frac{Y_1 + Y_2}{n_1 + n_2}$$
Proof
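Recall that:
$$\hat{p}_1 - \hat{p}_2$$
is approximately normally distributed with mean:
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
and variance:
$$Var(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$$
But, if we assume that the null hypothesis is true, then the population proportions equal some
common value p, say, that is, $p_1 = p_2 = p$. In that case, then the variance becomes:
$$Var(\hat{p}_1 - \hat{p}_2) = p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$
So, under the assumption that the null hypothesis is true, we have that:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
follows (at least approximately) the standard normal N(0,1) distribution. Since we don't know the
(assumed) common population proportion p any more than we know the proportions $p_1$ and $p_2$ of
each population, we can estimate p using:
$$\hat{p} = \frac{Y_1 + Y_2}{n_1 + n_2}$$
the proportion of "successes" in the two samples combined. And, hence, our test statistic becomes:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
as was to be proved.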
Time magazine reported the result of a telephone poll of 800 adult Americans. The question posed
of the Americans who were surveyed was: "Should the federal tax on cigarettes be raised to pay for
health care reform?" The results of the survey were:
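Is there sufficient evidence at the $\alpha = 0.05$ level, say, to conclude that the two populations (smokers
and non-smokers) differ significantly with respect to their opinions?
Answer
The overall sample proportion is:
$$\hat{p} = \frac{351 + 41}{605 + 195} = \frac{392}{800} = 0.49$$
That implies then that the test statistic for testing:
$$H_0: p_1 = p_2$$
versus
$$H_A: p_1 \ne p_2$$
is:
$$Z = \frac{(0.58 - 0.21) - 0}{\sqrt{0.49(0.51)\left(\frac{1}{605} + \frac{1}{195}\right)}} = 8.99$$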
Errr.... that Z-value is off the charts, so to speak. Let's go through the formalities anyway making the
decision first using the rejection region approach, and then using the P-value approach. Putting half
of the rejection region in each tail, we have:
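[Figure: standard normal curve with rejection regions in each tail, beyond $Z = -1.96$ and $Z = 1.96$.]
Because the test statistic $Z = 8.99$ falls in the rejection region, that is, because $8.99 > 1.96$, we reject the null hypothesis.
Now for the P-value approach:
[Figure: standard normal curve marking the two-tailed P-value, the area beyond $Z = -8.99$ and $Z = 8.99$.]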
That is, the P-value is less than 0.0001. Because $P < 0.0001 \le \alpha = 0.05$, we reject the null
hypothesis. Again, there is sufficient evidence at the 0.05 level to conclude that the two populations
differ with respect to their opinions concerning imposing a federal tax to help pay for health care
reform.
Thankfully, as should always be the case, the two approaches (the critical value approach and the
P-value approach) lead to the same conclusion.
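The pooled two-proportion test can be sketched in Python as follows. The function name `two_proportion_z` is made up for this illustration, and the counts (351 of 605 and 41 of 195) are taken from the Minitab output shown later in this lesson.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(y1, n1, y2, n2):
    """Pooled Z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = y1 / n1, y2 / n2
    pooled = (y1 + y2) / (n1 + n2)   # proportion of successes, samples combined
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

z = two_proportion_z(351, 605, 41, 195)

# Two-tailed P-value; for a statistic this extreme it is essentially 0
# and may underflow to exactly 0.0 in floating point.
p_value = 2 * NormalDist().cdf(-abs(z))

print(round(z, 2), p_value < 0.0001)
```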
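Note!
By the way, to test $H_0: p_1 = p_2$, some statisticians will use the test statistic:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$
instead of the pooled version used above.
An advantage of doing so is again that the interpretation of the confidence interval (does it
contain 0?) is always consistent with the hypothesis test decision.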
1. Under the Stat menu, select Basic Statistics, and then 1 Proportion...:
2. In the pop-up window that appears, click on the radio button labeled Summarized data. In
the box labeled Number of events, type in the number of successes or events of interest, and
in the box labeled Number of trials, type in the sample size n. Click on the box labeled
Perform hypothesis test, and in the box labeled Hypothesized proportion, type in the value
of the proportion assumed in the null hypothesis:
3. Click on the button labeled Options... In the pop-up window that appears, for the box
labeled Alternative, select either less than, greater than, or not equal depending on the
direction of the alternative hypothesis. Click on the box labeled Use test and interval based
on normal distribution:
4. Then, upon clicking OK on the main pop-up window, the output should appear in the
Session window:
As you can see, Minitab reports not only the value of the test statistic (Z = −1.91) but also the
P-value (0.028) and the 95% confidence interval (one-sided in this case, because of the one-
sided hypothesis).
1. Under the Stat menu, select Basic Statistics, and then 2 Proportions...:
2. In the pop-up window that appears, click on the radio button labeled Summarized data. In
the boxes labeled Events, type in the number of successes or events of interest for both the
First and Second samples. And in the boxes labeled Trials, type in the size of the First
sample and the size of the Second sample:
3. Click on the button labeled Options... In the pop-up window that appears, in the box
labeled Test difference, type in the assumed value of the difference in the proportions that
appears in the null hypothesis. The default value is 0.0, the value most commonly assumed, as
it means that we are interested in testing for the equality of the population proportions. For
the box labeled Alternative, select either less than, greater than, or not equal depending on
the direction of the alternative hypothesis. Click on the box labeled Use pooled estimate of p
for test:
4. Then, upon clicking OK on the main pop-up window, the output should appear in the
Session window:
Sample   X     N     Sample P
1        351   605   0.580165
2        41    195   0.210256
Again, as you can see, Minitab reports not only the value of the test statistic (Z = 8.99) but
other useful things as well, including the P-value, which in this case is so small as to be deemed
to be 0.000 to three digits. For scientific reporting purposes, we would typically write that as P
< 0.0001.
Source: https://www.google.com/
7/22/23, 1:42 PM Lesson 15: Tests Concerning Regression and Correlation
Overview
In lessons 35 and 36, we learned how to calculate point and interval estimates of the intercept and
slope parameters, $\alpha$ and $\beta$, of a simple linear regression model:
$$Y_i = \alpha + \beta x_i + \epsilon_i$$
with the random errors $\epsilon_i$ following a normal distribution with mean 0 and variance $\sigma^2$. In this
lesson, we'll learn how to conduct a hypothesis test for testing the null hypothesis that the slope
parameter equals some value, $\beta_0$, say. Specifically, we'll learn how to test the null hypothesis
$H_0: \beta = \beta_0$ using a $t$-statistic.
Now, perhaps it is not a point that has been emphasized yet, but if you take a look at the form of the
simple linear regression model, you'll notice that the response $Y_i$'s are denoted using a capital letter,
while the predictor $x_i$'s are denoted using a lowercase letter. That's because, in the simple linear
regression setting, we view the predictors as fixed values, whereas we view the responses as random
variables whose possible values depend on the population from which they came. Suppose
instead that we had a situation in which we thought of the pair $(X_i, Y_i)$ as being a random sample,
$i = 1, 2, \ldots, n$, from a bivariate normal distribution with parameters $\mu_X$, $\mu_Y$, $\sigma^2_X$, $\sigma^2_Y$, and $\rho$. Then,
we might be interested in testing the null hypothesis $H_0: \rho = 0$, because we know that if the
correlation coefficient is 0, then $X$ and $Y$ are independent random variables. For this reason, we'll
learn, not one, but three (!) possible hypothesis tests for testing the null hypothesis that the
correlation coefficient is 0. Then, because we haven't yet derived an interval estimate for the
correlation coefficient, we'll also take the time to derive an approximate confidence interval for $\rho$.
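Once again we've already done the bulk of the theoretical work in developing a hypothesis test for
the slope parameter $\beta$ of a simple linear regression model when we developed a
confidence interval for $\beta$. We had shown then that:
$$T = \frac{\hat{\beta} - \beta}{\sqrt{\frac{MSE}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}}$$
follows a $t$ distribution with $n - 2$ degrees of freedom. Therefore, to test the null hypothesis
$H_0: \beta = \beta_0$ against any of the alternative hypotheses $H_A: \beta \ne \beta_0$, $H_A: \beta < \beta_0$, $H_A: \beta > \beta_0$,
we can use the test statistic:
$$t = \frac{\hat{\beta} - \beta_0}{\sqrt{\frac{MSE}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}}$$
and follow the standard hypothesis testing procedures. Let's take a look at an example.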
Example 15-1
In alligators' natural habitat, it is typically easier to observe the length of an alligator than it is the
weight. This data set contains the log weight ($y$) and log length ($x$) for 15 alligators captured in
central Florida. A scatter plot of the data suggests that there is a linear relationship between the
response $y$ and the predictor $x$. Therefore, a wildlife researcher is interested in fitting the linear
model:
$$Y_i = \alpha + \beta x_i + \epsilon_i$$
to the data. She is particularly interested in testing whether there is a relationship between the
length and weight of alligators. At the $\alpha = 0.05$ level, perform a test of the null hypothesis
$H_0: \beta = 0$ against the alternative hypothesis $H_A: \beta \ne 0$.
Answer
The easiest way to perform the hypothesis test is to let Minitab do the work for us! Under the Stat
menu, selecting Regression, and then Regression, and specifying the response logW (for log weight)
and the predictor logL (for log length), we get:
Analysis of Variance
Source       DF   SS       MS       F        P
Regression   1    10.064   10.064   665.81   0.000
Total        14   10.260
Easy as pie! Minitab tells us that the test statistic (in blue) has a P-value (0.000) that is
less than 0.001. Because the P-value is less than 0.05, we reject the null hypothesis at the 0.05 level.
There is sufficient evidence to conclude that the slope parameter does not equal 0. That is, there is
sufficient evidence, at the 0.05 level, to conclude that there is a linear relationship, among the
population of alligators, between the log length and log weight.
Of course, since we are learning this material for just the first time, perhaps we could go through the
calculation of the test statistic at least once. Letting Minitab do some of the dirtier calculations for
us, such as calculating:
as well as determining the value of $MSE$ and that the slope estimate $\hat{\beta} = 3.4311$, we get:
which is the test statistic that Minitab calculated... well, with just a bit of round-off error.
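One way to sanity-check the slope test statistic from the ANOVA table alone is to use the fact that, in simple linear regression, the square of the $t$-statistic for testing that the slope is 0 equals the ANOVA $F$-statistic. The sketch below relies on that identity (it is not how the lesson computes the statistic) and takes the sign of $t$ from the slope estimate.

```python
from math import sqrt

f_stat = 665.81       # F from the Minitab Analysis of Variance table
slope_hat = 3.4311    # slope estimate quoted in the text

# In simple linear regression, t^2 = F for the test of slope = 0;
# the sign of t matches the sign of the slope estimate.
t_stat = sqrt(f_stat) if slope_hat >= 0 else -sqrt(f_stat)

# Implied standard error of the slope estimate.
se_slope = slope_hat / t_stat

print(round(t_stat, 2), round(se_slope, 3))
```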
The hypothesis test for the slope $\beta$ that we developed on the previous page was developed under
the assumption that a response $Y$ is a linear function of a nonrandom predictor $x$. This situation
occurs when the researcher has complete control of the values of the variable $x$. For example, a
researcher might be interested in modeling the linear relationship between the temperature $x$ of an
oven and the moistness $Y$ of chocolate chip muffins. In this case, the researcher sets the oven
temperatures (in degrees Fahrenheit) to 350, 360, 370, and so on, and then observes the values of
the random variable $Y$, that is, the moistness of the baked muffins. In this case, the linear model:
$$Y_i = \alpha + \beta x_i + \epsilon_i$$
implies that the mean moistness $E(Y) = \alpha + \beta x$ is a linear function of the temperature setting.
There are other situations, however, in which the variable $x$ is not nonrandom (yes, that's a double
negative!), but rather an observed value of a random variable $X$. For example, a fisheries researcher
may want to relate the age $Y$ of a sardine to its length $X$. If a linear relationship could be
established, then in the future fisheries researchers could predict the age of a sardine simply by
measuring its length. In this case, the linear model:
$$Y_i = \alpha + \beta x_i + \epsilon_i$$
implies that the mean age of a sardine, given that its length is $X = x$:
$$E(Y|X = x) = \alpha + \beta x$$
is a linear function of the length. That is, the conditional mean of $Y$ given $X = x$ is a linear function.
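Now, in this second situation, in which both $X$ and $Y$ are deemed random, we typically assume that
the pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ are a random sample from a bivariate normal
distribution with means $\mu_X$ and $\mu_Y$, variances $\sigma^2_X$ and $\sigma^2_Y$, and correlation coefficient $\rho$. If that's the
case, it can be shown that the conditional mean:
$$E(Y|X = x) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$$
is linear in $x$. That is, the slope of the conditional mean is:
$$\beta = \rho\frac{\sigma_Y}{\sigma_X}$$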
Now, for the case where $(X, Y)$ has a bivariate distribution, the researcher may not necessarily be
interested in estimating the linear function:
$$E(Y|X = x) = \alpha + \beta x$$
but rather simply knowing whether $X$ and $Y$ are independent. In STAT 414, we've learned that if
$(X, Y)$ follows a bivariate normal distribution, then testing for the independence of $X$ and $Y$ is
equivalent to testing whether the correlation coefficient $\rho$ equals 0. We'll now work on developing
three different hypothesis tests for testing $H_0: \rho = 0$ assuming $(X, Y)$ follows a bivariate normal
distribution.
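Recall that if $(X_i,Y_i)$ follows a bivariate normal distribution, and the conditional mean is the
linear function $E(Y\mid X=x)=\alpha+\beta x$, then:
$$\beta=\rho\frac{\sigma_Y}{\sigma_X}$$
That suggests, therefore, that testing $H_0:\rho=0$ against any of the alternative hypotheses
$H_A:\rho\neq 0$, $H_A:\rho>0$, and $H_A:\rho<0$ is equivalent to testing $H_0:\beta=0$ against the
corresponding alternative hypothesis $H_A:\beta\neq 0$, $H_A:\beta>0$, and $H_A:\beta<0$. That is, we can
simply compare the test statistic:
$$t=\frac{\hat{\beta}-0}{\sqrt{MSE/\sum_{i=1}^{n}(x_i-\bar{x})^2}}$$
to a $t$ distribution with $n-2$ degrees of freedom. It should be noted, though, that the test statistic
can be instead written as a function of the sample correlation coefficient:
$$R=\frac{\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2}\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}=\frac{S_{xy}}{S_xS_y}$$
That is, the test statistic can be alternatively written as:
$$t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$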
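and because of its algebraic equivalence to the first test statistic, it too follows a $t$ distribution with
$n-2$ degrees of freedom. Huh? How are the two test statistics algebraically equivalent? Well, if the
following two statements are true:
1. $\hat{\beta}=\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}=\dfrac{S_{xy}}{S_x^2}=R\dfrac{S_y}{S_x}$
2. $MSE=\dfrac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{n-2}=\dfrac{(n-1)S_y^2(1-R^2)}{n-2}$
then simple algebra illustrates that the two test statistics are indeed algebraically equivalent:
$$t=\frac{\hat{\beta}}{\sqrt{MSE/\sum(x_i-\bar{x})^2}}=\frac{r\,(S_y/S_x)}{\sqrt{\dfrac{(n-1)S_y^2(1-r^2)}{n-2}\Big/\left[(n-1)S_x^2\right]}}=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$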
Now, for the veracity of those two statements? Well, they are indeed true. The first one requires just
some simple algebra. The second one requires a bit of trickier algebra that you'll soon be asked to
work through for homework.
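It would be nice to use the sample correlation coefficient $R$ as a test statistic for more general
hypotheses about the population correlation coefficient, such as $H_0:\rho=\rho_0$,
but the probability distribution of $R$ is difficult to obtain. It turns out though that we can derive a
hypothesis test using just $R$ provided that we are interested in testing the more specific null
hypothesis that $X$ and $Y$ are independent, that is, for testing $H_0:\rho=0$.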
Theorem
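Provided that $\rho=0$, the probability density function of the sample correlation coefficient $R$ is:
$$g(r)=\frac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}\,(1-r^2)^{(n-4)/2}$$
over the support $-1<r<1$.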
Proof
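We'll use the distribution function technique, in which we first find the cumulative distribution
function $G(r)$, and then differentiate it to get the desired probability density function $g(r)$. The
cumulative distribution function is:
$$G(r)=P(R\le r)=P\left(\frac{R\sqrt{n-2}}{\sqrt{1-R^2}}\le\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)=P\left(T\le\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)$$
The first equality is just the definition of the cumulative distribution function, while the second and
third equalities come from the definition of the $T$ statistic as a function of the sample correlation
coefficient $R$ (a function that is increasing in $r$ on $-1<r<1$, so the inequality is preserved). Now, using what we know of the p.d.f. $h(t)$ of a $T$ random variable with $n-2$
degrees of freedom, we get:
$$G(r)=\int_{-\infty}^{\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}}h(t)\,dt=\int_{-\infty}^{\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}}\frac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}\frac{1}{\sqrt{n-2}}\left(1+\frac{t^2}{n-2}\right)^{-\frac{n-1}{2}}dt$$
Now, it's just a matter of taking the derivative of the c.d.f. $G(r)$ to get the p.d.f. $g(r)$. Using the
Fundamental Theorem of Calculus, in conjunction with the chain rule, we get:
$$g(r)=h\left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)\cdot\frac{d}{dr}\left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)$$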
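Focusing first on the derivative part of that equation, using the quotient rule, we get:
$$\frac{d}{dr}\left[\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right]=\sqrt{n-2}\left[\frac{\sqrt{1-r^2}-r\cdot\frac{1}{2}(1-r^2)^{-1/2}(-2r)}{1-r^2}\right]$$
Simplifying, we get:
$$\frac{d}{dr}\left[\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right]=\sqrt{n-2}\,(1-r^2)^{-3/2}$$
Now, looking back at $g(r)$, let's work on the $h(\cdot)$ part. Substituting the function $\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ in the one place where
a $t$ appears in the p.d.f. of a $T$ random variable with $n-2$ degrees of freedom, we get:
$$h\left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)=\frac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}\frac{1}{\sqrt{n-2}}\left(1+\frac{r^2}{1-r^2}\right)^{-\frac{n-1}{2}}$$
Now, because:
$$\left(1+\frac{r^2}{1-r^2}\right)^{-\frac{n-1}{2}}=\left(\frac{1}{1-r^2}\right)^{-\frac{n-1}{2}}=(1-r^2)^{\frac{n-1}{2}}$$
we finally get:
$$h\left(\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\right)=\frac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}\frac{1}{\sqrt{n-2}}(1-r^2)^{\frac{n-1}{2}}$$
We're almost there! We just need to multiply the two parts together. Doing so, we get:
$$g(r)=\frac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}\frac{1}{\sqrt{n-2}}(1-r^2)^{\frac{n-1}{2}}\cdot\sqrt{n-2}\,(1-r^2)^{-\frac{3}{2}}=\frac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}(1-r^2)^{\frac{n-4}{2}}$$
for $-1<r<1$, which is what we set out to prove.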
Now that we know the p.d.f. of $R$, testing $H_0:\rho=0$ against any of the possible alternative
hypotheses just involves integrating $g(r)$ to find the critical value(s) to ensure that $\alpha$, the probability
of a Type I error, is small. For example, to test $H_0:\rho=0$ against the alternative $H_A:\rho>0$, we find
the value $r_\alpha(n-2)$ such that:
$$P(R\ge r_\alpha(n-2))=\int_{r_\alpha(n-2)}^{1}\frac{\Gamma[(n-1)/2]}{\Gamma(1/2)\,\Gamma[(n-2)/2]}(1-r^2)^{\frac{n-4}{2}}\,dr=\alpha$$
Yikes! Do you have any interest in integrating that function? Well, me neither! That's why we'll
instead use an $R$ Table, such as the one we have in Table IX at the back of our textbook.
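For readers without the table at hand, the integral can also be handled numerically. The following is only a sketch, not the textbook's method: it assumes the null density of $R$ has the Beta-type form $(1-r^2)^{(n-4)/2}$ with the Gamma-function constant (so it needs $n\ge 4$), and the helper names `r_density`, `upper_tail`, and `critical_r` are made up for this illustration. It integrates with Simpson's rule and solves for the cutoff by bisection.

```python
import math

def r_density(r, n):
    """Density of the sample correlation R when rho = 0 (assumes n >= 4 pairs)."""
    const = math.gamma((n - 1) / 2) / (math.gamma(0.5) * math.gamma((n - 2) / 2))
    base = max(0.0, 1.0 - r * r)  # guard against tiny negative round-off near r = 1
    return const * base ** ((n - 4) / 2)

def upper_tail(c, n, steps=2000):
    """P(R >= c) under rho = 0, by composite Simpson's rule on [c, 1] (steps must be even)."""
    h = (1.0 - c) / steps
    total = r_density(c, n) + r_density(1.0, n)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * r_density(c + k * h, n)
    return total * h / 3.0

def critical_r(tail_area, n, tol=1e-8):
    """Bisection for the cutoff c in (0, 1) with P(R >= c) = tail_area."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if upper_tail(mid, n) > tail_area:
            lo = mid   # tail still too large, so move the cutoff to the right
        else:
            hi = mid
    return (lo + hi) / 2.0

# Two-sided test at alpha = 0.05 with n = 10 pairs: 0.025 in each tail.
print(round(critical_r(0.025, 10), 4))  # 0.6319
```

For a one-sided alternative, the whole $\alpha$ would go into a single tail, for example `critical_r(0.05, 10)`.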
Theorem
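The statistic:
$$W=\frac{1}{2}\ln\frac{1+R}{1-R}$$
follows an approximate normal distribution with mean $E(W)=\frac{1}{2}\ln\frac{1+\rho}{1-\rho}$ and variance $Var(W)=\frac{1}{n-3}$.
The theorem, therefore, allows us to test the general null hypothesis $H_0:\rho=\rho_0$ against any of the
possible alternative hypotheses by comparing the test statistic:
$$Z=\frac{\frac{1}{2}\ln\frac{1+R}{1-R}-\frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}}{\sqrt{\frac{1}{n-3}}}$$
to a standard normal $N(0,1)$ distribution.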
What? We've looked at no examples yet on this page? Let's take care of that by closing with an
example that utilizes each of the three hypothesis tests we derived above.
Example 15-2
An admissions counselor at a large public university was interested in learning whether freshmen
calculus grades are independent of high school math achievement test scores. The sample
correlation coefficient between the mathematics achievement test scores and calculus grades for a
random sample of $n=10$ college freshmen was deemed to be 0.84.
Does this observed sample correlation coefficient suggest, at the $\alpha=0.05$ level, that the population
of freshmen calculus grades is independent of the population of high school math achievement
test scores?
Answer
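The admissions counselor is interested in testing $H_0:\rho=0$ against $H_A:\rho\neq 0$.
Using the $t$-statistic, with $n-2=8$ degrees of freedom, we get:
$$t=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}}=\frac{0.84\sqrt{8}}{\sqrt{1-0.84^2}}=4.38$$
We reject the null hypothesis if the test statistic is greater than 2.306 or less than −2.306.
[Figure: $t$ distribution with 8 degrees of freedom, with rejection regions below −2.306 and above 2.306]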
Because $t=4.38>2.306$, we reject the null hypothesis in favor of the alternative hypothesis. There
is sufficient evidence at the 0.05 level to conclude that the population of freshmen calculus grades
is not independent of the population of high school math achievement test scores.
Using the $R$-statistic, with 8 degrees of freedom, Table IX in the back of the book tells us to reject
the null hypothesis if the absolute value of $R$ is greater than 0.6319. Because our observed
$r=0.84>0.6319$, we again reject the null hypothesis in favor of the alternative hypothesis. There
is sufficient evidence at the 0.05 level to conclude that freshmen calculus grades are not
independent of high school math achievement test scores.
Using the $Z$-statistic, we get:
$$Z=\frac{\frac{1}{2}\ln\frac{1+0.84}{1-0.84}-\frac{1}{2}\ln\frac{1+0}{1-0}}{\sqrt{\frac{1}{10-3}}}=3.23$$
In this case, we reject the null hypothesis if the absolute value of $Z$ is greater than 1.96. It clearly
is, and so we again reject the null hypothesis in favor of the alternative hypothesis. There is sufficient
evidence at the 0.05 level to conclude that freshmen calculus grades are not independent of high
school math achievement test scores.
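The arithmetic behind the three tests is easy to check by machine. The sketch below is an illustration only, with hypothetical function names. It assumes $r=0.84$ and $n=10$ (the sample size implied by the 8 degrees of freedom), the $t$-statistic written as $r\sqrt{n-2}/\sqrt{1-r^2}$, and Fisher's $\frac{1}{2}\ln\frac{1+r}{1-r}$ transform with variance $1/(n-3)$ for the $Z$-statistic. The last helper inverts the $t$ formula, which shows how a $t$ cutoff translates into an $r$ cutoff.

```python
import math

def t_statistic(r, n):
    """t form of the test of rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def fisher_z_statistic(r, n, rho0=0.0):
    """Approximate Z statistic for testing rho = rho0 via Fisher's transform."""
    w = 0.5 * math.log((1.0 + r) / (1.0 - r))
    w0 = 0.5 * math.log((1.0 + rho0) / (1.0 - rho0))
    return (w - w0) * math.sqrt(n - 3)   # dividing by sqrt(1 / (n - 3))

def r_cutoff_from_t(t_crit, n):
    """Solve t = r*sqrt(n-2)/sqrt(1-r^2) for r to convert a t cutoff to an r cutoff."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

r, n = 0.84, 10
print(round(t_statistic(r, n), 2))          # 4.38
print(round(fisher_z_statistic(r, n), 2))   # 3.23
print(round(r_cutoff_from_t(2.306, n), 4))  # 0.6319
```

All three statistics land well inside their rejection regions, which is why the three tests agree here.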
Theorem
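An approximate $(1-\alpha)100\%$ confidence interval for $\rho$ is $L\le\rho\le U$ where:
$$L=\frac{1+R-(1-R)\exp\left(2z_{\alpha/2}/\sqrt{n-3}\right)}{1+R+(1-R)\exp\left(2z_{\alpha/2}/\sqrt{n-3}\right)}$$
and
$$U=\frac{1+R-(1-R)\exp\left(-2z_{\alpha/2}/\sqrt{n-3}\right)}{1+R+(1-R)\exp\left(-2z_{\alpha/2}/\sqrt{n-3}\right)}$$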
Proof
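We previously learned that:
$$Z=\frac{\frac{1}{2}\ln\frac{1+R}{1-R}-\frac{1}{2}\ln\frac{1+\rho}{1-\rho}}{\sqrt{\frac{1}{n-3}}}$$
follows at least approximately a standard normal $N(0,1)$ distribution. So, we can do our usual trick
of starting with a probability statement:
$$P\left(-z_{\alpha/2}\le\frac{\frac{1}{2}\ln\frac{1+R}{1-R}-\frac{1}{2}\ln\frac{1+\rho}{1-\rho}}{\sqrt{\frac{1}{n-3}}}\le z_{\alpha/2}\right)\approx 1-\alpha$$
and manipulating the quantity inside the parentheses
to get ..... can you fill in the details?! ..... the formula for a $(1-\alpha)100\%$ confidence interval for $\rho$:
$$L\le\rho\le U$$
where:
$$L=\frac{1+R-(1-R)\exp\left(2z_{\alpha/2}/\sqrt{n-3}\right)}{1+R+(1-R)\exp\left(2z_{\alpha/2}/\sqrt{n-3}\right)}$$
and
$$U=\frac{1+R-(1-R)\exp\left(-2z_{\alpha/2}/\sqrt{n-3}\right)}{1+R+(1-R)\exp\left(-2z_{\alpha/2}/\sqrt{n-3}\right)}$$
as was to be proved!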
Example 15-2 (Continued)
An admissions counselor at a large public university was interested in learning whether freshmen
calculus grades are independent of high school math achievement test scores. The sample
correlation coefficient between the mathematics achievement test scores and calculus grades for a
random sample of $n=10$ college freshmen was deemed to be 0.84.
Estimate the population correlation coefficient $\rho$ with 95% confidence.
Answer
Because we are interested in a 95% confidence interval, we use $z_{0.025}=1.96$. Therefore, the lower
limit of an approximate 95% confidence interval for $\rho$ is:
$$L=\frac{1+0.84-(1-0.84)\exp\left(2(1.96)/\sqrt{10-3}\right)}{1+0.84+(1-0.84)\exp\left(2(1.96)/\sqrt{10-3}\right)}=0.447$$
and the upper limit of an approximate 95% confidence interval for $\rho$ is:
$$U=\frac{1+0.84-(1-0.84)\exp\left(-2(1.96)/\sqrt{10-3}\right)}{1+0.84+(1-0.84)\exp\left(-2(1.96)/\sqrt{10-3}\right)}=0.961$$
We can be (approximately) 95% confident that the correlation between the population of high
school mathematics achievement test scores and freshmen calculus grades is between 0.447 and
0.961. (Not a particularly useful interval, I might say! It might behoove the admissions counselor to
collect data on a larger sample, so that he or she can obtain a narrower confidence interval.)
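The interval can be reproduced with a few lines of code. This sketch assumes $r=0.84$, $n=10$, and $z_{0.025}=1.96$, and it uses the hyperbolic-tangent form of Fisher's transform: `atanh(r)` equals $\frac{1}{2}\ln\frac{1+r}{1-r}$, and applying `tanh` to the shifted value is algebraically the same as the exponential expressions for the two limits. The function name `correlation_ci` is hypothetical.

```python
import math

def correlation_ci(r, n, z_crit):
    """Approximate confidence interval for rho based on Fisher's transform."""
    w = math.atanh(r)                    # 0.5 * ln((1 + r) / (1 - r))
    half_width = z_crit / math.sqrt(n - 3)
    # Build the interval on the transformed scale, then map back with tanh.
    return math.tanh(w - half_width), math.tanh(w + half_width)

lower, upper = correlation_ci(0.84, 10, 1.96)
print(round(lower, 3), round(upper, 3))  # 0.447 0.961

# A larger sample with the same r gives a narrower interval.
print(correlation_ci(0.84, 100, 1.96))
```

The second call illustrates the closing remark: the half-width on the transformed scale shrinks like $1/\sqrt{n-3}$.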
Source: https://online.stat.psu.edu/stat415/lesson/15
8/13/23, 9:02 PM Lesson 8: Chi-Square Test for Independence
Overview
Let's start by recapping what we have discussed thus far in the course and mention what remains:
1. The fundamentals of the sampling distributions for the sample mean and the sample
proportion.
2. We illustrated how these sampling distributions form the basis for estimation (confidence
intervals) and testing for one mean or one proportion.
3. Then we extended the discussion to analyzing situations for two variables; one a response and
the other an explanatory. When both variables were categorical we compared two proportions;
when the explanatory was categorical, and the response was quantitative, we compared two
means.
4. Next, we will take a look at other methods and discuss how they apply to situations where:
both variables are categorical with at least one variable with more than two levels (Chi-
Square Test of Independence)
both variables are quantitative (Linear Regression)
the explanatory variable is categorical with more than two levels, and the response is
quantitative (Analysis of Variance or ANOVA)
In this Lesson, we will examine relationships where both variables are categorical using the Chi-
Square Test of Independence. We will illustrate the connection between the Chi-Square test for
independence and the z-test for two independent proportions in the case where each variable has
only two levels.
Going forward, keep in mind that this Chi-Square test, when significant, only provides statistical
evidence of an association or relationship between the two categorical variables. Do NOT confuse
this result with a correlation which refers to a linear relationship between two quantitative variables
(more on this in the next lesson).
The primary method for displaying the summarization of categorical variables is called a
contingency table. When we have two measurements on our subjects that are both categorical, the
contingency table is sometimes referred to as a two-way table.
The name comes from the fact that the summarized table consists of rows and columns (i.e., the
data display goes two ways).
The size of a contingency table is defined by the number of rows times the number of columns
associated with the levels of the two categorical variables. The size is notated $r\times c$, where $r$ is the
number of rows of the table and $c$ is the number of columns. A cell displays the count for the
intersection of a row and column. Thus the size of a contingency table also gives the number of cells
for that table. For example, if we have a $2\times 2$ table, then we have $2(2)=4$ cells.
Note! As we will see, these contingency tables usually include a 'total' row and a 'total' column
which represent the marginal totals, i.e., the total count in each row and the total count in each
column. This total row and total column are NOT included in the size of the table. The size refers to
the number of levels of the actual categorical variables in the study.
Application
A random sample of 500 U.S. adults is questioned regarding their political affiliation and opinion on
a tax reform bill. The results of this survey are summarized in the following contingency table:
            Favor  Indifferent  Opposed  Total
Democrat      138           83       64    285
Republican     64           67       84    215
Total         202          150      148    500
The size of this table is $2\times 3$ and NOT $3\times 4$. There are only two rows of observed data
for Party Affiliation and three columns of observed data for their Opinion. We define the Party
Affiliation as the explanatory variable and Opinion as the response because it is more natural to
analyze how one's opinion is shaped by their party affiliation than the other way around.
From here, we would want to determine if an association (relationship) exists between Political Party
Affiliation and Opinion on Tax Reform Bill. That is, are the two variables dependent? We'll discuss in
the next section how to approach this.
Objectives
Upon successful completion of this lesson, you should be able to:
How do we test the independence of two categorical variables? It will be done using the Chi-Square
Test of Independence.
As with all prior statistical tests we need to define null and alternative hypotheses. Also, as we have
learned, the null hypothesis is what is assumed to be true until we have evidence to go against it. In
this lesson, we are interested in researching if two categorical variables are related or associated (i.e.,
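dependent). Therefore, until we have evidence to suggest that they are, we must assume that they
are not. This is the motivation behind the hypotheses for the Chi-Square Test of Independence:
$H_0$: In the population, the two categorical variables are independent.
$H_a$: In the population, the two categorical variables are dependent.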
Note! There are several ways to phrase these hypotheses. Instead of using the words "independent"
and "dependent" one could say "there is no relationship between the two categorical variables"
versus "there is a relationship between the two categorical variables." Or "there is no association
between the two categorical variables" versus "there is an association between the two variables."
The important part is that the null hypothesis refers to the two categorical variables not being
related while the alternative is trying to show that they are related.
Once we have gathered our data, we summarize the data in the two-way contingency table. This
table represents the observed counts and is called the Observed Counts Table or simply the
Observed Table. The contingency table on the introduction page to this lesson represented the
observed counts of the party affiliation and opinion for those surveyed.
The question becomes, "How would this table look if the two variables were not related?" That is,
under the null hypothesis that the two variables are independent, what would we expect our data to
look like?
Consider the following table:
          Success  Failure  Total
Group 1   A        B        A+B
Group 2   C        D        C+D
Total     A+C      B+D      A+B+C+D
The total count is $A+B+C+D$. Let's focus on one cell, say Group 1 and Success with observed
count A. If we go back to our probability lesson, let $G_1$ denote the event 'Group 1' and $S$ denote the
event 'Success.' Then,
$P(G_1)=\dfrac{A+B}{A+B+C+D}$ and $P(S)=\dfrac{A+C}{A+B+C+D}$.
Recall that if two events are independent, then the probability of their intersection is the product of their respective
probabilities. In other words, if $G_1$ and $S$ are independent, then...
$$P(G_1\cap S)=P(G_1)P(S)=\left(\frac{A+B}{A+B+C+D}\right)\left(\frac{A+C}{A+B+C+D}\right)$$
If we considered counts instead of probabilities, then we get the count by multiplying the probability
by the total count. In other words...
$$\text{Expected count for cell one}=P(G_1)P(S)\,(A+B+C+D)=\frac{(A+B)(A+C)}{A+B+C+D}$$
This is the count we would expect to see if the two variables were independent (i.e. assuming the
null hypothesis is true).
The expected count for each cell under the null hypothesis is:
$$E=\frac{(\text{row total})(\text{column total})}{\text{total sample size}}$$
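Observed Table:
            favor  indifferent  opposed  total
democrat      138           83       64    285
republican     64           67       84    215
total         202          150      148    500
Find the expected counts for all of the cells.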
Answer
We need to find what is called the Expected Counts Table or simply the Expected Table. This table
displays what the counts would be for our sample data if there were no association between the
variables.
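Expected Table:
            favor                  indifferent           opposed                total
democrat    285(202)/500 = 115.14  285(150)/500 = 85.5   285(148)/500 = 84.36   285
republican  215(202)/500 = 86.86   215(150)/500 = 64.5   215(148)/500 = 63.64   215
total       202                    150                   148                    500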
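The expected counts follow mechanically from the row and column totals, so they are easy to compute in code. The sketch below is one way to do it in plain Python; the function name `expected_counts` is made up for this illustration, and the observed counts are those of the political affiliation example (democrat row first, then republican).

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Rows: democrat, republican. Columns: the three opinion categories.
observed = [[138, 83, 64],
            [64, 67, 84]]

for row in expected_counts(observed):
    print([round(e, 2) for e in row])
# [115.14, 85.5, 84.36]
# [86.86, 64.5, 63.64]
```

Note that each row and column of the expected table adds up to the same marginal total as in the observed table.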
The statistical question becomes, "Are the observed counts so different from the expected counts
that we can conclude a relationship exists between the two variables?" To conduct this test we
compute a Chi-Square test statistic where we compare each cell's observed count to its respective
expected count.
In a summary table, we have $r\times c=rc$ cells. Let $O_1,O_2,\ldots,O_{rc}$ denote the observed counts for
each cell and $E_1,E_2,\ldots,E_{rc}$ denote the respective expected counts for each cell.
The Chi-Square test statistic is calculated as follows:
$$\chi^{2*}=\sum_{i=1}^{rc}\frac{(O_i-E_i)^2}{E_i}$$
Under the null hypothesis and certain conditions (discussed below), the test statistic follows a
Chi-Square distribution with degrees of freedom equal to $(r-1)(c-1)$, where $r$ is the number of rows
and $c$ is the number of columns. We leave out the mathematical details that show why this test statistic
is used and why it follows a Chi-Square distribution.
As we have done with other statistical tests, we make our decision by either comparing the value of
the test statistic to a critical value (rejection region approach) or by finding the probability of getting
this test statistic value or one more extreme (p-value approach).
The critical value for our Chi-Square test is $\chi^2_{\alpha}$ with degrees of freedom = $(r-1)(c-1)$, while the
p-value is found by $P(\chi^2>\chi^{2*})$ with degrees of freedom = $(r-1)(c-1)$.
Let's apply the Chi-Square Test of Independence to our example where we have a random
sample of 500 U.S. adults who are questioned regarding their political affiliation and opinion on
a tax reform bill. Test if the political affiliation and their opinion on a tax reform bill are
dependent at a 5% level of significance.
Answer
By Hand [1]
The contingency table (political_affiliation.csv [3]) is given below. Each cell contains the observed
count and the expected count in parentheses. For example, there were 138 democrats who
favored the tax bill. The expected count under the null hypothesis is 115.14. Therefore, the cell is
displayed as 138 (115.14).
            favor         indifferent  opposed     total
democrat    138 (115.14)  83 (85.5)    64 (84.36)  285
republican  64 (86.86)    67 (64.5)    84 (63.64)  215
total       202           150          148         500
Note! We do not expect you to calculate the critical value or the p-value by hand. The p-value
can be found using software.
Using Minitab [2]
the raw data, your data will need to consist of two columns: one with the explanatory
variable data (goes in the 'row' field) and one with the response variable data (goes in the 'column'
field).
3. Labeling (Optional) When using the summarized data you can label the rows and columns
if you have the variable labels in columns of the worksheet. For example, if we have a
column with the two political party affiliations and a column with the three opinion choices
we could use these columns to label the output.
4. Click the Statistics tab. Leave the four boxes that are already checked as they are, and also check
the box for 'Each cell's contribution to the chi-square.' Click OK.
5. Click OK.
Note! If you have the observed counts in a table, you can copy/paste them into Minitab. For
instance, you can copy the entire observed counts table (excluding the totals!) for our example
and paste these into Minitab starting with the first empty cell of a column.
138 83 64
64 67 84
Pearson Chi-Sq = 4.5386 + 0.073 + 4.914 + 6.016 + 0.097 + 6.5137 = 22.152, DF = 2, P-Value = 0.000
(Ignore the Fisher's p-value! The p-value highlighted above is calculated using the methods
we learned in this lesson. More specifically, the chi-square we learned is referred to as the
Pearson Chi-Square. The Fisher's test uses a different method than what we explained in
this lesson to calculate a test statistic and p-value. This method incorporates a log of the
ratio of observed to expected values. It's a different technique that is more complicated to
do by-hand. Minitab automatically includes both results in its output.)
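The Chi-Square test statistic is 22.152 and is calculated by summing all the individual cells' Chi-Square
contributions:
$$\chi^{2*}=4.5386+0.073+4.914+6.016+0.097+6.5137=22.152$$
The p-value is found by $P(\chi^2>22.152)$ with degrees of freedom = $(2-1)(3-1)=2$.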
Minitab calculates this p-value to be less than 0.001 and reports it as 0.000. Given this p-value of
0.000 is less than the alpha of 0.05, we reject the null hypothesis that political affiliation and their
opinion on a tax reform bill are independent. We conclude that there is evidence that the two
variables are dependent (i.e., that there is an association between the two variables).
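The same numbers can be reproduced without Minitab. What follows is a rough standard-library sketch, not Minitab's algorithm: the function names are hypothetical, and the upper-tail probability is computed from the usual power series for the regularized lower incomplete gamma function, which is adequate for moderate values of the statistic but not tuned for extreme ones.

```python
import math

def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            stat += (o - e) ** 2 / e                # this cell's contribution
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), via a series for the lower tail."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = math.exp(-y + a * math.log(y) - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= y / (a + k)
        total += term
        k += 1
    return max(0.0, 1.0 - total)

stat, df = chi_square_statistic([[138, 83, 64], [64, 67, 84]])
p_value = chi_square_upper_tail(stat, df)
print(round(stat, 3), df)   # 22.152 2
print(p_value < 0.001)      # True
```

For 2 degrees of freedom the tail probability has the closed form $e^{-x/2}$, which offers a quick sanity check on the series.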
          Perfect  Satisfactory  Defective  Total
Shift 1       106           124          1    231
Shift 2        67            85          1    153
Shift 3        37            72          3    112
Total         210           281          5    496
Answer
Minitab output:
Chi-Square Test
(Each cell shows the observed count with the expected count below it.)
        C1      C2      C3   Total
1      106     124       1     231
     97.80  130.87    2.33
2       67      85       1     153
     64.78   86.68    1.54
3       37      72       3     112
     47.42   63.45    1.13
Note that there are 3 cells with expected counts less than 5.0.
In the above example, we don't have a significant result at a 5% significance level since the p-value
(0.071) is greater than 0.05. Even if we did have a significant result, we still could not trust the result,
because there are 3 cells (33.3% of the 9 cells) with expected counts less than 5.0.
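Checking the expected-count condition is a natural step to automate before trusting a chi-square p-value. The sketch below recomputes the expected counts for the three-shift table and reports how many cells fall below 5. The function name `small_expected_cells` is hypothetical, and the 80% threshold in the last line is the rule of thumb used in this lesson rather than a universal standard.

```python
def small_expected_cells(observed, threshold=5.0):
    """Return (number of cells with expected count below threshold, total cells)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [rt * ct / n for rt in row_totals for ct in col_totals]
    return sum(1 for e in expected if e < threshold), len(expected)

# Rows are shifts 1-3; columns are C1-C3 from the Minitab output.
shifts = [[106, 124, 1],
          [67, 85, 1],
          [37, 72, 3]]

small, cells = small_expected_cells(shifts)
print(small, cells, round(100 * small / cells, 1))  # 3 9 33.3

# Rule of thumb from this lesson: at least 80% of cells need expected counts >= 5.
print((cells - small) / cells >= 0.80)              # False
```

Here only 6 of 9 cells (about 67%) meet the requirement, so the condition fails.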
Caution!
Sometimes researchers will categorize quantitative data (e.g., take height measurements and
categorize as 'below average,' 'average,' and 'above average.') Doing so results in a loss of
information - one cannot do the reverse of taking the categories and reproducing the raw
quantitative measurements. Instead of categorizing, the data should be analyzed using quantitative
methods.
Try it!
A food services manager for a baseball park wants to know if there is a relationship between gender
(male or female) and the preferred condiment on a hot dog. The following table summarizes the
results. Test the hypothesis with a significance level of 10%.
[Table: observed counts of preferred condiment by gender]
Answer
[Table: observed and expected counts of preferred condiment by gender]
None of the expected counts in the table are less than 5. Therefore, we can proceed with the Chi-
Square test.
With a p-value greater than 10%, we can conclude that there is not enough evidence in the data to
suggest that gender and preferred condiment are related.
Say we have a study of two categorical variables each with only two levels. One of the response
levels is considered the "success" response and the other the "failure" response. A general 2 × 2
table of the observed counts would be as follows:
          Success  Failure  Total
Group 1   A        B        A+B
Group 2   C        D        C+D
The observed counts in this table represent the following proportions:
          Success                    Failure                       Total
Group 1   $\hat{p}_1=\frac{A}{A+B}$  $1-\hat{p}_1=\frac{B}{A+B}$   A+B
Group 2   $\hat{p}_2=\frac{C}{C+D}$  $1-\hat{p}_2=\frac{D}{C+D}$   C+D
Recall from our Z-test of two proportions that our null hypothesis is that the two population
proportions, $p_1$ and $p_2$, were assumed equal while the two-sided alternative hypothesis was that
they were not equal.
This null hypothesis would be analogous to the two groups being independent.
Also, if the two success proportions are equal, then the two failure proportions would also be equal.
Note as well that with our Z-test the conditions were that the number of successes and failures for
each group was at least 5. That equates to the Chi-square conditions that all expected cells in a 2 × 2
table be at least 5. (Remember at least 80% of all cells need an expected count of at least 5. With
80% of 4 equal to 3.2 this means all four cells must satisfy the condition).
When we run a Chi-square test of independence on a 2 × 2 table, the resulting Chi-square test
statistic would be equal to the square of the Z-test statistic (i.e., $(Z^*)^2=\chi^{2*}$) from the Z-test of two
independent proportions.
Application
Consider the following example where we form a 2 × 2 table for the Political Party and Opinion by only
considering the Favor and Opposed responses:
            favor  opposed  total
democrat      138       64    202
republican     64       84    148
total         202      148    350
The Chi-square test produces a test statistic of 22.00 with a p-value of 0.00.
Try it!
The condiments and gender data were condensed to consider gender and either mustard or
ketchup. The manager wants to know if the proportion of males that prefer ketchup is the same as
the proportion of females that prefer ketchup. Test the hypothesis two ways, (1) using the Chi-square
test and (2) using the z-test for two proportions, with a significance level of 10%. Show how the two
test statistics are related and compare the p-values.
                 Condiment
Gender    Ketchup  Mustard  Total
Male           15       23     38
Female         25       19     44
Total          40       42     82
Answer
Let males be denoted as sample one and females as sample two. Using the table, we have:
$n_1=38$ and $\hat{p}_1=\frac{15}{38}=0.395$
$n_2=44$ and $\hat{p}_2=\frac{25}{44}=0.568$
The conditions are satisfied for this test (verify for extra practice).
With the pooled sample proportion $\hat{p}=\frac{15+25}{38+44}=0.488$, the test statistic is:
$$z^*=\frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}=-1.567$$
The p-value is $2P(Z<-1.567)=0.1172$.
The p-value is greater than our significance level. Therefore, there is not enough evidence in the
data to suggest that the proportion of males that prefer ketchup is different than the proportion of
females that prefer ketchup.
                     Condiment
Gender    Ketchup       Mustard       Total
Male      15 (18.537)   23 (19.463)   38
Female    25 (21.463)   19 (22.537)   44
Total     40            42            82
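There are no expected counts less than 5. The test statistic is:
$$\chi^{2*}=\frac{(15-18.537)^2}{18.537}+\frac{(23-19.463)^2}{19.463}+\frac{(25-21.463)^2}{21.463}+\frac{(19-22.537)^2}{22.537}\approx 2.46$$
With 1 degree of freedom, the p-value is 0.1168. The p-value is greater than our significance level.
Therefore, there is not enough evidence to suggest that gender and condiments (ketchup or
mustard) are related.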
Comparison
The p-values would be the same without rounding errors (0.1172 vs 0.1168). The z-statistic is
-1.567. The square of this value is 2.455 which is what we have (rounded) for the chi-square
statistic. The conclusions are the same.
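The relationship between the two statistics can be confirmed numerically. The sketch below is an illustration with hypothetical function names. It assumes the pooled two-proportion z-test and the Pearson chi-square statistic without a continuity correction, and it gets both p-values from `math.erfc`, using the facts that a two-sided normal p-value is `erfc(|z| / sqrt(2))` and that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-sided p-value for comparing two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction) and p-value."""
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = a + b + c + d
    stat = sum((observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    return stat, math.erfc(math.sqrt(stat / 2))

z, p_z = two_proportion_z(15, 38, 25, 44)          # males vs. females preferring ketchup
chi2, p_chi2 = chi_square_2x2(15, 23, 25, 19)

print(round(z, 3), round(z * z, 3), round(chi2, 3))  # -1.567 2.455 2.455
print(abs(p_z - p_chi2) < 1e-12)                     # True
```

Carried at full precision, the two p-values coincide, so any difference in the last digits of hand-computed p-values comes from rounding the statistics first.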
Risk
In this section, we will introduce some other measures we can find using a contingency table. One of
the most straightforward measures to find is the risk of any given event.
Risk
The probability that an event will occur.
In simple terms, a risk for a group is the same as the proportion of "success" for a particular group.
Relative Risk
Have you ever heard a doctor tell you or a family member something similar to the following: "If you
do not lose weight or get your cholesterol under control you are about five times more likely to
suffer a heart attack than if you had these numbers in the normal range." If so, how alarmed should
one be? "Five times" sounds alarming!
First off, this "five times" represents what is called relative risk.
Relative risk
Relative risk is a ratio of the risks of two groups.
In the example described above, it would be the risk of heart attack for a person in their current
condition compared to the risk of heart attack if that person were in the normal ranges. However, to
truly interpret the severity of a relative risk we have to know the baseline risk.
Baseline Risk
The baseline risk is the denominator of relative risk, i.e., the risk of the group being compared
to.
In our example, this would be the risk of heart attack for the normal range. If this baseline risk is
high, then a relative risk of 5 would be alarming; if the baseline risk is small, then a relative risk of 5
may not be too serious.
For instance, if the risk of a heart attack for someone in the normal range was 1 out of 10, then the
risk of a heart attack for a person with the above average numbers would be five times this or 5 out
of 10. That is, the person would have roughly a 50/50 chance of suffering a heart attack if they didn't
get their weight and cholesterol in check. However, if the risk of a heart attack for the normal range
group was 1 out of 500, then the risk of a heart attack for a person with above average numbers
would be 5 out of 500 or 0.01. The person would have about a 1% chance of a heart attack if they
didn't improve their health. In both cases the relative risk was 5, but with entirely different levels of
impact. Please note this example is not meant to be interpreted that taking care of your health is not
important!!!
Odds
Odds is a ratio of the number of “successes” over the number of “failures.” It can be reported as a
fraction or as “number of successes: number of failures.”
The relative risk that democrats favor the bill compared to republicans:
$$\text{Relative risk}=\frac{138/285}{64/215}=\frac{0.484}{0.298}\approx 1.6$$
We would interpret this relative risk as "Democrats are about 1.6 times as likely as Republicans
to favor the bill (i.e., Democrats are about 60% more likely to support the bill than Republicans)."
Try it!
Consider again our previous example comparing gender and preferred condiments. The summary
table is shown below for convenience.
                 Condiment
Gender    Ketchup  Mustard  Total
Male           15       23     38
Female         25       19     44
Total          40       42     82
Find the risk of either gender preferring ketchup and use those risks to find and interpret the relative
risk.
Answer
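The risk of females preferring ketchup is $\frac{25}{44}=0.568$, and the risk of males preferring ketchup is $\frac{15}{38}=0.395$.
The relative risk that females prefer ketchup compared to males is:
$$\text{Relative risk}=\frac{25/44}{15/38}\approx 1.44$$
“Females are about 1.44 times as likely as males to prefer ketchup on hot dogs.”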
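Risk and relative risk are simple ratios, so a short sketch makes the definitions concrete. The function names here are hypothetical. The code uses unrounded risks throughout, so its relative risk can differ in the third decimal place from a hand calculation that rounds each risk first.

```python
def risk(successes, total):
    """Risk: the proportion of 'successes' within one group."""
    return successes / total

def relative_risk(risk_group, risk_baseline):
    """Relative risk: a group's risk divided by the baseline group's risk."""
    return risk_group / risk_baseline

female = risk(25, 44)   # females preferring ketchup
male = risk(15, 38)     # males preferring ketchup (baseline)
print(round(female, 3), round(male, 3), round(relative_risk(female, male), 2))
# 0.568 0.395 1.44

# The same relative risk of 5 means very different things for different baselines.
for baseline in (1 / 10, 1 / 500):
    print(round(5 * baseline, 2))   # 0.5, then 0.01
```

The last loop mirrors the heart attack illustration: a relative risk of 5 turns a baseline of 1 in 10 into 5 in 10, but a baseline of 1 in 500 into only 5 in 500.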
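In this Lesson, we learned how to calculate expected counts under the assumption that the two categorical
variables are independent. We then used these expected counts to test the hypotheses:
$H_0$: The two categorical variables are independent.
$H_a$: The two categorical variables are dependent.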
We demonstrated how this test relates to our test for two proportions when the alternative is two-
sided.
We also introduced the terms risk and relative risk. The calculation, as well as the interpretation, is
discussed.
In the next Lesson, we will consider the case where there are two quantitative variables (quantitative
response and quantitative explanatory variable). We will explore how to determine if the variables
have a significant linear relationship.
Source: https://online.stat.psu.edu/stat500/lesson/8
Links:
1. https://online.stat.psu.edu/stat500#tablist-cke_4-tab-pane-1
2. https://online.stat.psu.edu/stat500#tablist-cke_4-tab-pane-2
3. https://online.stat.psu.edu/stat500/sites/stat500/files/data/political_affiliation.csv
4. https://online.stat.psu.edu/stat500/sites/stat500/files/data/shift_quality.txt