T Test
A t-test is used to determine if there is a significant difference between the means of two groups and how they
are related.
There are two popular types of t tests:
• A paired t-test.
• An unpaired t-test.
These statistical tests are commonly used in research in the fields of biology, business, and psychology.
A paired t-test is designed to compare the means of the same group or item under two separate scenarios.
An unpaired t-test compares the means of two independent or unrelated groups. In the standard (pooled) unpaired
t-test, the two groups are assumed to have equal variances.
• Paired t-test:
A paired t-test (also known as a dependent or correlated t-test) is a statistical test that compares the
averages/means and standard deviations of two related groups to determine if there is a significant
difference between the two groups.
a. A significant difference occurs when the differences between groups are unlikely to be due to
sampling error or chance.
b. The groups can be related by being the same group of people, the same item, or being subjected
to the same conditions.
c. Paired t-tests are considered more powerful than unpaired t-tests because using the same
participants or item eliminates variation between the samples that could be caused by anything
other than what’s being tested.
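To see point (c) in numbers, the following minimal sketch computes both t statistics for the same small data set. The scores are hypothetical (they do not come from any question in this paper) and the function names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical before/after scores for the same five participants.
before = [10, 12, 14, 16, 18]
after = [11, 14, 15, 18, 20]

def paired_t(x, y):
    """t statistic for a paired test: mean difference over its standard error."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def pooled_t(x, y):
    """t statistic for an unpaired test assuming equal variances (pooled)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(y) - mean(x)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

t_paired = paired_t(before, after)      # degrees of freedom: n - 1 = 4
t_unpaired = pooled_t(before, after)    # degrees of freedom: n1 + n2 - 2 = 8
print(round(t_paired, 3), round(t_unpaired, 3))
```

With these made-up numbers the paired statistic is about 6.53 while the pooled statistic is about 0.76. The large spread between participants hides the effect unless each participant is compared with themselves.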
Marks Scheme:
Bivariate, Hypothesis testing [272 marks]
1a. [1 mark]
As part of a study into healthy lifestyles, Jing visited Surrey Hills University. Jing recorded a person’s position
in the university and how frequently they ate a salad. Results are shown in the table.
At the 10 % level of significance, a 𝑡-test was used to compare the means of the two groups. The data is
assumed to be normally distributed and the standard deviations are equal between the two groups.
State the null hypothesis.
2b. [1 mark]
State the alternative hypothesis.
2c. [2 marks]
Calculate the 𝑝-value for this test.
2d. [2 marks]
State, giving a reason, whether Ms Calhoun should accept the null hypothesis.
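In an exam the p-value would normally come from a calculator or statistics package. As a rough sketch of what happens behind that button, the code below integrates the density of Student's t distribution numerically and applies the usual decision rule. The t statistic and degrees of freedom in the example are hypothetical, and all function names are invented for illustration. For a pooled two-sample test with equal variances, the degrees of freedom are n1 + n2 - 2.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sided_p(t, df):
    """Two-sided p-value for an observed t statistic."""
    return 2 * t_upper_tail(abs(t), df)

def decision(p_value, alpha):
    """Standard rule: reject H0 when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "do not reject H0"

# Hypothetical statistic and degrees of freedom, for illustration only.
p = two_sided_p(2.306, 8)
print(round(p, 3), decision(p, 0.10))
```

A one-tailed test would use `t_upper_tail` directly instead of doubling it.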
3a. [2 marks]
The Malvern Aquatic Center hosted a 3-metre springboard diving event. The judges, Stan and Minsun, awarded
8 competitors a score out of 10. The raw data is collated in the following table.
3g. [2 marks]
Find the value of the Spearman’s rank correlation coefficient, 𝑟ₛ.
3h. [2 marks]
Comment on the result obtained for 𝑟ₛ.
3i. [1 mark]
The Commissioner believes Minsun’s score for competitor G is too high and so decreases the score from 9.5 to
9.1.
Explain why the value of the Spearman’s rank correlation coefficient 𝑟ₛ does not change.
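The idea behind part (i) can be checked numerically: Spearman's coefficient depends only on the ranks, so changing a score without changing its rank leaves the coefficient untouched. The sketch below uses hypothetical scores for competitors A to H (the real table is not reproduced here) and computes the coefficient as Pearson's correlation of the ranks, which also copes with tied scores.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest (1) to largest, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's product moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's coefficient: Pearson's correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores, for illustration only.
stan = [7.0, 8.5, 6.5, 9.0, 8.0, 7.5, 9.5, 6.0]
minsun = [7.5, 8.0, 6.0, 9.0, 8.5, 7.0, 9.5, 6.5]

r_before = spearman(stan, minsun)
minsun[6] = 9.1          # competitor G: 9.5 lowered to 9.1, still the top score
r_after = spearman(stan, minsun)
print(round(r_before, 4), round(r_after, 4))
```

Ranking from smallest to largest rather than from best to worst does not affect the result, provided both judges are ranked in the same direction.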
4a. [1 mark]
Two IB schools, A and B, follow the IB Diploma Programme but have different teaching methods. A research
group tested whether the different teaching methods lead to a similar final result.
For the test, a group of eight students was randomly selected from each school. Both samples were given a
standardized test at the start of the course and a prediction for total IB points was made based on that test; this
was then compared to their points total at the end of the course.
Previous results indicate that both the predictions from the standardized tests and the final IB points can be
modelled by a normal distribution.
It can be assumed that:
• the standardized test is a valid method for predicting the final IB points
• variations from the prediction can be explained through the circumstances of the student or school.
Identify a test that might have been used to verify the null hypothesis that the predictions from the standardized
test can be modelled by a normal distribution.
4b. [1 mark]
State why comparing only the final IB points of the students from the two schools would not be a valid test for
the effectiveness of the two different teaching methods.
4c. [1 mark]
The data for school A is shown in the following table.
For each student, the change from the predicted points to the final points (𝑓 − 𝑝) was calculated.
Find the mean change.
4d. [2 marks]
Find the standard deviation of the changes.
4e. [4 marks]
Use a paired 𝑡-test to determine whether there is significant evidence that the students in school A have
improved their IB points since the start of the course.
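The quantities in parts (c) to (e) fit together as follows. This is the standard paired t statistic, written for the changes d_i = f_i - p_i of the n = 8 students. The symbols \bar{d}, s_{n-1} and \mu_d are notation introduced here, not taken from the question.

```latex
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_{n-1} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
t = \frac{\bar{d}}{s_{n-1}/\sqrt{n}}

H_0 : \mu_d = 0, \qquad H_1 : \mu_d > 0, \qquad \text{degrees of freedom} = n - 1 = 7
```

The alternative hypothesis is one-tailed because the question asks specifically about improvement.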
4f. [5 marks]
The data for school B is shown in the following table.
Use an appropriate test to determine whether there is evidence, at the 5 % significance level, that the students in
school B have improved more than those in school A.
4g. [1 mark]
State why it was important to test that both sets of points were normally distributed.
4h. [3 marks]
School A also gives each student a score for effort in each subject. This effort score is based on a scale of 1 to 5
where 5 is regarded as outstanding effort.
It is claimed that the effort put in by a student is an important factor in improving upon their predicted IB
points.
Perform a test on the data from school A to show it is reasonable to assume a linear relationship between effort
scores and improvements in IB points. You may assume effort scores follow a normal distribution.
4i. [1 mark]
Hence, find the expected improvement between predicted and final points for an increase of one unit in effort
grades, giving your answer to one decimal place.
4j. [6 marks]
A mathematics teacher in school A claims that the comparison between the two schools is not valid because the
sample for school B contained mainly girls and that for school A, mainly boys. She believes that girls are likely
to show a greater improvement from their predicted points to their final points.
She collects more data from other schools, asking them to class their results into four categories as shown in the
following table.
The owner of the orchard wants to know whether the mean weight of the apples from tree A (𝜇_A) is greater than
the mean weight of the apples from tree B (𝜇_B) so sets up the following test:
H₀: 𝜇_A = 𝜇_B and H₁: 𝜇_A > 𝜇_B
Find the 𝑝-value for the owner’s test.
5c. [2 marks]
The test is performed at the 5% significance level.
State the conclusion of the test, giving a reason for your answer.
6a. [2 marks]
The water temperature (𝑇) in Lake Windermere is measured on the first day of eight consecutive months (𝑚)
from January to August (months 1 to 8) and the results are shown below. The value for May (month 5) has
been accidentally deleted.
Assuming the data follows a linear model for this period, find the regression line of 𝑇 on 𝑚 for the remaining
data.
6b. [2 marks]
Use your line to find an estimate for the water temperature on the first day of May.
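A minimal sketch of parts (a) and (b) follows: it fits the least squares regression line of 𝑇 on 𝑚 to the seven remaining months and evaluates it at 𝑚 = 5. The temperatures are made up for illustration because the original table is not reproduced here, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least squares regression line of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Months 1 to 8 with May (month 5) missing; temperatures are hypothetical.
months = [1, 2, 3, 4, 6, 7, 8]
temps = [4.1, 5.0, 6.2, 7.1, 9.0, 10.2, 11.0]

slope, intercept = fit_line(months, temps)
may_estimate = slope * 5 + intercept   # interpolation: month 5 lies inside the data range
print(round(slope, 3), round(intercept, 3), round(may_estimate, 2))
```

The estimate for May is an interpolation, which is why it is reasonable here, whereas using the same line for month 12 would be an extrapolation beyond the observed months.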
6c. [1 mark]
Explain why your line should not be used to estimate the value of 𝑚 at which the temperature is 10.0 °C.
6d. [1 mark]
Explain in context why your line should not be used to predict the value for December (month 12).
6e. [1 mark]
State a more appropriate model for the water temperature in the lake over an extended period of time. You are
not expected to calculate any parameters.
7a. [2 marks]
It is believed that the power 𝑃 of a signal at a point 𝑑 km from an antenna is inversely proportional to 𝑑ⁿ, where
𝑛 ∈ ℤ⁺.
The value of 𝑃 is recorded at distances of 1 m to 5 m and the values of log₁₀ 𝑑 and log₁₀ 𝑃 are plotted on the
graph below.
Find the equation of the least squares regression line of log₁₀ 𝑃 against log₁₀ 𝑑.
7c. [1 mark]
Use your answer to part (b) to write down the value of 𝑛 to the nearest integer.
7d. [2 marks]
Find an expression for 𝑃 in terms of 𝑑.
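The link between the power law and the straight line on the log-log graph can be written out as below. The constant of proportionality k and the regression coefficients m (gradient) and c (intercept) are symbols introduced here for illustration.

```latex
P = \frac{k}{d^{\,n}}
\;\Longrightarrow\;
\log_{10} P = \log_{10} k - n \log_{10} d

\text{If the regression line is } \log_{10} P = m \log_{10} d + c,
\text{ then } n \approx -m, \quad k = 10^{c}, \quad P = 10^{c}\, d^{-n}
```

So the gradient of the regression line gives the exponent, and the intercept gives the constant of proportionality.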
8. [2 marks]
A teacher is concerned about the amount of lesson time lost by 8 students through arriving late at school. Over
a period of 2 weeks he records the total number of minutes they are late. He also asks them how far they live
from school. The results are shown in the table below.
Which of the correlation coefficients would you recommend is used to assess whether or not there is an
association between total number of minutes late and distance from school? Fully justify your answer.
9a. [3 marks]
A dice manufacturer claims that, for a novelty die he produces, the probabilities of scoring the numbers 1 to 5 are
all equal, and the probability of a 6 is two times the probability of scoring any of the other numbers.
Find the probability of scoring a six when rolling the novelty die.
9b. [4 marks]
Find the probability of scoring more than 2 sixes when this die is rolled 5 times.
9c. [2 marks]
To test the manufacturer’s claim one of the novelty dice is rolled 350 times and the numbers scored on the die
are shown in the table below.
Find the expected frequency for each of the numbers if the manufacturer’s claim is true.
9d. [2 marks]
A 𝜒² goodness of fit test is to be used with a 5% significance level.
Write down the null and alternative hypotheses.
9e. [1 mark]
State the degrees of freedom for the test.
9f. [4 marks]
Determine the conclusion of the test, clearly justifying your answer.
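The steps in question 9 can be sketched in a few lines using exact fractions. The probabilities and expected frequencies follow from the manufacturer's claim as stated. The observed counts are hypothetical, since the original table is not reproduced here, so the resulting statistic is an illustration and not the answer to part (f).

```python
from fractions import Fraction
from math import comb

# Claim: faces 1-5 are equally likely and a 6 is twice as likely, so 5p + 2p = 1.
p = Fraction(1, 7)
probs = [p] * 5 + [2 * p]
p_six = probs[5]

# P(more than 2 sixes in 5 rolls) under a binomial model B(5, 2/7).
p_more_than_two = sum(
    comb(5, k) * p_six**k * (1 - p_six) ** (5 - k) for k in range(3, 6)
)

# Expected frequencies for 350 rolls if the claim is true.
expected = [350 * q for q in probs]

# Hypothetical observed counts, for illustration only (they sum to 350).
observed = [45, 57, 51, 56, 47, 94]
chi2 = float(sum((o - e) ** 2 / e for o, e in zip(observed, expected)))
dof = len(observed) - 1            # 6 categories, no parameters estimated

print(p_six, p_more_than_two, expected, chi2, dof)
```

For 5 degrees of freedom the 5% critical value is 11.07, so these made-up counts (statistic 2.76) would not lead to rejecting the claim.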
10a. [2 marks]
Dana has collected some data regarding the heights ℎ (metres) of waves against a pier at 50 randomly chosen
times in a single day. This data is shown in the table below.
She wishes to perform a 𝜒² test at the 5% significance level to see if the height of waves could be modelled by
a normal distribution. Her null hypothesis is
H₀: The data can be modelled by a normal distribution.
From the table she calculates the mean of the heights in her sample to be 0.828 m and the standard deviation of
the heights, 𝑠ₙ, to be 0.257 m.
Use the given value of 𝑠ₙ to find the value of 𝑠ₙ₋₁.
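The two standard deviations differ only in the divisor (n versus n - 1), which gives the conversion below for the sample of n = 50 wave heights.

```latex
s_{n-1} = \sqrt{\frac{n}{n-1}}\; s_n
        = \sqrt{\frac{50}{49}} \times 0.257
        \approx 0.260 \text{ m}
```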
10b. [3 marks]
She calculates the expected values for each interval under this null hypothesis, and some of these values are
shown in the table below.
Find the value of 𝑎 and the value of 𝑏, giving your answers correct to one decimal place.
10c. [2 marks]
Find the value of the 𝜒² test statistic (𝜒²_calc) for this test.
10d. [2 marks]
Determine the degrees of freedom for Dana’s test.
10e. [2 marks]
It is given that the critical value for this test is 9.49.
State the conclusion of the test in context. Use your answer to part (c) to justify your conclusion.
11a. [7 marks]
Give your answers to four significant figures.
A die is thrown 120 times with the following results.
It is claimed that all digits have the same probability of appearing in the sequence.
Test this claim at the 5% level of significance.
12b. [2 marks]
Explain what is meant by the 5% level of significance.
13a. [5 marks]
Kayla wants to measure the extent to which two judges in a gymnastics competition are in agreement. Each
judge has ranked the seven competitors, as shown in the table, where 1 is the highest ranking and 7 is the
lowest.
Calculate Spearman’s rank correlation coefficient for this data.
13b. [1 mark]
State what conclusion Kayla can make from the answer in part (a).
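When both judges give complete rankings with no ties, Spearman's coefficient can be found from the rank differences alone. Here d_i denotes the difference between the two ranks given to competitor i, a symbol introduced for this note.

```latex
r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\left(n^{2} - 1\right)},
\qquad n = 7 \;\Rightarrow\; n\left(n^{2} - 1\right) = 336
```

If tied ranks occur, this shortcut is only approximate, and Pearson's coefficient applied to the ranks should be used instead.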
14a. [1 mark]
Charles wants to measure the strength of the relationship between the price of a house and its distance from the
city centre where he lives. He chooses houses of a similar size and plots a graph of price, 𝑃 (in thousands of
dollars) against distance from the city centre, 𝑑 (km).
Explain why it is not appropriate to use Pearson’s product moment correlation coefficient to measure the
strength of the relationship between 𝑃 and 𝑑.
14b. [1 mark]
Explain why it is appropriate to use Spearman’s rank correlation coefficient to measure the strength of the
relationship between 𝑃 and 𝑑.
14c. [6 marks]
The data from the graph is shown in the table.
Anita decides to use a t-test, at the 5% significance level, to determine if the mean weight of the fish changed
after construction of the factory.
State an assumption that Anita is making, in order to use a t-test.
15b. [1 mark]
State the hypotheses for this t-test.
15c. [3 marks]
Find the p-value for this t-test.
15d. [2 marks]
State the conclusion of this test, in context, giving a reason.
16a. [5 marks]
In an effort to study the level of intelligence of students entering college, a psychologist collected data from
4000 students who were given a standard test. The predictive norms for this particular test were computed from
a very large population of scores having a normal distribution with mean 100 and standard deviation of 10. The
psychologist wishes to determine whether the 4000 test scores he obtained also came from a normal distribution
with mean 100 and standard deviation 10. He prepared the following table (expected frequencies are rounded to
the nearest integer):
Copy and complete the table, showing how you arrived at your answers.
16b. [6 marks]
Test the hypothesis at the 5% level of significance.
17. [9 marks]
Six coins are tossed simultaneously 320 times, with the following results.
At the 5% level of significance, test the hypothesis that all the coins are fair.
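Under the hypothesis that all six coins are fair, the number of heads in one toss of the six coins follows B(6, 0.5). The sketch below computes the expected frequencies for 320 tosses that a goodness of fit test would compare with the observed table. The function name is illustrative.

```python
from math import comb

def expected_heads_counts(n_coins, n_trials, p_head=0.5):
    """Expected number of trials showing k heads, k = 0..n_coins, under B(n_coins, p_head)."""
    return [
        n_trials * comb(n_coins, k) * p_head**k * (1 - p_head) ** (n_coins - k)
        for k in range(n_coins + 1)
    ]

expected = expected_heads_counts(6, 320)
print(expected)
```

Every expected frequency is at least 5, so no categories need to be combined. Because the probability 0.5 is fixed by the hypothesis and not estimated from the data, the test has 7 - 1 = 6 degrees of freedom.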
18a. [2 marks]
Adesh wants to model the cooling of a metal rod. He heats the rod and
records its temperature as it cools.
He believes the temperature can be modelled by 𝑇(𝑡) = 𝑎e^(𝑏𝑡) + 25, where 𝑎, 𝑏 ∈ ℝ.
Show that ln(𝑇 − 25) = 𝑏𝑡 + ln 𝑎.
18b. [3 marks]
Find the equation of the regression line of ln(𝑇 − 25) on 𝑡.
18c. [3 marks]
Hence
find the value of 𝑎 and of 𝑏.
18d. [2 marks]
predict the temperature of the metal rod after 3 minutes.
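The linearisation in part (a) turns the exponential model into a straight-line fit: the gradient of the regression line of ln(𝑇 − 25) on 𝑡 is 𝑏 and the intercept is ln 𝑎. The sketch below checks the method on synthetic readings generated from assumed values 𝑎 = 60 and 𝑏 = −0.5; these are not Adesh's data, which are not reproduced here.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least squares regression line of y on x; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic readings from a = 60, b = -0.5, used purely to check the method.
times = [0, 1, 2, 4, 5]
temps = [60 * exp(-0.5 * t) + 25 for t in times]

# Linearise: ln(T - 25) = b*t + ln(a), then fit a straight line.
b, ln_a = fit_line(times, [log(T - 25) for T in temps])
a = exp(ln_a)              # undo the logarithm to recover a

def predict(t):
    """Temperature predicted by the fitted model T(t) = a*e^(b*t) + 25."""
    return a * exp(b * t) + 25

print(round(a, 3), round(b, 3), round(predict(3), 2))
```

With real measurements the points will not lie exactly on a line, so the recovered values of 𝑎 and 𝑏 are estimates.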
19a. [1 mark]
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The owner believes that the number
of brown eggs in a box can be modelled by a binomial distribution. He examines 100 boxes and obtains the
following data.