
T Test

Here are the steps to solve this practice problem:
1a. The null hypothesis is that position in the university and frequency of eating salad are independent.
1b. The p-value is 0.0357.
1c. The null hypothesis should be rejected because the p-value is less than 0.05.
2a. The null hypothesis is that the mean height of male students (μm) is equal to the mean height of female students (μf).
2b. The alternative hypothesis is that the mean height of male students is not equal to the mean height of female students.
2c. The p-value is 0.0243.
2d. The null hypothesis should not be accepted because the p-value is less than the 10% significance level.

T test:

A t-test is used to determine whether there is a significant difference between the means of two groups.
There are two common types of t-test:
• The paired t-test.
• The unpaired t-test.

These statistical tests are commonly used in research in the fields of biology, business, and psychology.

A paired t-test is designed to compare the means of the same group or item under two separate scenarios.

An unpaired t-test compares the means of two independent or unrelated groups. In the standard (Student's)
unpaired t-test, the variances of the two groups are assumed to be equal.

• Paired t-test:
A paired t-test (also known as a dependent or correlated t-test) is a statistical test that compares the
means of two related groups to determine if there is a significant difference between the two groups. It
works with the difference within each pair and tests whether the mean of these differences is zero.
a. A significant difference occurs when the differences between groups are unlikely to be due to
sampling error or chance.
b. The groups can be related by being the same group of people, the same item, or being subjected
to the same conditions.
c. Paired t-tests are usually more powerful than unpaired t-tests because using the same
participants or items removes variation between the samples that could be caused by anything
other than what is being tested.

What are the hypotheses of a paired t-test?


There are two hypotheses in a paired t-test.
a. The null hypothesis (H0) states that the population mean of the paired differences is zero, that is,
there is no difference between the means of the two related groups.
b. The alternative hypothesis (H1) states that the population mean of the paired differences is not zero.
If H0 is rejected, the observed difference is considered unlikely to be due to sampling error or chance alone.
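As a sketch in standard textbook notation (the symbols below are introduced here, not in the original): let d_i be the difference within pair i, \bar d and s_d the sample mean and standard deviation of the n differences, and μ_d the population mean difference. The two-tailed hypotheses and the test statistic are then:

```latex
H_0:\ \mu_d = 0 \qquad H_1:\ \mu_d \neq 0

t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \nu = n - 1
```

Under H0 the statistic follows a t-distribution with ν = n − 1 degrees of freedom; a one-tailed test replaces H1 with μ_d > 0 or μ_d < 0.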

What are the assumptions of a paired t-test?

a. The differences between the paired observations are (approximately) normally distributed.
b. The pairs are sampled independently of one another.
c. The dependent variable is measured on a continuous scale, such as an interval or ratio scale.
d. The independent variable consists of two related groups or matched pairs.

When to use a paired t-test?


Paired t-tests are used when the same item or group is measured twice; in this setting the test is also
called a repeated-measures t-test.
Examples of situations in which a paired t-test is appropriate include:
• The before and after effect of a pharmaceutical treatment on the same group of people.
• Body temperature using two different thermometers on the same group of participants.
• Standardized test results of a group of students before and after a study prep course.
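A minimal sketch of the paired t-test described above, using only the Python standard library. The function name `paired_t` and the before/after scores are made up for illustration; the code returns the t statistic and degrees of freedom, and a p-value would then come from the t-distribution (for example `2 * scipy.stats.t.sf(abs(t), df)` for a two-tailed test, or `scipy.stats.ttest_rel`, if SciPy is available, or from a GDC).

```python
import math
import statistics


def paired_t(before, after):
    """Hypothetical helper: return (t, df) for a paired t-test.

    The test uses the within-pair differences d_i = after_i - before_i
    and checks whether their mean is zero.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD, divisor n - 1
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Made-up illustrative scores for four students before and after a prep course.
before = [10, 12, 9, 11]
after = [12, 13, 11, 12]
t, df = paired_t(before, after)
print(f"t = {t:.4f}, df = {df}")   # t = 5.1962, df = 3
```

Working on the differences is what makes the test "paired": each student acts as their own control, so between-student variation drops out.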
Two-sample t-tests are statistical tests used to compare the means of two populations. Also known as
Student's t-tests, their results are used to determine whether there is a significant difference between the
means of two samples that is unlikely to be due to sampling error or random chance.
• Unpaired t-test:
An unpaired t-test (also known as an independent t-test) is a statistical procedure that compares the
means of two independent or unrelated groups to determine if there is a significant difference
between the two.

What are the hypotheses of an unpaired t-test?


The hypotheses of an unpaired t-test have the same form as those for a paired t-test, but they refer directly to
the two population means. The two hypotheses are:
• The null hypothesis (H0) states that the two population means are equal (μ1 = μ2).
• The alternative hypothesis (H1) states that the two population means are not equal (μ1 ≠ μ2), or, for a
one-tailed test, that one is greater than the other. If H0 is rejected, the observed difference is considered
unlikely to be due to sampling error or chance alone.
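For the standard (equal-variance) unpaired test, with sample sizes n_1 and n_2, sample means \bar x_1 and \bar x_2, and sample standard deviations s_1 and s_2 (notation introduced here for illustration), the usual pooled test statistic is:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
\nu = n_1 + n_2 - 2
```

The pooled variance s_p^2 is where the equal-variance assumption enters: both samples are treated as estimates of one common variance.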

What are the assumptions of an unpaired t-test?


• The dependent variable is normally distributed within each group.
• The observations are sampled independently.
• The dependent variable is measured on a continuous scale, such as an interval or ratio scale.
• The variance is the same in both groups, meaning that they have the same standard deviation.
• The independent variable consists of two independent groups.

When to use an unpaired t-test?


An unpaired t-test is used to compare the means of two independent groups.
You use the standard unpaired t-test when you are comparing two separate groups with equal variances.

Examples of situations in which an unpaired t-test is appropriate:


• Research, such as a pharmaceutical study or other treatment plan, where half of the subjects are randomly
assigned to the treatment group and the other half to the control group.
• Research during which there are two independent groups, such as women and men, that examines
whether the average bone density is significantly different between the two groups.
• Comparing the average commuting distance traveled by New York City and San Francisco residents
using 1,000 randomly selected participants from each city.
In the case of unequal variances, Welch's t-test should be used instead.
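A minimal standard-library sketch contrasting the pooled (Student's) test with Welch's test. The function names `pooled_t` and `welch_t` and the two samples are hypothetical, chosen so that one group is much more spread out than the other. With SciPy available, `scipy.stats.ttest_ind(x, y, equal_var=True)` and `equal_var=False` perform the two versions and also return p-values.

```python
import math
import statistics


def pooled_t(x, y):
    """Student's two-sample t-test assuming equal variances: return (t, df)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)   # divisor n - 1
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)     # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2


def welch_t(x, y):
    """Welch's t-test (variances not assumed equal): return (t, df)."""
    n1, n2 = len(x), len(y)
    a = statistics.variance(x) / n1
    b = statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation; the df is usually not an integer.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Made-up illustrative samples: y has four times the variance of x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(pooled_t(x, y))   # t about -1.897, df = 8
print(welch_t(x, y))    # t about -1.897, df about 5.88
```

With equal sample sizes the two t values coincide; the difference shows up in the degrees of freedom, which Welch's method reduces when the variances differ, giving a more cautious p-value.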

Paired vs unpaired t-test


The key differences between a paired and unpaired t-test are summarized below.
• A paired t-test is designed to compare the means of the same group or item under two separate
scenarios. An unpaired t-test compares the means of two independent or unrelated groups.
• In the standard unpaired t-test, the variances of the two groups are assumed to be equal. A paired t-test
makes no such assumption, because it works only with the differences within each pair.
Practice:
MAI SL P1 May 2021

Marks Scheme:
Bivariate, Hypothesis testing [272 marks]
1a. [1 mark]
As part of a study into healthy lifestyles, Jing visited Surrey Hills University. Jing recorded a person’s position
in the university and how frequently they ate a salad. Results are shown in the table.

Jing conducted a 𝜒² test for independence at a 5 % level of significance.


State the null hypothesis.
1b. [2 marks]
Calculate the 𝑝-value for this test.
1c. [2 marks]
State, giving a reason, whether the null hypothesis should be accepted.
2a. [1 mark]
Ms Calhoun measures the heights of students in her mathematics class. She is interested to see if the mean
height of male students, 𝜇₁, is the same as the mean height of female students, 𝜇₂. The information is recorded
in the table.

At the 10 % level of significance, a 𝑡-test was used to compare the means of the two groups. The data is
assumed to be normally distributed and the standard deviations are equal between the two groups.
State the null hypothesis.
2b. [1 mark]
State the alternative hypothesis.
2c. [2 marks]
Calculate the 𝑝-value for this test.
2d. [2 marks]
State, giving a reason, whether Ms Calhoun should accept the null hypothesis.
3a. [2 marks]
The Malvern Aquatic Center hosted a 3 metre springboard diving event. The judges, Stan and Minsun, awarded
8 competitors a score out of 10. The raw data is collated in the following table.

Write down the value of the Pearson’s product–moment correlation coefficient, 𝑟.


3b. [2 marks]
Using the value of 𝑟, interpret the relationship between Stan’s score and Minsun’s score.
3c. [2 marks]
Write down the equation of the regression line 𝑦 on 𝑥.
3d. [2 marks]
Use your regression equation from part (c) to estimate Minsun's score when Stan awards a perfect 10.
3e. [2 marks]
State whether this estimate is reliable. Justify your answer.
3f. [2 marks]
The Commissioner for the event would like to find the Spearman’s rank correlation coefficient.
Copy and complete the information in the following table.

3g. [2 marks]
Find the value of the Spearman's rank correlation coefficient, 𝑟ₛ.
3h. [2 marks]
Comment on the result obtained for 𝑟ₛ.
3i. [1 mark]
The Commissioner believes Minsun’s score for competitor G is too high and so decreases the score from 9.5 to
9.1.
Explain why the value of the Spearman's rank correlation coefficient 𝑟ₛ does not change.
4a. [1 mark]
Two IB schools, A and B, follow the IB Diploma Programme but have different teaching methods. A research
group tested whether the different teaching methods lead to a similar final result.
For the test, a group of eight students were randomly selected from each school. Both samples were given a
standardized test at the start of the course and a prediction for total IB points was made based on that test; this
was then compared to their points total at the end of the course.
Previous results indicate that both the predictions from the standardized tests and the final IB points can be
modelled by a normal distribution.
It can be assumed that:
• the standardized test is a valid method for predicting the final IB points
• variations from the prediction can be explained through the circumstances of the student or school.
Identify a test that might have been used to verify the null hypothesis that the predictions from the standardized
test can be modelled by a normal distribution.
4b. [1 mark]
State why comparing only the final IB points of the students from the two schools would not be a valid test for
the effectiveness of the two different teaching methods.
4c. [1 mark]
The data for school A is shown in the following table.
For each student, the change from the predicted points to the final points (𝑓 − 𝑝) was calculated.
Find the mean change.
4d. [2 marks]
Find the standard deviation of the changes.
4e. [4 marks]
Use a paired 𝑡-test to determine whether there is significant evidence that the students in school A have
improved their IB points since the start of the course.
4f. [5 marks]
The data for school B is shown in the following table.

Use an appropriate test to determine whether there is evidence, at the 5 % significance level, that the students in
school B have improved more than those in school A.
4g. [1 mark]
State why it was important to test that both sets of points were normally distributed.
4h. [3 marks]
School A also gives each student a score for effort in each subject. This effort score is based on a scale of 1 to 5
where 5 is regarded as outstanding effort.
It is claimed that the effort put in by a student is an important factor in improving upon their predicted IB
points.
Perform a test on the data from school A to show it is reasonable to assume a linear relationship between effort
scores and improvements in IB points. You may assume effort scores follow a normal distribution.
4i. [1 mark]
Hence, find the expected improvement between predicted and final points for an increase of one unit in effort
grades, giving your answer to one decimal place.
4j. [6 marks]
A mathematics teacher in school A claims that the comparison between the two schools is not valid because the
sample for school B contained mainly girls and that for school A, mainly boys. She believes that girls are likely
to show a greater improvement from their predicted points to their final points.
She collects more data from other schools, asking them to class their results into four categories as shown in the
following table.

Use an appropriate test to determine whether showing an improvement is independent of gender.


4k. [2 marks]
If you were to repeat the test performed in part (e) intending to compare the quality of the teaching between the
two schools, suggest two ways in which you might choose your sample to improve the validity of the test.
5a. [2 marks]
The weights of apples on a tree can be modelled by a normal distribution with a mean of 85 grams and a
standard deviation of 7.5 grams.
Find the probability that an apple from the tree has a weight greater than 90 grams.
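One way to check the arithmetic in 5a is with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a sketch using only the mean and standard deviation given in the question; the variable names are illustrative.

```python
from statistics import NormalDist

# Apple weight W ~ N(85, 7.5^2) grams, as stated in 5a.
weight = NormalDist(mu=85, sigma=7.5)
p_over_90 = 1 - weight.cdf(90)   # P(W > 90)
print(round(p_over_90, 4))       # 0.2525
```

Equivalently, z = (90 − 85)/7.5 ≈ 0.667 and P(Z > 0.667) ≈ 0.2525.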
5b. [2 marks]
Samples of apples are taken from two trees, A and B, in different parts of the orchard.
The data is shown in the table below.

The owner of the orchard wants to know whether the mean weight of the apples from tree A (𝜇_A) is greater than
the mean weight of the apples from tree B (𝜇_B), so sets up the following test:
H₀: 𝜇_A = 𝜇_B and H₁: 𝜇_A > 𝜇_B
Find the 𝑝-value for the owner’s test.
5c. [2 marks]
The test is performed at the 5% significance level.
State the conclusion of the test, giving a reason for your answer.
6a. [2 marks]
The water temperature (𝑇) in Lake Windermere is measured on the first day of eight consecutive months (𝑚)
from January to August (months 1 to 8) and the results are shown below. The value for May (month 5) has
been accidentally deleted.

Assuming the data follows a linear model for this period, find the regression line of 𝑇 on 𝑚 for the remaining
data.
6b. [2 marks]
Use your line to find an estimate for the water temperature on the first day of May.
6c. [1 mark]
Explain why your line should not be used to estimate the value of 𝑚 at which the temperature is 10.0 °C.
6d. [1 mark]
Explain in context why your line should not be used to predict the value for December (month 12).
6e. [1 mark]
State a more appropriate model for the water temperature in the lake over an extended period of time. You are
not expected to calculate any parameters.
7a. [2 marks]
It is believed that the power 𝑃 of a signal at a point 𝑑 km from an antenna is inversely proportional to 𝑑ⁿ, where
𝑛 ∈ ℤ⁺.
The value of 𝑃 is recorded at distances of 1 𝑚 to 5 𝑚 and the values of log₁₀ 𝑑 and log₁₀ 𝑃 are plotted on the
graph below.

Explain why this graph indicates that 𝑃 is inversely proportional to 𝑑ⁿ.
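A sketch of the reasoning behind 7a, writing k for a positive constant of proportionality (k is introduced here for illustration): taking base-10 logarithms turns the power law into a straight line.

```latex
P = \frac{k}{d^{n}}
\;\Longrightarrow\;
\log_{10} P = \log_{10} k - n \log_{10} d
```

So if log₁₀ P plotted against log₁₀ d lies close to a straight line with negative gradient, that is consistent with P being inversely proportional to dⁿ, and the gradient of the regression line in 7b estimates −n.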


7b. [2 marks]
The values of log₁₀ 𝑑 and log₁₀ 𝑃 are shown in the table below.

Find the equation of the least squares regression line of log₁₀ 𝑃 against log₁₀ 𝑑.
7c. [1 mark]
Use your answer to part (b) to write down the value of 𝑛 to the nearest integer.
7d. [2 marks]
Find an expression for 𝑃 in terms of 𝑑.
8. [2 marks]
A teacher is concerned about the amount of lesson time lost by 8 students through arriving late at school. Over
a period of 2 weeks he records the total number of minutes they are late. He also asks them how far they live
from school. The results are shown in the table below.

Which of the correlation coefficients would you recommend is used to assess whether or not there is an
association between total number of minutes late and distance from school? Fully justify your answer.
9a. [3 marks]
A dice manufacturer claims that, for a novelty die he produces, the probabilities of scoring the numbers 1 to 5
are all equal, and the probability of a 6 is two times the probability of scoring any of the other numbers.
Find the probability of scoring a six when rolling the novelty die.
9b. [4 marks]
Find the probability of scoring more than 2 sixes when this die is rolled 5 times.
9c. [2 marks]
To test the manufacturer's claim, one of the novelty dice is rolled 350 times and the numbers scored on the die
are shown in the table below.

Find the expected frequency for each of the numbers if the manufacturer’s claim is true.
9d. [2 marks]
A 𝜒² goodness of fit test is to be used with a 5% significance level.
Write down the null and alternative hypotheses.
9e. [1 mark]
State the degrees of freedom for the test.
9f. [4 marks]
Determine the conclusion of the test, clearly justifying your answer.
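The parts of question 9 that do not depend on the observed table (9a, 9b, 9c, 9e) can be checked exactly with Python's `fractions` module. This is a sketch based only on the manufacturer's stated claim; the observed frequencies for 9f are not reproduced in this document, so the test itself is not carried out.

```python
from fractions import Fraction
from math import comb

# 9a: faces 1-5 each have probability p and face 6 has 2p, so 5p + 2p = 1.
p = Fraction(1, 7)
p_six = 2 * p

# 9b: the number of sixes in 5 rolls is X ~ B(5, 2/7); "more than 2" means X >= 3.
p_more_than_2 = sum(comb(5, k) * p_six ** k * (1 - p_six) ** (5 - k) for k in range(3, 6))

# 9c: expected frequencies in 350 rolls if the claim is true (faces 1-5, then 6).
expected = [350 * p] * 5 + [350 * p_six]

# 9e: six categories and no parameters estimated, so df = 6 - 1.
dof = len(expected) - 1

# p_six = 2/7, P(X > 2) = 2432/16807 (about 0.1447), expected 50 (x5) and 100, dof = 5
print(p_six, float(p_more_than_2), [int(e) for e in expected], dof)
```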
10a. [2 marks]
Dana has collected some data regarding the heights ℎ (metres) of waves against a pier at 50 randomly chosen
times in a single day. This data is shown in the table below.

She wishes to perform a 𝜒² test at the 5% significance level to see if the height of waves could be modelled by
a normal distribution. Her null hypothesis is
H₀: The data can be modelled by a normal distribution.
From the table she calculates the mean of the heights in her sample to be 0.828 m and the standard deviation of
the heights 𝑠ₙ to be 0.257 m.
Use the given value of 𝑠ₙ to find the value of 𝑠ₙ₋₁.
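For 10a, the standard deviation with divisor n and the one with divisor n − 1 differ by a factor of √(n/(n − 1)). With n = 50 and 0.257 m from the question, a quick check of the conversion gives:

```latex
s_{n-1} = s_n \sqrt{\frac{n}{n-1}} = 0.257\sqrt{\frac{50}{49}} \approx 0.260\ \text{m}
```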
10b. [3 marks]
She calculates the expected values for each interval under this null hypothesis, and some of these values are
shown in the table below.

Find the value of 𝑎 and the value of 𝑏, giving your answers correct to one decimal place.
10c. [2 marks]
Find the value of the 𝜒² test statistic (𝜒²_calc) for this test.
10d. [2 marks]
Determine the degrees of freedom for Dana’s test.
10e. [2 marks]
It is given that the critical value for this test is 9.49.
State the conclusion of the test in context. Use your answer to part (c) to justify your conclusion.
11a. [7 marks]
Give your answers to four significant figures.
A die is thrown 120 times with the following results.

Showing all steps clearly, test whether the die is fair


(i) at the 5% level of significance;
(ii) at the 1% level of significance.
11b. [3 marks]
Explain what is meant by “level of significance” in part (a).
12a. [7 marks]
A calculator generates a random sequence of digits. A sample of 200 digits is randomly selected from the first
100 000 digits of the sequence. The following table gives the number of times each digit occurs in this sample.

It is claimed that all digits have the same probability of appearing in the sequence.
Test this claim at the 5% level of significance.
12b. [2 marks]
Explain what is meant by the 5% level of significance.
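Questions 11a and 12a are both goodness-of-fit tests against a uniform distribution: 120 throws over six faces gives an expected 20 per face with 5 degrees of freedom, and 200 digits over ten digits gives an expected 20 per digit with 9 degrees of freedom. A minimal sketch follows; the function names are hypothetical, and the critical values are the standard table values χ²(5) = 11.070 (5%), χ²(5) = 15.086 (1%) and χ²(9) = 16.919 (5%). The observed tables are not reproduced here, so no conclusion is computed.

```python
def chi_square_stat(observed, expected):
    """Pearson's goodness-of-fit statistic: the sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Upper-tail critical values from standard chi-squared tables, keyed by (df, alpha).
CRITICAL = {(5, 0.05): 11.070, (5, 0.01): 15.086, (9, 0.05): 16.919}


def uniform_gof(observed, alpha):
    """Test H0: every category is equally likely. Returns (statistic, df, reject_h0)."""
    n, k = sum(observed), len(observed)
    expected = [n / k] * k            # 120/6 = 20 for 11a, 200/10 = 20 for 12a
    stat = chi_square_stat(observed, expected)
    df = k - 1                        # no parameters estimated from the data
    return stat, df, stat > CRITICAL[(df, alpha)]
```

To use it, pass the observed counts from the table with `alpha=0.05` (and, for 11a(ii), `alpha=0.01`). The `CRITICAL` lookup deliberately holds only the three cases these questions need.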
13a. [5 marks]
Kayla wants to measure the extent to which two judges in a gymnastics competition are in agreement. Each
judge has ranked the seven competitors, as shown in the table, where 1 is the highest ranking and 7 is the
lowest.
Calculate Spearman’s rank correlation coefficient for this data.
13b. [1 mark]
State what conclusion Kayla can make from the answer in part (a).
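Because both judges in 13a rank the same seven competitors without ties, the shortcut formula r_s = 1 − 6Σd² / (n(n² − 1)) applies. A sketch with a hypothetical helper name; the sanity checks use perfect agreement and perfect disagreement rather than Kayla's data, which is not reproduced here. When ranks are tied (as may happen in 14c), Spearman's coefficient is computed as Pearson's coefficient applied to the (averaged) ranks, and the shortcut formula is then only approximate.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for two lists of ranks.

    This shortcut formula is exact only when there are no tied ranks.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("both judges must rank the same competitors")
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Sanity checks with seven competitors (not Kayla's data).
print(spearman_from_ranks([1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7]))   # 1.0
print(spearman_from_ranks([1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1]))   # -1.0
```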
14a. [1 mark]
Charles wants to measure the strength of the relationship between the price of a house and its distance from the
city centre where he lives. He chooses houses of a similar size and plots a graph of price, 𝑃 (in thousands of
dollars) against distance from the city centre, 𝑑 (km).

Explain why it is not appropriate to use Pearson’s product moment correlation coefficient to measure the
strength of the relationship between 𝑃 and 𝑑.
14b. [1 mark]
Explain why it is appropriate to use Spearman’s rank correlation coefficient to measure the strength of the
relationship between 𝑃 and 𝑑.
14c. [6 marks]
The data from the graph is shown in the table.

Calculate Spearman’s rank correlation coefficient for this data.


14d. [1 mark]
State what conclusion Charles can make from the answer in part (c).
15a. [1 mark]
Anita is concerned that the construction of a new factory will have an adverse effect on the fish in a nearby
lake. Before construction begins she catches fish at random, records their weight and returns them to the lake.
After the construction is finished she collects a second, random sample of weights of fish from the lake. Her
data is shown in the table.

Anita decides to use a t-test, at the 5% significance level, to determine if the mean weight of the fish changed
after construction of the factory.
State an assumption that Anita is making, in order to use a t-test.
15b. [1 mark]
State the hypotheses for this t-test.
15c. [3 marks]
Find the p-value for this t-test.
15d. [2 marks]
State the conclusion of this test, in context, giving a reason.
16a. [5 marks]
In an effort to study the level of intelligence of students entering college, a psychologist collected data from
4000 students who were given a standard test. The predictive norms for this particular test were computed from
a very large population of scores having a normal distribution with mean 100 and standard deviation of 10. The
psychologist wishes to determine whether the 4000 test scores he obtained also came from a normal distribution
with mean 100 and standard deviation 10. He prepared the following table (expected frequencies are rounded to
the nearest integer):

Copy and complete the table, showing how you arrived at your answers.
16b. [6 marks]
Test the hypothesis at the 5% level of significance.
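For 16a, each expected frequency is n multiplied by the probability that a N(100, 10²) score falls in that class. A sketch using `statistics.NormalDist` (Python 3.8+); the helper name is hypothetical and the class boundaries in the example are illustrative, not those of the missing table. Because the mean and standard deviation are specified rather than estimated, the degrees of freedom are the number of classes (after merging any with small expected counts) minus 1.

```python
from statistics import NormalDist


def normal_expected(n, mu, sigma, lower=None, upper=None):
    """Expected count in the class lower <= x < upper for n values from N(mu, sigma^2).

    lower=None or upper=None means the class is open at that end.
    """
    dist = NormalDist(mu, sigma)
    p_lo = 0.0 if lower is None else dist.cdf(lower)
    p_hi = 1.0 if upper is None else dist.cdf(upper)
    return n * (p_hi - p_lo)


# 16a setting: n = 4000, N(100, 10^2). These class boundaries are illustrative only.
print(round(normal_expected(4000, 100, 10, upper=100)))             # 2000
print(round(normal_expected(4000, 100, 10, lower=90, upper=110)))   # 2731
```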
17. [9 marks]
Six coins are tossed simultaneously 320 times, with the following results.

At the 5% level of significance, test the hypothesis that all the coins are fair.
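A sketch of the expected frequencies for question 17, assuming the missing table records the number of heads (0 to 6) on each toss of the six coins. If all coins are fair, the number of heads follows B(6, 0.5), so each expected frequency is 320 × C(6, k) / 2⁶.

```python
from math import comb

# Assumes the table counts heads (0 to 6) per toss of the six coins.
# Under H0 (all coins fair), the number of heads X ~ B(6, 0.5).
tosses = 320
expected = [tosses * comb(6, k) / 2 ** 6 for k in range(7)]
print(expected)   # [5.0, 30.0, 75.0, 100.0, 75.0, 30.0, 5.0]
```

Since p = 0.5 is specified rather than estimated, the degrees of freedom are 7 − 1 = 6 if no classes are merged.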
18a. [2 marks]
Adesh wants to model the cooling of a metal rod. He heats the rod and
records its temperature as it cools.

He believes the temperature can be modeled by 𝑇(𝑡) = 𝑎e^(𝑏𝑡) + 25, where 𝑎, 𝑏 ∈ ℝ.
Show that ln(𝑇 − 25) = 𝑏𝑡 + ln 𝑎.
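A sketch of the algebra for 18a, valid when T > 25 and a > 0 so that the logarithms are defined:

```latex
T = a e^{bt} + 25
\;\Longrightarrow\;
T - 25 = a e^{bt}
\;\Longrightarrow\;
\ln(T - 25) = \ln a + \ln e^{bt} = bt + \ln a
```

This is why a linear regression of ln(T − 25) on t gives b as the gradient and ln a as the intercept.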
18b. [3 marks]
Find the equation of the regression line of ln(𝑇 − 25) on 𝑡.
18c. [3 marks]
Hence find the value of 𝑎 and of 𝑏.
18d. [2 marks]
Predict the temperature of the metal rod after 3 minutes.
19a. [1 mark]
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The owner believes that the number
of brown eggs in a box can be modelled by a binomial distribution. He examines 100 boxes and obtains the
following data.

Calculate the mean number of brown eggs in a box.


19b. [1 mark]
Hence estimate 𝑝, the probability that a randomly chosen egg is brown.
19c. [8 marks]
By calculating an appropriate 𝜒² statistic, test, at the 5% significance level, whether or not the binomial
distribution gives a good fit to these data.
20a. [2 marks]
A zoologist believes that the number of eggs laid in the Spring by female birds of a certain breed follows a
Poisson law. She observes 100 birds during this period and she produces the following table.

Calculate the mean number of eggs laid by these birds.


20b. [2 marks]
The zoologist wishes to determine whether or not a Poisson law provides a suitable model.
Write down appropriate hypotheses.
20c. [14 marks]
Carry out a test at the 1% significance level, and state your conclusion.
21. [11 marks]
The number of cars passing a certain point in a road was recorded during 80 equal time intervals and
summarized in the table below.
Carry out a 𝜒² goodness of fit test at the 5% significance level to decide if the above data can be modelled by a
Poisson distribution.
22a. [2 marks]
The number of telephone calls received by a helpline over 80 one-minute periods are summarized in the table
below.

Find the exact value of the mean of this distribution.


22b. [12 marks]
Test, at the 5% level of significance, whether or not the data can be modelled by a Poisson distribution.
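Questions 20 to 22 all compare observed counts with Poisson expected counts. A sketch of that step, with a hypothetical helper name and a made-up λ (the original tables are not reproduced here): it returns expected counts for k = 0, 1, …, max_k − 1 plus an upper-tail class so the total matches. Classes with small expected counts (commonly below 5) are merged before computing χ², and because λ is estimated from the sample mean, the degrees of freedom are (number of classes after merging) − 1 − 1.

```python
import math


def poisson_expected(total, lam, max_k):
    """Expected counts for k = 0, 1, ..., max_k - 1 and a final 'k >= max_k' class.

    total: number of observations (e.g. 80 intervals or 100 birds)
    lam:   Poisson mean, usually estimated by the sample mean
    """
    probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k)]
    probs.append(1 - sum(probs))  # upper tail, so the expected counts sum to total
    return [total * p for p in probs]


# Illustrative call only: lam = 2 is made up, not taken from the missing tables.
print([round(e, 2) for e in poisson_expected(80, 2, 4)])
```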
23a. [3 marks]
The heights, 𝑥 metres, of the 241 new entrants to a men’s college were measured and the following statistics
calculated.
∑𝑥 = 412.11, ∑𝑥² = 705.5721
Calculate unbiased estimates of the population mean and the population variance.
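The unbiased estimates in 23a follow from x̄ = Σx / n and s² = (Σx² − (Σx)²/n) / (n − 1). A short sketch using only the summary statistics given in the question:

```python
# Summary statistics given in 23a.
n = 241
sum_x = 412.11
sum_x2 = 705.5721

mean_est = sum_x / n                               # unbiased estimate of the population mean
var_est = (sum_x2 - sum_x ** 2 / n) / (n - 1)      # unbiased estimate of the variance (divisor n - 1)

print(round(mean_est, 4), round(var_est, 6))       # 1.71 0.0036
```

The divisor n − 1 is what makes the variance estimate unbiased; dividing by n would give a slightly smaller, biased value.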
23b. [1 mark]
The Head of Mathematics decided to use a 𝜒² test to determine whether or not these heights could be modelled
by a normal distribution. He therefore divided the data into classes as follows.

State suitable hypotheses.


23c. [11 marks]
Calculate the value of the 𝜒² statistic and state your conclusion using a 10% level of significance.
24a. [1 mark]
A pharmaceutical company has developed a new drug to decrease cholesterol. The final stage of testing the new
drug is to compare it to their current drug. They have 150 volunteers, all recently diagnosed with high
cholesterol, from which they want to select a sample of size 18. They require as close as possible 20% of the
sample to be below the age of 30, 30% to be between the ages of 30 and 50 and 50% to be over the age of 50.
State the name for this type of sampling technique.
24b. [3 marks]
Calculate the number of volunteers in the sample under the age of 30.
24c. [1 mark]
Half of the 18 volunteers are given the current drug and half are given the new drug. After six months each
volunteer has their cholesterol level measured and the decrease during the six months is shown in the table.

Calculate the mean decrease in cholesterol for the new drug.
24d. [1 mark]
Calculate the mean decrease in cholesterol for the current drug.
24e. [1 mark]
The company uses a t-test, at the 1% significance level, to determine if the new drug is more effective at
decreasing cholesterol.
State an assumption that the company is making, in order to use a t-test.
24f. [1 mark]
State the hypotheses for this t-test.
24g. [3 marks]
Find the p-value for this t-test.
24h. [2 marks]
State the conclusion of this test, in context, giving a reason.
25a. [12 marks]
Jim writes a computer program to generate 500 values of a variable Z.
He obtains the following table from his results.

Use a chi-squared goodness of fit test to investigate whether or not, at the 5 % level of significance, the
N(0, 1) distribution can be used to model these results.
