S1 Correlation and Regression - Regression
1. A travel agent sells flights to different destinations from Beerow airport. The distance d,
measured in 100 km, of the destination from the airport and the fare £f are recorded for a
random sample of 6 destinations.
Destination A B C D E F
d 2.2 4.0 6.0 2.5 8.0 5.0
f 18 20 25 23 32 28
(a) Using the axes below, complete a scatter diagram to illustrate this information.
(2)
(b) Explain why a linear regression model may be appropriate to describe the relationship
between f and d.
(1)
(d) Calculate the equation of the regression line of f on d giving your answer in the form
f = a + bd.
(4)
Jane is planning her holiday and wishes to fly from Beerow airport to a destination t km away.
A rival travel agent charges 5p per km.
(f) Find the range of values of t for which the first travel agent is cheaper than the rival.
(2)
(Total 14 marks)
2. The blood pressures, p mmHg, and the ages, t years, of 7 hospital patients are shown in the table
below.
Patient A B C D E F G
t 42 74 48 35 56 26 60
p 98 130 120 88 182 80 135
[∑t = 341, ∑p = 833, ∑t² = 18181, ∑p² = 106397, ∑tp = 42948]
(b) Calculate the product moment correlation coefficient for these data.
(3)
(d) On the graph paper below, draw the scatter diagram of blood pressure against age for
these 7 patients.
(2)
(g) Use your regression line to estimate the blood pressure of a 40 year old patient.
(2)
(Total 18 marks)
3. The weight, w grams, and the length, l mm, of 10 randomly selected newborn turtles are given
in the table below.
l 49.0 52.0 53.0 54.5 54.1 53.4 50.0 51.6 49.5 51.2
w 29 32 34 39 38 35 30 31 29 30
(a) Find the equation of the regression line of w on l in the form w = a + bl.
(5)
(b) Use your regression line to estimate the weight of a newborn turtle of length 60 mm.
(2)
(c) Comment on the reliability of your estimate giving a reason for your answer.
(2)
(Total 9 marks)
4. A teacher is monitoring the progress of students using a computer based revision course. The
improvement in performance, y marks, is recorded for each student along with the time, x hours,
that the student spent using the revision course. The results for a random sample of 10 students
are recorded below.
x (hours) 1.0 3.5 4.0 1.5 1.3 0.5 1.8 2.5 2.3 3.0
y (marks) 5 30 27 10 –3 –5 7 15 –10 20
(b) Find the equation of the least squares regression line of y on x in the form y = a + bx.
(4)
Lee spends 8 hours using the revision course claiming that this should give him an improvement
in performance of over 60 marks.
5. Crickets make a noise. The pitch, v kHz, of the noise made by a cricket was recorded at 15
different temperatures, t °C. These data are summarised below.
∑t² = 10 922.81, ∑v² = 42.3356, ∑tv = 677.971, ∑t = 401.3, ∑v = 25.08
(a) Find Stt, Svv and Stv for these data.
(4)
(d) Give a reason to support fitting a regression model of the form v = a + bt to these data.
(1)
(e) Find the value of a and the value of b. Give your answers to 3 significant figures.
(4)
(f) Using this model, predict the pitch of the noise at 19 °C.
(1)
(Total 15 marks)
6. A metallurgist measured the length, l mm, of a copper rod at various temperatures, t°C, and
recorded the following results.
t l
20.4 2461.12
27.3 2461.41
32.1 2461.73
39.0 2461.88
42.9 2462.03
49.7 2462.37
58.3 2462.69
67.4 2463.05
(b) Find the equation of the regression line of y on x in the form y = a + bx.
(5)
7. A manufacturer stores drums of chemicals. During storage, evaporation takes place. A random
sample of 10 drums was taken and the time in storage, x weeks, and the evaporation loss, y ml,
are shown in the table below.
x 3 5 6 8 10 12 13 15 16 18
y 36 50 53 61 69 79 82 90 88 96
(a) On the grid below, draw a scatter diagram to represent these data.
(3)
(b) Give a reason to support fitting a regression model of the form y = a + bx to these data.
(1)
(You may use Σx² = 1352, Σy² = 53 112 and Σxy = 8354.)
(7)
(e) Using your model, predict the amount of evaporation that would take place after
(i) 19 weeks,
(ii) 35 weeks.
(2)
(Total 18 marks)
8. A long distance lorry driver recorded the distance travelled, m miles, and the amount of fuel
used, f litres, each day. Summarised below are data from the driver’s records for a random
sample of 8 days.
(a) Find the equation of the regression line of y on x in the form y = a + bx.
(6)
9. The following table shows the height x, to the nearest cm, and the weight y, to the nearest kg, of
a random sample of 12 students.
x 148 164 156 172 147 184 162 155 182 165 175 152
y 39 59 56 77 44 77 65 49 80 72 70 52
(b) Write down, with a reason, whether the correlation coefficient between x and y is positive
or negative.
(2)
(e) Find, to 3 significant figures, the mean y and the standard deviation s of the weights of
this sample of students.
(3)
(g) Comment on whether or not you think that the weights of these students could be
modelled by a normal distribution.
(1)
(Total 15 marks)
10. An experiment carried out by a student yielded pairs of (x, y) observations such that
(a) Calculate the equation of the regression line of y on x in the form y = a + bx. Give your
values of a and b to 2 decimal places.
(3)
11. A researcher thinks there is a link between a person's height and level of confidence. She
measured the height h, to the nearest cm, of a random sample of 9 people. She also devised a
test to measure the level of confidence c of each person. The data are shown in the table below.
[You may use ∑h² = 272 094, ∑c² = 2 878 966, ∑hc = 884 484]
(c) Calculate the value of the product moment correlation coefficient for these data.
(3)
(e) Calculate the equation of the regression line of c on h in the form c = a + bh.
(3)
(g) State the range of values of h for which estimates of c are reliable.
(1)
(Total 18 marks)
12. An office has the heating switched on at 7.00 a.m. each morning. On a particular day, the
temperature of the office, t °C, was recorded m minutes after 7.00 a.m. The results are shown in
the table below.
m 0 10 20 30 40 50
t 6.0 8.9 11.8 13.5 15.3 16.1
(b) Calculate the equation of the regression line of t on m in the form t = a + bm.
(3)
(d) State, giving a reason, whether or not you would use the regression equation in (b) to
estimate the temperature
13. A company wants to pay its employees according to their performance at work. The
performance score x and the annual salary, y in £100s, for a random sample of 10 of its
employees for last year were recorded. The results are shown in the table below.
x 15 40 27 39 27 15 20 30 19 24
y 216 384 234 399 226 132 175 316 187 196
(c) (i) Calculate the equation of the regression line of y on x, in the form y = a + bx.
The company decides to use this regression model to determine future salaries.
(e) Find the proposed annual salary for an employee who has a performance score of 35.
(2)
(Total 16 marks)
14. Eight students took tests in mathematics and physics. The marks for each student are given in
the table below where m represents the mathematics mark and p the physics mark.
Student A B C D E F G H
Mark m 9 14 13 10 7 8 20 17
Mark p 11 23 21 15 19 10 31 26
A science teacher believes that students’ marks in physics depend upon their mathematical
ability. The teacher decides to investigate this relationship using the test marks.
(c) Showing your working, find the equation of the regression line of p on m.
(8)
A ninth student was absent for the physics test, but she sat the mathematics test and scored 15.
(e) Using this model, estimate the mark she would have scored in the physics test.
(2)
(Total 16 marks)
15. The chief executive of Rex cars wants to investigate the relationship between the number of
new car sales and the amount of money spent on advertising. She collects data from company
records on the number of new car sales, c, and the cost of advertising each year, p (£000). The
data are shown in the table below.
(a) Using the coding x = (p – 100) and y = (c – 4000)/10, draw a scatter diagram to
represent these data. Explain why x is the explanatory variable.
(5)
(c) Deduce the equation of the least squares regression line of c on p in the form c = a + bp.
(3)
(e) Predict the number of extra new cars sales for an increase of £2000 in advertising budget.
Comment on the validity of your answer.
(2)
(Total 19 marks)
16. To test the heating of tyre material, tyres are run on a test rig at chosen speeds under given
conditions of load, pressure and surrounding temperature. The following table gives values of x,
the test rig speed in miles per hour (mph), and the temperature, y °C, generated in the shoulder
of the tyre for a particular tyre material.
x (mph) 15 20 25 30 35 40 45 50
y (°C) 53 55 63 65 78 83 91 101
(b) Give a reason to support the fitting of a regression line of the form y = a + bx
through these points.
(1)
(e) Use your line to estimate the temperature at 50 mph and explain why this estimate
differs from the value given in the table.
(2)
A tyre specialist wants to estimate the temperature of this tyre material at 12 mph and 85 mph.
(f) Explain briefly whether or not you would recommend the specialist to use this regression
equation to obtain these estimates.
(4)
(Total 16 marks)
1. (a) B1 B1 2
Note
1st B1 for at least 4 points correct (allow ± one 2 mm square)
2nd B1 for all points correct (allow ± one 2 mm square)
(c) ∑d = 27.7, ∑f = 146 (both, may be implied) B1
Sdd = 152.09 − (27.7)²/6 = 24.208….. awrt 24.2 M1 A1
Sfd = 723.1 − (27.7 × 146)/6 = 49.06…. awrt 49.1 A1 4
Note
M1 for a correct method seen for either – a correct expression
1st A1 for Sdd awrt 24.2
2nd A1 for Sfd awrt 49.1
(d) b = Sfd/Sdd = 2.026…. awrt 2.03 M1 A1
a = 146/6 − b × 27.7/6 = 14.97….. so f = 15.0 + 2.03d M1 A1 4
Note
1st M1 for a correct expression for b – can follow through their
answers from (c)
2nd M1 for a correct method to find a – follow through their b
and their means
2nd A1 for f = .... in terms of d and all values awrt given expressions.
Accept 15 as rounding from correct answer only.
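The working in parts (c) and (d) can be checked numerically. The following is a minimal Python sketch, not part of the mark scheme; the function name `regression_line` is an illustrative assumption, and the data are taken from the table in question 1.

```python
def regression_line(xs, ys):
    """Return (Sxx, Sxy, a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Sxx = sum(x^2) - (sum x)^2 / n and Sxy = sum(xy) - (sum x)(sum y) / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = s_xy / s_xx                   # gradient
    a = sum_y / n - b * sum_x / n     # intercept: y-bar minus b times x-bar
    return s_xx, s_xy, a, b

# Question 1: distance d (in 100 km) and fare f (in pounds)
d = [2.2, 4.0, 6.0, 2.5, 8.0, 5.0]
f = [18, 20, 25, 23, 32, 28]

s_dd, s_fd, a, b = regression_line(d, f)
print(f"Sdd = {s_dd:.3f}, Sfd = {s_fd:.3f}")
print(f"f = {a:.1f} + {b:.2f}d")
```

Keeping a and b unrounded until the final print mirrors the advice on premature approximation in the examiner reports.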
(e) A flight costs £2.03 (or about £2) for every extra 100km
or about 2p per km. B1ft 1
Note
Context of cost and distance required. Follow through their value of b
(f) 15.0 + 2.03d < 5d so d > 15.0/(5 − 2.03) = 5.00 ~ 5.05 M1
So t > 500~505 A1 2
Note
M1 for an attempt to find the intersection of the 2 lines. Value of t in
range 500 to 505 seen award M1.
Value of d in range 5 to 5.05 award M1.
Accept t greater than 500 to 505 inclusive to include graphical
solution for M 1A1
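The unit handling in part (f) is where most marks were lost, so a short sketch may help. It assumes the rounded line f = 15.0 + 2.03d from part (d) and converts the rival's 5p per km into £5 per 100 km so that both fares use the same units; the variable names are illustrative.

```python
# Fitted line from part (d): fare in pounds, distance d in units of 100 km
a, b = 15.0, 2.03
rival_rate = 5.0   # 5p per km is 5 pounds per 100 km

# First agent is cheaper when a + b*d < rival_rate*d.
# Since rival_rate > b, this rearranges to d > a / (rival_rate - b).
d_break_even = a / (rival_rate - b)
t_break_even = 100 * d_break_even   # convert units of 100 km to km

print(f"First agent is cheaper for t > {t_break_even:.0f} km")
```

The final conversion from d to t is the step the examiner report says many candidates omitted.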
[14]
2. (a) Spp = 106397 – 833²/7 = 7270 M1 A1
Stp = 42948 – (341 × 833)/7 = 2369,
Stt = 18181 – 341²/7 = 1569.42857.... or 10986/7 A1 A1 4
Note
M1 for at least one correct expression
1st A1 for Spp = 7270, 2nd A1 for Stp = 2369 or 2370,
(b) r = 2369/√(7270 × 1569.42857...) M1 A1ft
= 0.7013375 awrt (0.701) A1 3
Note
M1 for attempt at correct formula and at least one
correct value (or correct ft). M0 for 42948/√(106397 × 18181)
A1ft All values correct or correct ft. Allow for
an answer of 0.7 or 0.70. Answer only: awrt
0.701 is 3/3, answer of 0.7 or 0.70 is 2/3
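As a check on parts (a) and (b), the sketch below computes the product moment correlation coefficient from the summary statistics given in question 2, and also evaluates the raw-sum ratio that the note marks as M0. It is an illustration only; the variable names are assumptions.

```python
from math import sqrt

# Summary statistics given in question 2 (n = 7 patients)
n = 7
sum_t, sum_p = 341, 833
sum_tt, sum_pp, sum_tp = 18181, 106397, 42948

s_tt = sum_tt - sum_t ** 2 / n
s_pp = sum_pp - sum_p ** 2 / n
s_tp = sum_tp - sum_t * sum_p / n

# Correct formula: r = Stp / sqrt(Stt * Spp)
r = s_tp / sqrt(s_tt * s_pp)

# The M0 case: raw sums used in place of Stp, Stt and Spp
r_wrong = sum_tp / sqrt(sum_pp * sum_tt)

print(f"r = {r:.3f}")
print(f"raw-sum ratio = {r_wrong:.3f}")
```

The raw-sum ratio is much closer to 1 than the true coefficient, which is one way to spot that the corrected sums have not been used.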
(e) b = 2369/1569.42857... = 1.509466... M1 A1
a = 833/7 – b × 341/7 = 45.467413... M1
p = 45.5 + 1.51t A1 4
Note
1st M1 for use of the correct formula for b,
ft their values from (a)
1st A1 allow 1.5 or better
2nd M1 for use of ȳ – b x̄ with their values
Note
1st B1ft ft their intercept (within one square).
You may have to extend their line.
2nd B1 for correct gradient i.e. parallel to given
line (Allow 1 square out when t = 80)
3. (a) b = 59.99/33.381 M1
= 1.79713….. 1.8 or awrt 1.80 A1
a = 32.7 – 1.79713… × 51.83 M1
= – 60.44525… awrt –60 A1
w = – 60.445251… + 1.79713…l (l and w required and awrt 2sf) A1ft 5
Note
Special case
b = 59.99/120.1 = 0.4995 M0A0
a = 32.7 – 0.4995 × 51.83 M1A1
w = 6.8 + 0.50l (at least 2 sf required for A1)
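The special case arises from dividing by the wrong sum of squares. The sketch below contrasts the two calculations using the values quoted in the mark scheme. Labelling 33.381 as Sll and 120.1 as Sww is an inference from the table in question 3, not something the mark scheme states.

```python
# Values quoted in the mark scheme for question 3
s_lw = 59.99      # S_lw
s_ll = 33.381     # S_ll: explanatory variable (length), assumed label
s_ww = 120.1      # S_ww: response variable (weight), assumed label
l_bar, w_bar = 51.83, 32.7

# Regression of w on l: the gradient divides by S_ll
b = s_lw / s_ll
a = w_bar - b * l_bar

# Special case: dividing by S_ww instead gives a different line
b_special = s_lw / s_ww
a_special = w_bar - b_special * l_bar

print(f"w on l:       w = {a:.1f} + {b:.2f}l")
print(f"special case: w = {a_special:.1f} + {b_special:.2f}l")
```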
4. (a) Sxx = 57.22 – (21.4)²/10 = 11.424 M1 A1
Sxy = 313.7 – (21.4 × 96)/10 = 108.26 A1 3
Note
M1 for a correct expression
1st A1 for AWRT 11.4 for Sxx
2nd A1 for AWRT 108 for Sxy
Correct answers only: One value correct scores
M1 and appropriate A1, both correct M1A1A1
(b) b = Sxy/Sxx = 9.4765... M1 A1
a = ȳ – b x̄ = 9.6 – 2.14b = (–10.679...) M1
y = – 10.7 + 9.48x A1 4
Note
1st M1 for using their values in correct formula
1st A1 for AWRT 9.5
2nd M1 for correct method for a (minus sign required)
2nd A1 for equation with a and b AWRT 3 sf (e.g. y = –10.68 + 9.48x is fine)
Must have a full equation with a and b correct to awrt 3 sf
(c) Every (extra) hour spent using the programme produces about B1ft 1
9.5 marks improvement
Note
B1ft for comment conveying the idea of b marks per hour. Must
mention value of b but can ft their value of b. No need to mention
“extra” but must mention “marks” and “hour(s)” e.g. “…9.5
times per hour …” scores B0
(e) Model may not be valid since [8h is] outside the range [0.5 – 4]. B1 1
Note
B1 for a statement that says or implies that it may not be valid
because outside the range.
They do not have to mention the values concerned here namely
8 h or 0.5 – 4
[11]
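Part (e) turns on whether 8 hours lies inside the range of the data. The sketch below refits the line from the table in question 4 and reports both the fitted value at x = 8 and whether that value is an extrapolation; the `predict` helper is an illustrative assumption.

```python
# Question 4: x hours on the course, y marks improvement
x = [1.0, 3.5, 4.0, 1.5, 1.3, 0.5, 1.8, 2.5, 2.3, 3.0]
y = [5, 30, 27, 10, -3, -5, 7, 15, -10, 20]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum(xi * xi for xi in x) - n * x_bar ** 2
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
b = s_xy / s_xx
a = y_bar - b * x_bar

def predict(hours):
    """Return the fitted value and whether `hours` lies inside the data range."""
    inside = min(x) <= hours <= max(x)
    return a + b * hours, inside

estimate, inside = predict(8)
print(f"y = {a:.1f} + {b:.2f}x")
print(f"x = 8: estimate {estimate:.1f} marks, inside data range: {inside}")
```

The fitted value does exceed 60 marks, but because 8 hours is outside the observed range of 0.5 to 4 hours the estimate cannot be relied on, which is the point the B1 mark rewards.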
5. (a) Stt = 10922.81 − 401.3²/15 = 186.6973 awrt 187 M1A1
Svv = 42.3356 − 25.08²/15 = 0.40184 awrt 0.402 A1
Stv = 677.971 – (401.3 × 25.08)/15 = 6.9974 awrt 7.00 A1 4
M1 any one attempt at a correct use of a formula.
Award full marks for correct answers with no working.
Epen order of awarding marks as above.
(b) r = 6.9974/√(186.6973 × 0.40184) M1A1ft
= 0.807869 awrt 0.808 A1 3
M1 for correct formula and attempt to use
A1ft for their values from part (a)
NB Special Case 677.971/√(10922.81 × 42.3356) for M1A0
A1 awrt 0.808
Award 3 marks for awrt 0.808 with no working
(e) b = 6.9974/186.6973 = 0.03748 awrt 0.0375 M1A1
a = 25.08/15 − b × 401.3/15 = 0.6692874 awrt 0.669 M1A1 4
M1 their values the right way up
A1 for awrt 0.0375
M1 attempt to use correct formula with their value of b
A1 awrt 0.669
Sxy = 757.467 − (337.1 × 16.28)/8 = 71.4685 M1 A1
either method, awrt 71.5
Sxx = 15965.01 − 337.1²/8 = 1760.45875 A1 5
awrt 1760
(b) b = 71.4685/1760.45875 = 0.04059652 M1 A1
÷ correct way up, awrt 0.0406
a = 16.28/8 − b × 337.1/8 = 0.324364 M1 A1
using correct formula, awrt 0.324
y = 0.324 + 0.0406x A1ft 5
3 sf or better but award for copying from above
[Scatter diagram: Evaporation loss (y ml) against Time (x weeks); plotted points (3, 36), (5, 50), (6, 53), (8, 61), (10, 69), (12, 79), (13, 82), (15, 90), (16, 88), (18, 96)]
(d) For every extra week in storage, another 3.90 ml of chemical evaporates B1 1
8. (a) Sxy = 8880 – (130 × 48)/8 = (8100) B1
may be implied
Sxx = 20487.5
b = Sxy/Sxx = 8100/20487.5 = 0.395363… M1 A1
Allow use of their Sxy for M
awrt 0.395
a = 48/8 − (0.395363...) × 130/8 = –0.424649… M1 A1
allow use of their b for M
awrt –0.425
y = – 0.425 + 0.395x B1ft 6
3s.f.
Special case answer only B0 M0 B1 M0 B1 B1
(fully correct 3sf)
(≡ to B0 M0 A1 M0 A1 B1 on the epen)
9. (a) [Scatter diagram: Weight (y) against Height (x); plotted points (147, 44), (148, 39), (152, 52), (155, 49), (156, 56), (162, 65), (164, 59), (165, 72), (172, 77), (175, 70), (182, 80), (184, 77)]
sensible scales B1
labels B1
shape B1 3
(c) Sxy = 122783 – (1962 × 740)/12 = 1793 M1A1 2
use of formula, cao
(1793 only M1A1)
(d) b = Sxy/Sxx = 1793/1745 = 1.027507… M1A1 2
division, 1.028
(SR 1.028 B1 only)
(e) ȳ = 740/12 = 61⅔ B1
61⅔ or 61.6 or 61.7
s = √(47746/12 − (740/12)²) = 13.26859 M1A1 3
Use of formula including root, 13.3 or 13.9
(SR 13.3 or 13.9 B1 only)
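The two accepted answers, 13.3 and 13.9, correspond to the divisors n and n − 1. The sketch below reproduces both from the summary values used in the mark scheme, and also shows the effect of rounding the mean to 61.7 first, which the examiner report identifies as a common source of lost accuracy marks. Variable names are illustrative.

```python
from math import sqrt

# Summary values used in the mark scheme for question 9(e)
n = 12
sum_y = 740
sum_yy = 47746

y_bar = sum_y / n

# Standard deviation with divisor n
s_n = sqrt(sum_yy / n - y_bar ** 2)

# Standard deviation with divisor n - 1
s_n_minus_1 = sqrt((sum_yy - n * y_bar ** 2) / (n - 1))

# Premature approximation: mean rounded to 61.7 before squaring
s_premature = sqrt(sum_yy / n - 61.7 ** 2)

print(f"mean = {y_bar:.1f}")
print(f"s (divisor n)     = {s_n:.1f}")
print(f"s (divisor n - 1) = {s_n_minus_1:.1f}")
print(f"s (mean rounded)  = {s_premature:.1f}")
```

Because the mean is squared and then subtracted from a number of similar size, a small rounding error in the mean produces a visible error in s.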
10. (a) b = Sxy/Sxx = 3477.6/4402 = 0.7900… B1
awrt 0.79
a = ȳ − b x̄ = 28.6 – (0.7900…) × 36 = 0.159836… B1
awrt 0.16
y = 0.16 + 0.79x B1ft 3
or equivalent
11. (a) [Scatter diagram: c against h]
Labels (not x, y) B1
Sensible scales allow axis interchange B1
Points B2 4
(−1 ee)
(b) Shc = 884484 − (1562 × 5088)/9 = 1433⅓ M1
correct use of S
1433⅓; 1433.3 A1
Shh = 1000 2/9; Scc = 2550 A1; A1 4
1000 2/9, 1000.2; 2550
(NB: accept ÷ 9; i.e. 159 7/27; 111 11/81; 283⅓)
(c) r = 1433⅓/√(1000 2/9 × 2550) M1
substitution in correct formula
= 0.897488…. A1 ft A1 3
AWRT 0.897 (accept 0.8975)
(e) b = 1433.3/1000.2 = 1.433014….. M1
a = 5088/9 − (1433.3/1000.2) × 1562/9 = 316.6256… M1
allow use of their b
∴ c = 317 + 1.43h (3sf) A1 3
(b) b = 357/1750 = 0.204 M1
a = 71.6/6 − 0.204 × 150/6 = 6.83 recurring M1
∴ t = 6.83 + 0.204m A1 3
No working seen SR: t = 6.83 + 0.204m B1 only
Accept 6.83 recurring, 6.83, 6 5/6
(c) 7.35 ⇒ m = 35
∴t = 6.83 + 0.204 × 35 = 13.973 M1 A1 2
14.0 AWRT
(ii) No; No evidence model will apply one month later B1; B1 4
[13]
13. (a) [Scatter diagram: Salary (£00's) against Performance score]
(b) Sxy = 69798 – (256 × 2465)/10 = 6694
256, 2465 in (b) B1
Sxy or Sxx M1
Sxx = 7266 – 256²/10 = 712.4
6694 B1
712.4 B1
SR: No working ⇒ B0 M0 B1 B1 4
(c) (i) b = 6694/712.4 = 9.3964… M1 A1
(their Sxy and Sxx) AWRT 9.40
a = 2465/10 − (6694/712.4) × 256/10 = 5.95199… M1
Using their values
∴ y = 5.95 + 9.40x A1 ft
3 s.f.
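The examiner report notes that candidates struggled with the units in the later parts of this question. The sketch below is an illustration of the unit conversion for the salary asked for in part (e), using the unrounded regression coefficients; it is not taken from the mark scheme, and the variable names are assumptions.

```python
# Values from the mark scheme for question 13.
# y is the annual salary in units of 100 pounds.
s_xy, s_xx = 6694, 712.4
x_bar, y_bar = 256 / 10, 2465 / 10

b = s_xy / s_xx
a = y_bar - b * x_bar

score = 35                       # lies inside the observed scores, 15 to 40
y_hat = a + b * score            # fitted value, still in units of 100 pounds
salary_pounds = 100 * y_hat      # convert to pounds

print(f"y = {a:.2f} + {b:.2f}x")
print(f"Proposed salary for a score of {score}: about {salary_pounds:.0f} pounds")
```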
(b) [Scatter diagram: p against m]
scales and labels B1
points B2
(6,7 points) B1 3
Line M1 A1
15. (a)
x 20 26 32 34 37 44 48 50 53 58 B1
y 24 38 42 44 43 52 59 66 70 79
(b) Sxy = 22611 – (402 × 517)/10 = 1827.6 M1 A1
Sxx = 17538 – 402²/10 = 1377.6 A1
b = Sxy/Sxx = 1827.6/1377.6 = 1.326655… M1 A1
a = 517/10 – (1.326655…) × 402/10 = –1.63153… B1
∴ y = –1.63 + 1.33x B1 ft 7
(c) (c – 4000)/10 = –1.63 + 1.33(p – 100) M1 A1 ft
c = 2653.7 + 13.3p A1 3
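The back substitution in part (c) can be written as a small function. The sketch below assumes the coding x = p − 100 and y = (c − 4000)/10 from the question and the rounded coefficients used in the mark scheme; the `decode` helper and its parameter names are illustrative assumptions. It also applies the gradient to the £2000 increase in part (e), remembering that p is measured in £000.

```python
def decode(a_coded, b_coded, p_shift=100, c_shift=4000, c_scale=10):
    """Convert y = a + b*x, where x = p - p_shift and y = (c - c_shift)/c_scale,
    into c = intercept + slope*p."""
    slope = c_scale * b_coded
    intercept = c_shift + c_scale * (a_coded - b_coded * p_shift)
    return intercept, slope

intercept, slope = decode(-1.63, 1.33)
print(f"c = {intercept:.1f} + {slope:.1f}p")

# p is in thousands of pounds, so an extra 2000 pounds is a change of 2 in p
extra_cars = slope * 2
print(f"Predicted extra sales: about {extra_cars:.1f} cars")
```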
16. (a) [Scatter diagram: Temp (°C), y, against Speed (mph), x]
Scales & labels B1
Points B2, 1, 0 3
1. The vast majority of candidates produced accurate scatter diagrams and on the rare occasion that
there was a point missing it was predominantly point D. Explaining exactly why a linear
regression model was appropriate proved to be difficult for candidates overall. Most candidates
seemed to have the general idea but did not express this in the required terms and consequently
very few earned this mark. Comments tended to be much more general about why linear
regression is carried out and most talked about correlation being high without explaining that
the points lie close to a line.
On the whole the correct formulae were used in calculations of Sdd and Sfd, with most candidates
earning the method mark at the very least. The same was true in the calculations of b and a
overall, although a common mistake was to calculate Sff and go onto use that in the calculation
of b. Premature approximation cost many candidates accuracy marks. Interpretations of the
value of b were considerably varied, with relatively few candidates gaining this mark and some
opted to omit this part altogether. Most candidates failed to relate their value to the context of
the question and often tended to discuss b merely in terms of being the gradient. As a
consequence, despite having the right kind of idea and correctly understanding the concept of
the gradient, frequently candidates failed to gain this mark due to missing out the relevant units,
mixing up the units or not quoting the actual value of b.
Very few candidates were able to formulate the correct equation with the correct units in part
(f), and the majority found this particularly challenging, either omitting this part or resorting to
evaluating the lines at the data points rather than equating and solving the equations. Often no
clear strategy was apparent and a common mistake was to equate their equation to 5. There was
clearly confusion over t and d and even out of those who were able to solve the required
equation or inequality, not many found the value of t or range of t in km, as most tended to give
their answer in terms of d. Occasionally the intersection point was evaluated using their graph
after the lines had been plotted.
2. This was a high scoring question for most candidates. The calculations in parts (a) and (b) were
answered very well with very few failing to use the formulae correctly. Part (c) received a good
number of correct responses but many still failed to interpret their value and simply described
the correlation as strongly positive. The scatter diagram was usually plotted correctly and most
knew how to calculate the equation of the regression line although some used Spp instead of Stt
and some gave their final equation in terms of y and x instead of p and t. Plotting the line in part
(f) proved quite challenging for many candidates and a number with the correct equation did not
have the gradient correct. Part (g) was usually well done but some chose to use their graph
rather than their equation of the line and lost the final accuracy mark.
3. There were some good responses to this question, but some candidates calculated the slope as
59.99/120.1, although 3/5 marks were obtained if they went on to produce the equation as w =
6.8 + 0.50l provided a minimum of 2 significant figures were used. Candidates should be able to
identify the independent and dependent variables from a contextual question. The accuracy
mark for the calculation of the intercept was lost if they used the rounded value of 1.8 for the
slope in the calculation for the intercept. Many candidates did not believe 60 mm to be quite far
enough away from the data range to be called extrapolation showing that they did not go back
and read the question carefully enough and consider the range of values given.
4. This proved to be a straightforward starter for most candidates who were able to tackle part (a)
confidently, usually scoring full marks. Part (b) was answered well too; the correct formulae
were selected and answers were usually given to 3 sf or better. Some candidates lost the final
mark here for failing to give the full equation. Part (c) though was not answered well. There
were plenty of comments about the gradient being positive or there being positive correlation or
even skewness. Few realised that the instruction to “interpret” wanted an answer in context and
comments conveying the idea that every extra hour spent on the programme yields an extra 9.5
marks were rare. Part (d) was straightforward again but some did not use their regression
equation to find the estimate but rather tried to interpolate between the values of 3 and 3.5 given
in the table. Part (e) had a mixed response. Many good candidates rejected Lee’s comment on
the basis that 8 hours was outside the range of the data and they secured the mark. Other, less
successful, candidates simply calculated the value and then agreed with Lee or they rejected his
claim on some other basis such as the difficulty of revising for 8 hours or 60 marks might take
him above the total score on the paper.
5. This was done well by all but the weakest students with most using sufficient accuracy to score
highly. Many candidates demonstrated an understanding of the use of the formulae to achieve
full marks in part (a) and part (b). By far the main reason for loss of marks was premature
approximation. Part (c) and part (d) were done well by good candidates. Only the more able
candidates had a correct reason why t was the explanatory variable. Many called v the
explanatory variable but gave a correct reason for t. The written parts were not universally done
correctly, although the ability of students to deal with this topic has improved considerably in
recent examinations. Rounding once again caused issues in part (e), but usually did not have an
effect on part (f).
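The effect of premature approximation on part (e) can be seen directly. The sketch below compares the gradient computed from unrounded values of Stv and Stt with the gradient obtained after first rounding both to 3 significant figures (7.00 and 187); it is an illustration only.

```python
# Summary statistics from question 5 (n = 15)
n = 15
sum_t, sum_v = 401.3, 25.08
sum_tt, sum_tv = 10922.81, 677.971

s_tt = sum_tt - sum_t ** 2 / n
s_tv = sum_tv - sum_t * sum_v / n

b_full = s_tv / s_tt          # unrounded intermediate values
b_premature = 7.00 / 187      # Stv and Stt rounded to 3 s.f. first

print(f"full precision:     b = {b_full:.3g}")
print(f"premature rounding: b = {b_premature:.3g}")
```

The second value no longer rounds to the awrt 0.0375 required for the accuracy mark.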
6. In part (a) calculating Σl instead of the required Σy was the most common reason for losing
marks. In part (b) premature approximation was frequent and caused a loss of marks in other
parts of the question. In part (c) substituting t=40 was usually attempted but some then
neglected to add on the 2460. Candidates are now very well primed to say that a certain value is
out of range and hence the result is not reliable.
7. Graphs were well done and candidates are finally labelling axes, but poor choice of scale for the
x-axis meant some struggled to plot the graph accurately. For a standard question part (b) was
disappointing with many answers referring to correlation but not to a straight line or line of best
fit. Part (c) was generally well answered with the inevitable loss of the last mark through lack of
accuracy by using 3.9 or not reading the question for the 2 decimal places required for the
answers. A significant minority also thought that b represented the product moment correlation
coefficient. Responses to part (d) usually missed the context of the question and in part (f) the
proximity to the range of values of x was often omitted.
8. Candidates were well prepared for this question. The major problems arose as a result of
rounding. The most surprising was rounding to 1 significant figure! This came up a great deal
too frequently. It should be established now that there is a need to keep values for a and b
un-rounded when ‘decoding’ the line but to express answers to 3 significant figures in the final
stages.
9. Most candidates can plot and interpret scatter diagrams and use the formulae given in the
formula book. A significant number of candidates still cannot correctly calculate the standard
deviation to the required accuracy. A significant minority worked out the standard deviation of
the x-values by mistake and of those who worked out the correct standard deviation, many used
a premature approximation of the mean of 61.7 losing the accuracy mark.
10. Most candidates were able to score well on this question. The values of both b and a were
usually found accurately with most candidates giving the equation of the regression line of y on
x to the required degree of accuracy. The value of y when x = 45 rarely caused any problems.
11. This question was familiar to most candidates and many of them answered it very well. This
being said, too many used scales that were not sensible for the scatter diagram and far too many
ignored the instruction to ‘find the exact value’. The interpretation of the correlation coefficient
was rarely given in terms of the context of the question and many candidates did not give the
values of a and b to 3 significant figures in spite of previous advice.
12. Apart from arithmetic errors the first three parts of this question were well answered and many
candidates gained most of the marks. It was good to see that many more of the regression
equations were calculated with coefficients given to 3 significant figures. In the final part of the
question, whilst there were many good solutions, some candidates did not state whether or not
they would use the equation and others did not appreciate the context of the question.
13. Although the data in this question did not lend itself to easily chosen scales, most candidates did
manage to produce a reasonable scatter diagram and eventually they were able to draw their
regression line on it. Most candidates answered parts (b) and (c) well although some of them did
not give their values of a and b to 3 significant figures as stated in the question. Whilst many
candidates knew what was required in parts (d) and (e) they were unable to handle the units.
14. Many candidates appeared not to have sufficient time to complete this question. Not all
candidates recognised the explanatory variable, leading some of them to find the wrong
regression line. Apart from the use of silly scales the scatter diagram was often correctly drawn
with many candidates going on to find correct values for the regression coefficients. Accuracy
was much better handled in this question than in similar questions on previous papers. Too
many candidates gave their final answer in terms of x and y rather than m and p.
15. Overall candidates responded well to this question. They knew how to work out the values of a
and b in part (b) but their accuracy often let them down. They did not work to a sufficient
degree of accuracy and a value of –1.77 was often seen instead of –1.63. Scatter diagrams were
often correctly drawn but the scales used by many candidates were often not sensible. The back
substitution in part (c) and the prediction in part (e) were beyond many of the candidates.