S1 Correlation and Regression - Regression

Uploaded by

Nisali Mendis

S1 Correlation and regression – Regression PhysicsAndMathsTutor.com

1. A travel agent sells flights to different destinations from Beerow airport. The distance d,
measured in 100 km, of the destination from the airport and the fare £f are recorded for a
random sample of 6 destinations.

Destination A B C D E F
d 2.2 4.0 6.0 2.5 8.0 5.0
f 18 20 25 23 32 28

[You may use ∑d² = 152.09, ∑f² = 3686, ∑fd = 723.1]

(a) Using the axes below, complete a scatter diagram to illustrate this information.

(2)

(b) Explain why a linear regression model may be appropriate to describe the relationship
between f and d.
(1)

(c) Calculate Sdd and Sfd


(4)

Edexcel Internal Review 1



(d) Calculate the equation of the regression line of f on d giving your answer in the form
f = a + bd.
(4)

(e) Give an interpretation of the value of b.


(1)

Jane is planning her holiday and wishes to fly from Beerow airport to a destination t km away.
A rival travel agent charges 5p per km.

(f) Find the range of values of t for which the first travel agent is cheaper than the rival.
(2)
(Total 14 marks)

2. The blood pressures, p mmHg, and the ages, t years, of 7 hospital patients are shown in the table
below.

Patient A B C D E F G
t 42 74 48 35 56 26 60
p 98 130 120 88 182 80 135

[∑t = 341, ∑p = 833, ∑t² = 18181, ∑p² = 106397, ∑tp = 42948]

(a) Find Spp, Stp and Stt for these data.


(4)

(b) Calculate the product moment correlation coefficient for these data.
(3)

(c) Interpret the correlation coefficient.


(1)


(d) On the graph paper below, draw the scatter diagram of blood pressure against age for
these 7 patients.

(2)


(e) Find the equation of the regression line of p on t.


(4)

(f) Plot your regression line on your scatter diagram.


(2)

(g) Use your regression line to estimate the blood pressure of a 40 year old patient.
(2)
(Total 18 marks)

3. The weight, w grams, and the length, l mm, of 10 randomly selected newborn turtles are given
in the table below.

l 49.0 52.0 53.0 54.5 54.1 53.4 50.0 51.6 49.5 51.2
w 29 32 34 39 38 35 30 31 29 30

(You may use Sll = 33.381, Swl = 59.99, Sww = 120.1)

(a) Find the equation of the regression line of w on l in the form w = a + bl.
(5)

(b) Use your regression line to estimate the weight of a newborn turtle of length 60 mm.
(2)

(c) Comment on the reliability of your estimate giving a reason for your answer.
(2)
(Total 9 marks)


4. A teacher is monitoring the progress of students using a computer based revision course. The
improvement in performance, y marks, is recorded for each student along with the time, x hours,
that the student spent using the revision course. The results for a random sample of 10 students
are recorded below.

x (hours) 1.0 3.5 4.0 1.5 1.3 0.5 1.8 2.5 2.3 3.0
y (marks) 5 30 27 10 –3 –5 7 15 –10 20

[You may use ∑x = 21.4, ∑y = 96, ∑x² = 57.22, ∑xy = 313.7]

(a) Calculate Sxx and Sxy.


(3)

(b) Find the equation of the least squares regression line of y on x in the form y = a + bx.
(4)

(c) Give an interpretation of the gradient of your regression line.


(1)

Rosemary spends 3.3 hours using the revision course.

(d) Predict her improvement in marks.


(2)

Lee spends 8 hours using the revision course claiming that this should give him an improvement
in performance of over 60 marks.

(e) Comment on Lee’s claim.


(1)
(Total 11 marks)


5. Crickets make a noise. The pitch, v kHz, of the noise made by a cricket was recorded at 15
different temperatures, t °C. These data are summarised below.

∑t² = 10 922.81, ∑v² = 42.3356, ∑tv = 677.971, ∑t = 401.3, ∑v = 25.08

(a) Find Stt, Svv and Stv for these data.
(4)

(b) Find the product moment correlation coefficient between t and v.


(3)

(c) State, with a reason, which variable is the explanatory variable.


(2)

(d) Give a reason to support fitting a regression model of the form v = a + bt to these data.
(1)

(e) Find the value of a and the value of b. Give your answers to 3 significant figures.
(4)

(f) Using this model, predict the pitch of the noise at 19 °C.
(1)
(Total 15 marks)


6. A metallurgist measured the length, l mm, of a copper rod at various temperatures, t°C, and
recorded the following results.

t l
20.4 2461.12
27.3 2461.41
32.1 2461.73
39.0 2461.88
42.9 2462.03
49.7 2462.37
58.3 2462.69
67.4 2463.05

The results were then coded such that x = t and y = l – 2460.00.

(a) Calculate Sxy and Sxx.

(You may use Σx² = 15965.01 and Σxy = 757.467)


(5)

(b) Find the equation of the regression line of y on x in the form y = a + bx.
(5)

(c) Estimate the length of the rod at 40°C.


(3)

(d) Find the equation of the regression line of l on t.


(2)

(e) Estimate the length of the rod at 90°C.


(1)


(f) Comment on the reliability of your estimate in part (e).


(2)
(Total 18 marks)

7. A manufacturer stores drums of chemicals. During storage, evaporation takes place. A random
sample of 10 drums was taken and the time in storage, x weeks, and the evaporation loss, y ml,
are shown in the table below.

x 3 5 6 8 10 12 13 15 16 18
y 36 50 53 61 69 79 82 90 88 96

(a) On the grid below, draw a scatter diagram to represent these data.
(3)

(b) Give a reason to support fitting a regression model of the form y = a + bx to these data.
(1)

(c) Find, to 2 decimal places, the value of a and the value of b.

(You may use Σx² = 1352, Σy² = 53 112 and Σxy = 8354.)
(7)

(d) Give an interpretation of the value of b.


(1)

(e) Using your model, predict the amount of evaporation that would take place after

(i) 19 weeks,

(ii) 35 weeks.
(2)


(f) Comment, with a reason, on the reliability of each of your predictions.


(4)

(Total 18 marks)


8. A long distance lorry driver recorded the distance travelled, m miles, and the amount of fuel
used, f litres, each day. Summarised below are data from the driver’s records for a random
sample of 8 days.

The data are coded such that x = m – 250 and y = f – 100.

∑x = 130, ∑y = 48, ∑xy = 8880, Sxx = 20 487.5

(a) Find the equation of the regression line of y on x in the form y = a + bx.
(6)

(b) Hence find the equation of the regression line of f on m.


(3)

(c) Predict the amount of fuel used on a journey of 235 miles.


(1)
(Total 10 marks)

9. The following table shows the height x, to the nearest cm, and the weight y, to the nearest kg, of
a random sample of 12 students.

x 148 164 156 172 147 184 162 155 182 165 175 152
y 39 59 56 77 44 77 65 49 80 72 70 52

(a) On graph paper, draw a scatter diagram to represent these data.

(One sheet of graph paper to be provided)


(3)

(b) Write down, with a reason, whether the correlation coefficient between x and y is positive
or negative.
(2)


The data in the table can be summarised as follows.

Σx = 1962, Σy = 740, Σy² = 47 746, Σxy = 122 783, Sxx = 1745.

(c) Find Sxy.


(2)

The equation of the regression line of y on x is y = –106.331 + bx.

(d) Find, to 3 decimal places, the value of b.


(2)

(e) Find, to 3 significant figures, the mean ȳ and the standard deviation s of the weights of
this sample of students.
(3)

(f) Find the values of ȳ ± 1.96s.


(2)

(g) Comment on whether or not you think that the weights of these students could be
modelled by a normal distribution.
(1)
(Total 15 marks)

10. An experiment carried out by a student yielded pairs of (x, y) observations such that

x̄ = 36, ȳ = 28.6, Sxx = 4402, Sxy = 3477.6

(a) Calculate the equation of the regression line of y on x in the form y = a + bx. Give your
values of a and b to 2 decimal places.
(3)

(b) Find the value of y when x = 45.


(1)
(Total 4 marks)


11. A researcher thinks there is a link between a person's height and level of confidence. She
measured the height h, to the nearest cm, of a random sample of 9 people. She also devised a
test to measure the level of confidence c of each person. The data are shown in the table below.

h 179 169 187 166 162 193 161 177 168
c 569 561 579 561 540 598 542 565 573

[You may use ∑h² = 272 094, ∑c² = 2 878 966, ∑hc = 884 484]

(a) Draw a scatter diagram to illustrate these data.


(4)

(b) Find exact values of Shc Shh and Scc.


(4)

(c) Calculate the value of the product moment correlation coefficient for these data.
(3)

(d) Give an interpretation of your correlation coefficient.


(1)

(e) Calculate the equation of the regression line of c on h in the form c = a + bh.
(3)

(f) Estimate the level of confidence of a person of height 180 cm.


(2)

(g) State the range of values of h for which estimates of c are reliable.
(1)
(Total 18 marks)


12. An office has the heating switched on at 7.00 a.m. each morning. On a particular day, the
temperature of the office, t °C, was recorded m minutes after 7.00 a.m. The results are shown in
the table below.

m 0 10 20 30 40 50
t 6.0 8.9 11.8 13.5 15.3 16.1

(a) Calculate the exact values of Smt and Smm.


(4)

(b) Calculate the equation of the regression line of t on m in the form t = a + bm.
(3)

(c) Use your equation to estimate the value of t at 7.35 a.m.


(2)

(d) State, giving a reason, whether or not you would use the regression equation in (b) to
estimate the temperature

(i) at 9.00 a.m. that day,

(ii) at 7.15 a.m. one month later.


(4)
(Total 13 marks)

13. A company wants to pay its employees according to their performance at work. The
performance score x and the annual salary, y in £100s, for a random sample of 10 of its
employees for last year were recorded. The results are shown in the table below.

x 15 40 27 39 27 15 20 30 19 24
y 216 384 234 399 226 132 175 316 187 196

[You may assume Σxy = 69 798, Σx² = 7 266]

(a) Draw a scatter diagram to represent these data.


(4)


(b) Calculate exact values of Sxy and Sxx.


(4)

(c) (i) Calculate the equation of the regression line of y on x, in the form y = a + bx.

Give the values of a and b to 3 significant figures.

(ii) Draw this line on your scatter diagram.


(5)

(d) Interpret the gradient of the regression line.


(1)

The company decides to use this regression model to determine future salaries.

(e) Find the proposed annual salary for an employee who has a performance score of 35.
(2)
(Total 16 marks)

14. Eight students took tests in mathematics and physics. The marks for each student are given in
the table below where m represents the mathematics mark and p the physics mark.

Student A B C D E F G H
Mark m 9 14 13 10 7 8 20 17
Mark p 11 23 21 15 19 10 31 26

A science teacher believes that students’ marks in physics depend upon their mathematical
ability. The teacher decides to investigate this relationship using the test marks.

(a) Write down which is the explanatory variable in this investigation.


(1)

(b) Draw a scatter diagram to illustrate these data.


(3)


(c) Showing your working, find the equation of the regression line of p on m.
(8)

(d) Draw the regression line on your scatter diagram.


(2)

A ninth student was absent for the physics test, but she sat the mathematics test and scored 15.

(e) Using this model, estimate the mark she would have scored in the physics test.
(2)
(Total 16 marks)

15. The chief executive of Rex cars wants to investigate the relationship between the number of
new car sales and the amount of money spent on advertising. She collects data from company
records on the number of new car sales, c, and the cost of advertising each year, p (£000). The
data are shown in the table below.

Year Number of new car sales, c Cost of advertising (£000), p
1990 4240 120
1991 4380 126
1992 4420 132
1993 4440 134
1994 4430 137
1995 4520 144
1996 4590 148
1997 4660 150
1998 4700 153
1999 4790 158


(a) Using the coding x = (p – 100) and y = (1/10)(c – 4000), draw a scatter diagram to
represent these data. Explain why x is the explanatory variable.
(5)

(b) Find the equation of the least squares regression line of y on x.

[Use Σx = 402, Σy = 517, Σx² = 17 538 and Σxy = 22 611.]


(7)

(c) Deduce the equation of the least squares regression line of c on p in the form c = a + bp.
(3)

(d) Interpret the value of a.


(2)

(e) Predict the number of extra new car sales for an increase of £2000 in advertising budget.
Comment on the validity of your answer.
(2)
(Total 19 marks)

16. To test the heating of tyre material, tyres are run on a test rig at chosen speeds under given
conditions of load, pressure and surrounding temperature. The following table gives values of x,
the test rig speed in miles per hour (mph), and the temperature, y °C, generated in the shoulder
of the tyre for a particular tyre material.

x (mph) 15 20 25 30 35 40 45 50
y (°C) 53 55 63 65 78 83 91 101

(a) Draw a scatter diagram to represent these data.


(3)

(b) Give a reason to support the fitting of a regression line of the form y = a + bx
through these points.
(1)

(c) Find the values of a and b.


(4)

(You may use Σx² = 9 500, Σy² = 45 483, Σxy = 20 615)

(d) Give an interpretation for each of a and b.


(2)


(e) Use your line to estimate the temperature at 50 mph and explain why this estimate
differs from the value given in the table.
(2)

A tyre specialist wants to estimate the temperature of this tyre material at 12 mph and 85 mph.

(f) Explain briefly whether or not you would recommend the specialist to use this regression
equation to obtain these estimates.
(4)
(Total 16 marks)


1. (a) B1 B1 2
Note
1st B1 for at least 4 points correct (allow ± one 2 mm square)
2nd B1 for all points correct (allow ± one 2 mm square)

(b) The points lie reasonably close to a straight line (o.e.) B1 1


Note
Ignore extra points and lines
Require reference to points and line for B1.

(c) ∑d = 27.7, ∑f = 146 (both, may be implied) B1
Sdd = 152.09 − (27.7)²/6 = 24.208… awrt 24.2 M1 A1
Sfd = 723.1 − (27.7 × 146)/6 = 49.06… awrt 49.1 A1 4
Note
M1 for a correct method seen for either – a correct expression
1st A1 for Sdd awrt 24.2
2nd A1 for Sfd awrt 49.1


(d) b = Sfd/Sdd = 2.026… awrt 2.03 M1 A1
a = 146/6 − b × 27.7/6 = 14.97… so f = 15.0 + 2.03d M1 A1 4
Note
1st M1 for a correct expression for b – can follow through their
answers from (c)
2nd M1 for a correct method to find a – follow through their b
and their means
2nd A1 for f = .... in terms of d and all values awrt given expressions.
Accept 15 as rounding from correct answer only.

(e) A flight costs £2.03 (or about £2) for every extra 100km
or about 2p per km. B1ft 1
Note
Context of cost and distance required. Follow through their value of b

(f) 15.0 + 2.03d < 5d so d > 15.0/(5 − 2.03) = 5.00 ~ 5.05 M1
So t > 500 ~ 505 A1 2
Note
M1 for an attempt to find the intersection of the 2 lines. Value of t in
range 500 to 505 seen award M1.
Value of d in range 5 to 5.05 award M1.
Accept t greater than 500 to 505 inclusive to include graphical
solution for M1 A1
[14]
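
The arithmetic in parts (c), (d) and (f) can be checked numerically. The following is a minimal Python sketch, not part of the mark scheme; the helper name s_xy and the variable names are assumptions chosen for illustration. It works from the raw table in question 1 and treats the rival's 5p per km as £5 per unit of d (100 km).

```python
# Raw data from question 1: distance d (units of 100 km) and fare f (pounds).
d = [2.2, 4.0, 6.0, 2.5, 8.0, 5.0]
f = [18, 20, 25, 23, 32, 28]


def s_xy(xs, ys):
    """Return S_xy = sum(x*y) - sum(x)*sum(y)/n (hypothetical helper name)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


S_dd = s_xy(d, d)                           # about 24.2
S_fd = s_xy(d, f)                           # about 49.1
b = S_fd / S_dd                             # gradient, about 2.03
a = sum(f) / len(f) - b * sum(d) / len(d)   # intercept, about 15.0

# Part (f): the rival charges 5p per km, i.e. 5 pounds per unit of d.
# The first agent is cheaper when a + b*d < 5*d, i.e. d > a / (5 - b).
d_break = a / (5 - b)
t_break_km = 100 * d_break                  # convert units of 100 km to km

print(f"f = {a:.1f} + {b:.2f}d; first agent cheaper beyond about {t_break_km:.0f} km")
```

Working with unrounded a and b places the break-even distance inside the 500 to 505 km band that the mark scheme accepts.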

2. (a) Spp = 106397 – 833²/7 = 7270 M1 A1
Stp = 42948 – (341 × 833)/7 = 2369,
Stt = 18181 – 341²/7 = 1569.42857… or 10986/7 A1 A1 4
Note
M1 for at least one correct expression
1st A1 for Spp = 7270, 2nd A1 for Stp = 2369 or 2370,
3rd A1 for Stt = awrt 1570


(b) r = 2369/√(7270 × 1569.42857…) M1 A1ft
= 0.7013375 awrt (0.701) A1 3
Note
M1 for attempt at correct formula and at least one correct value (or correct ft).
M0 for 42948/√(106397 × 18181)
A1ft All values correct or correct ft. Allow for an answer of 0.7 or 0.70.
Answer only: awrt 0.701 is 3/3, answer of 0.7 or 0.70 is 2/3

(c) (Pmcc shows positive correlation.)


Older patients have higher blood pressure B1 1
Note
B1 for comment in context that interprets the
fact that correlation is positive, as in scheme.
Must mention age and blood pressure in words,
not just “t” and “p”.

(d) Points plotted correctly on graph: –1 each error or omission
(within one square of correct position) B2 2
Note
Record 1 point incorrect as B1B0 on epen. [NB overlay
for (60, 135) is slightly wrong]

(e) b = 2369/1569.42857… = 1.509466… M1 A1
a = 833/7 – b × 341/7 = 45.467413… M1
p = 45.5 + 1.51t A1 4
Note
1st M1 for use of the correct formula for b, ft their values from (a)
1st A1 allow 1.5 or better
2nd M1 for use of ȳ – b x̄ with their values
2nd A1 for full equation with a = awrt 45.5 and b = awrt 1.51. Must be p in terms of t,
not x and y.


(f) Line drawn with correct intercept, and gradient B1ft B1 2


Diagram for (d) + (f)

Note
1st B1ft ft their intercept (within one square).
You may have to extend their line.
2nd B1 for correct gradient i.e. parallel to given
line (Allow 1 square out when t = 80)


(g) t = 40, p = 105.84... from equation or graph. awrt 106 M1 A1 2


Note
M1 for clear use of their equation with
t = 40 or correct value from their graph.
A1 for awrt 106. Correct answer only (2/2)
otherwise look for evidence on graph to
award M1
[18]
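
Parts (a), (b), (e) and (g) rely only on the summary sums quoted in the question, so they can be verified without the raw table. The sketch below is an illustrative check under that assumption; the variable names are invented for this example and are not taken from the mark scheme.

```python
import math

# Summary statistics quoted in question 2 (7 patients).
n = 7
sum_t, sum_p = 341, 833
sum_tt, sum_pp, sum_tp = 18181, 106397, 42948

S_tt = sum_tt - sum_t ** 2 / n
S_pp = sum_pp - sum_p ** 2 / n
S_tp = sum_tp - sum_t * sum_p / n

r = S_tp / math.sqrt(S_tt * S_pp)   # product moment correlation coefficient
b = S_tp / S_tt                     # gradient of the regression line of p on t
a = sum_p / n - b * sum_t / n       # intercept
p_at_40 = a + b * 40                # part (g): estimate for a 40 year old

print(f"r = {r:.3f}, p = {a:.1f} + {b:.2f}t, p(40) = {p_at_40:.0f}")
```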

3. (a) b = 59.99/33.381 M1
= 1.79713… 1.8 or awrt 1.80 A1
a = 32.7 – 1.79713… × 51.83 M1
= –60.44525… awrt –60 A1
w = –60.445251… + 1.79713…l (l and w required and awrt 2sf) A1ft 5
Note
Special case
b = 59.99/120.1 = 0.4995 M0A0
a = 32.7 – 0.4995 × 51.83 M1A1
w = 6.8 + 0.50l at least 2 sf required for A1

(b) w = –60.445251… + 1.79713… × 60 M1
= 47.3825… In range 47.3 – 47.6 inclusive A1 2
Note
Substitute into their answer for (a) for M1

(c) It is extrapolating so (may be) unreliable. B1 B1dep 2


Note
‘Outside the range on the table’ or equivalent
award first B1
[9]
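
The reliability comment in part (c) depends on whether the chosen length lies inside the observed range. One way to make that check explicit is sketched below; this is an assumption-laden illustration, and estimate_weight is a hypothetical helper name rather than anything defined in the question.

```python
# Data from question 3: lengths (mm) and weights (g) of 10 newborn turtles.
lengths = [49.0, 52.0, 53.0, 54.5, 54.1, 53.4, 50.0, 51.6, 49.5, 51.2]
weights = [29, 32, 34, 39, 38, 35, 30, 31, 29, 30]
S_ll, S_wl = 33.381, 59.99   # values given in the question

b = S_wl / S_ll
a = sum(weights) / len(weights) - b * sum(lengths) / len(lengths)


def estimate_weight(length_mm):
    """Return (estimate, inside_range); inside_range is False when extrapolating."""
    inside_range = min(lengths) <= length_mm <= max(lengths)
    return a + b * length_mm, inside_range


estimate, inside_range = estimate_weight(60)
print(f"w(60) = {estimate:.1f} g, interpolation: {inside_range}")
```

A length of 60 mm is above the largest observed length (54.5 mm), so the flag comes back False, matching the "extrapolating" comment in the scheme.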


4. (a) Sxx = 57.22 – (21.4)²/10 = 11.424 M1 A1
Sxy = 313.7 – (21.4 × 96)/10 = 108.26 A1 3
Note
M1 for a correct expression
1st A1 for AWRT 11.4 for Sxx
2nd A1 for AWRT 108 for Sxy
Correct answers only: One value correct scores
M1 and appropriate A1, both correct M1A1A1

(b) b = Sxy/Sxx = 9.4765… M1 A1
a = ȳ – b x̄ = 9.6 – 2.14b = (–10.679…) M1
y = –10.7 + 9.48x A1 4
Note
1st M1 for using their values in correct formula
1st A1 for AWRT 9.5
2nd M1 for correct method for a (minus sign required)
2nd A1 for equation with a and b AWRT 3 sf (e.g. y = –10.68 + 9.48x is fine)
Must have a full equation with a and b correct to awrt 3 sf

(c) Every (extra) hour spent using the programme produces about
9.5 marks improvement B1ft 1
Note
B1ft for comment conveying the idea of b marks per hour. Must
mention value of b but can ft their value of b. No need to mention
“extra” but must mention “marks” and “hour(s)” e.g. “…9.5
times per hour …” scores B0

(d) y = –10.7 + 9.48 × 3.3, = 20.6 awrt 21 M1, A1 2


Note
M1 for sub x = 3.3 into their regression equation from the
end of part (b)
A1 for awrt 21


(e) Model may not be valid since [8h is] outside the range [0.5 – 4]. B1 1
Note
B1 for a statement that says or implies that it may not be valid
because outside the range.
They do not have to mention the values concerned here namely
8 h or 0.5 – 4
[11]

5. (a) Stt = 10922.81 − 401.3²/15 = 186.6973 awrt 187 M1A1
Svv = 42.3356 − 25.08²/15 = 0.40184 awrt 0.402 A1
Stv = 677.971 – (401.3 × 25.08)/15 = 6.9974 awrt 7.00 A1 4
M1 any one attempt at a correct use of a formula.
Award full marks for correct answers with no working.
Epen order of awarding marks as above.

(b) r = 6.9974/√(186.6973 × 0.40184) M1A1ft
= 0.807869 awrt 0.808 A1 3
M1 for correct formula and attempt to use
A1ft for their values from part (a)
NB Special Case 677.971/√(10922.81 × 42.3356) for M1A0
A1 awrt 0.808
Award 3 marks for awrt 0.808 with no working

(c) t is the explanatory variable B1
as we can control temperature but not frequency of noise, or
equivalent comment B1 2
Marks are independent.
Second mark requires some interpretation in context and can
be statements such as ‘temperature effects / influences pitch or noise’
B1 ‘temperature is being changed’ BUT B0 for ‘temperature is changing’

(d) High value of r or r close to 1 or Strong correlation B1 1


(e) b = 6.9974/186.6973 = 0.03748 awrt 0.0375 M1A1
a = 25.08/15 − b × 401.3/15 = 0.6692874 awrt 0.669 M1A1 4
M1 their values the right way up
A1 for awrt 0.0375
M1 attempt to use correct formula with their value of b
A1 awrt 0.669

(f) t = 19, v = 0.6692874 + 0.03748 × 19 = 1.381406 awrt 1.4 B1 1
[15]
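
Part (e) asks for a and b to 3 significant figures, which is not the same as 3 decimal places. A small sketch of one way to round to significant figures follows; round_sig is a hypothetical helper written for this illustration, built only on the standard math module and the built-in round.

```python
from math import floor, log10


def round_sig(value, figures=3):
    """Round value to the given number of significant figures (illustrative helper)."""
    if value == 0:
        return 0.0
    # Position of the leading digit decides how many decimal places to keep.
    return round(value, figures - 1 - floor(log10(abs(value))))


# Summary statistics quoted in question 5 (15 readings).
n = 15
sum_t, sum_v = 401.3, 25.08
sum_tt, sum_tv = 10922.81, 677.971

S_tt = sum_tt - sum_t ** 2 / n
S_tv = sum_tv - sum_t * sum_v / n
b = S_tv / S_tt
a = sum_v / n - b * sum_t / n

print(round_sig(a), round_sig(b))
```

Rounding only at the final step, as here, avoids the premature approximation that costs accuracy marks.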

6. (a) ∑x = ∑t = 337.1, ∑y = 16.28 B1 B1
Can be implied
Sxy = 757.467 − (337.1 × 16.28)/8 = 71.4685 M1 A1
either method, awrt 71.5
Sxx = 15965.01 − 337.1²/8 = 1760.45875 A1 5
awrt 1760

(b) b = 71.4685/1760.45875 = 0.04059652 M1 A1
÷ correct way up, awrt 0.0406
a = 16.28/8 − b × 337.1/8 = 0.324364 M1 A1
using correct formula, awrt 0.324
y = 0.324 + 0.0406x A1ft 5
3 sf or better but award for copying from above

(c) At t = 40, x = 40, y = 1.948, l = 2461.948 M1 A1 A1ft 3
sub x = 40, awrt 1.95, awrt 2461.95

(d) l – 2460 = 0.324 + 0.0406t M1
LHS required
l = 2460.324 + 0.0406t A1 2
awrt 2460.32 f.t. their 0.0406, l and t


(e) At t = 90, l = 2463.978 B1 1
awrt 2464

(f) 90 °C outside range of data B1
unlikely to be reliable B1 2
[18]

7. (a) Sensible graph scales, labels, shape B1, B1, B1 3
[Scatter diagram: Evaporation loss (y ml) against Time (x weeks)]

(b) Points lie close to a straight line B1 1

(c) Sxy = 8354 – (106 × 704)/10 = 891.6 B1
Sxx = 1352 – 106²/10 = 228.4 B1
b = 891.6/228.4 = 3.903677… awrt 3.9 M1 A1
a = 704/10 – b × 106/10 = 29.021015… awrt 29 M1 A1
29.02, 3.90 A1ft 7


(d) For every extra week in storage, another 3.90 ml of chemical evaporates B1 1

(e) (i) 103.12
(ii) 165.52 B1 B1 2

(f) (i) Close to range of x, so reasonably reliable B1, B1
(ii) Well outside range of x, B1
could be unreliable since no evidence that model will
continue to hold B1 4
[18]

8. (a) Sxy = 8880 – (130 × 48)/8 = (8100) B1
may be implied
Sxx = 20487.5
b = Sxy/Sxx = 8100/20487.5 = 0.395363… M1 A1
Allow use of their Sxy for M
awrt 0.395
a = 48/8 − (0.395363…) × 130/8 = –0.424649… M1 A1
allow use of their b for M
awrt –0.425
y = –0.425 + 0.395x B1ft 6
3 s.f.
Special case answer only B0 M0 B1 M0 B1 B1
(fully correct 3sf)
(≡ to B0 M0 A1 M0 A1 B1 on the epen)

(b) f – 100 = –0.424649… + 0.395…(m – 250) M1 A1ft
subst f – 100 & m – 250
f = 0.735 + 0.395m A1 3
3 s.f.

(c) m = 235 ⇒ f = 93.64489… B1 1
awrt 93.6/93.7
[10]
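
The step from the coded line in (a) to the line of f on m in (b) is a substitution of x = m – 250 and y = f – 100. A minimal sketch of that decoding is given below; decode_line is a hypothetical helper name and the code assumes shift-only coding as in this question.

```python
def decode_line(a, b, x_shift, y_shift):
    """Given y = a + b*x with x = X - x_shift and y = Y - y_shift,
    return (A, B) such that Y = A + B*X. Illustrative helper only."""
    return y_shift + a - b * x_shift, b


# Summary statistics quoted in question 8 (8 days, coded data).
n = 8
sum_x, sum_y, sum_xy = 130, 48, 8880
S_xx = 20487.5

S_xy = sum_xy - sum_x * sum_y / n      # 8100
b = S_xy / S_xx
a = sum_y / n - b * sum_x / n

A, B = decode_line(a, b, x_shift=250, y_shift=100)
fuel_235 = A + B * 235                 # part (c)

print(f"f = {A:.3f} + {B:.3f}m; fuel for 235 miles is about {fuel_235:.1f} litres")
```

Because the coding only shifts the variables, the gradient is unchanged and only the intercept moves.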


9. (a) [Scatter diagram: Weight (y) against Height (x)]
sensible scales B1
labels B1
shape B1 3

(b) Positive; as x increases, y increases B1;B1g 2


context OK

(c) Sxy = 122783 – (1962 × 740)/12 = 1793 M1A1 2
use of formula, cao
(1793 only M1A1)

(d) b = Sxy/Sxx = 1793/1745 = 1.027507… M1A1 2
division, 1.028
(SR 1.028 B1 only)


(e) ȳ = 740/12 = 61⅔ B1
61⅔ or 61.6 or 61.7
s = √(47746/12 − (740/12)²) = 13.26859 M1A1 3
Use of formula including root, 13.3 or 13.9
(SR 13.3 or 13.9 B1 only)

(f) 34-36, 87-89 B1B1 2
strict limits, 3sf or better

(g) All values between their 35.7 and their 87.7 so could be normal.
Reason required B1 1
[15]
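
Parts (e) to (g) can be reproduced directly: compute ȳ and s from the quoted sums, form ȳ ± 1.96s, and count how many observed weights fall inside. The sketch below is one illustrative way to do this; the variable names are assumptions, and s uses the divisor n as in the scheme's 13.3 value.

```python
import math

# Weights (kg) of the 12 students in question 9, with the quoted sums.
weights = [39, 59, 56, 77, 44, 77, 65, 49, 80, 72, 70, 52]
n = len(weights)
sum_y, sum_yy = 740, 47746

mean_y = sum_y / n
s = math.sqrt(sum_yy / n - mean_y ** 2)       # standard deviation, divisor n

lower, upper = mean_y - 1.96 * s, mean_y + 1.96 * s
inside = sum(lower <= y <= upper for y in weights)

print(f"mean = {mean_y:.1f}, s = {s:.1f}, interval = ({lower:.1f}, {upper:.1f}), "
      f"{inside} of {n} weights inside")
```

For a normal distribution about 95% of values lie within 1.96 standard deviations of the mean, so finding every observation inside the interval is consistent with the comment in part (g).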

10. (a) b = Sxy/Sxx = 3477.6/4402 = 0.7900… B1
awrt 0.79
a = ȳ − b x̄ = 28.6 – (0.7900…) × 36 = 0.159836… B1
awrt 0.16
y = 0.16 + 0.79x B1ft 3
or equivalent
OR just answer B1 ONLY

(b) y = 0.16 + 0.79 × 45 = 35.71 awrt 35.7 B1 1
[4]


11. (a) [Scatter diagram]
Labels (not x, y) B1
Sensible scales allow axis interchange B1
Points B2 4
(−1 ee)

(b) Shc = 884484 − (1562 × 5088)/9 = 1433⅓ M1
correct use of S
1433⅓; 1433.3 A1
Shh = 1000 2/9; Scc = 2550 A1; A1 4
1000 2/9, 1000.2; 2550
(NB: accept ÷ 9; i.e. 159 7/27; 111 11/81; 283⅓)


(c) r = 1433⅓/√(1000 2/9 × 2550) M1
substitution in correct formula
= 0.897488… A1 ft A1 3
AWRT 0.897 (accept 0.8975)

(d) Taller people tend to be more confident B1 1


context

(e) b = 1433.3/1000.2 = 1.433014… M1
a = 5088/9 − (1433.3/1000.2) × 1562/9 = 316.6256… M1
allow use of their b
∴ c = 317 + 1.43h (3sf) A1 3

(f) h = 180 ⇒ c = 574.4 or 574.5683… M1
subst. of 180
574 – 575 A1 2

(g) 161 ≤ h ≤ 193 B1 1


[18]
NB (a) No graph paper ⇒ 0/4

12. (a) Σm = 150; Σm² = 5500
Σt = 71.6; Σt² = 930; Σmt = 2147 B1
5500 & 2147 seen
Smt = 2147 – (150 × 71.6)/6 = 357 M1 A1
Accept 357/6 = 59.5
Smm = 5500 – 150²/6 = 1750 A1 4
Accept 1750/6 = 291.6…
No working shown SR: B1 B1 only


(b) b = 357/1750 = 0.204 M1
a = 71.6/6 − 0.204 × 150/6 = 6.83 (3 recurring) M1
∴ t = 6.83 + 0.204m A1 3
No working seen SR: t = 6.83 + 0.204m B1 only
Accept 6.83, 6.83 (3 recurring), 6 5/6

(c) 7.35 ⇒ m = 35
∴t = 6.83 + 0.204 × 35 = 13.973 M1 A1 2
14.0 AWRT

(d) (i) 9.00 ⇒ m = 120


No; outside range of data (after 7.50 am) B1; B1

(ii) No; No evidence model will apply one month later B1; B1 4
[13]
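
Question 12 asks for exact values, and the scheme accepts 6 5/6 for the intercept. One way to keep the working exact is rational arithmetic, sketched here with Python's standard fractions module; the helper name s_exact is an assumption made for this illustration.

```python
from fractions import Fraction

# Data from question 12: minutes after 7.00 a.m. and office temperature (°C).
m = [0, 10, 20, 30, 40, 50]
t = [Fraction(v) for v in ("6.0", "8.9", "11.8", "13.5", "15.3", "16.1")]


def s_exact(xs, ys):
    """Exact S_xy = sum(x*y) - sum(x)*sum(y)/n using rational arithmetic."""
    n = len(xs)
    return sum(Fraction(x) * y for x, y in zip(xs, ys)) - Fraction(sum(xs)) * sum(ys) / n


S_mt = s_exact(m, t)                                     # exactly 357
S_mm = s_exact(m, m)                                     # exactly 1750
b = S_mt / S_mm                                          # 51/250, i.e. 0.204
a = Fraction(sum(t)) / len(t) - b * Fraction(sum(m), len(m))   # 41/6, i.e. 6 5/6
estimate_0735 = a + b * 35                               # part (c), m = 35

print(S_mt, S_mm, b, a, float(estimate_0735))
```

Keeping a as the fraction 41/6 makes the recurring decimal 6.8333… explicit and removes any rounding question until the final answer.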


13. (a) [Scatter diagram: Salary (£00's) against Performance score]
Scales and labels B1
Accept x, y
points B3 4
(–1 e.e.)

(b) Sxy = 69798 – (256 × 2465)/10 = 6694
256, 2465 in (b) B1
Sxy or Sxx M1
Sxx = 7266 – 256²/10 = 712.4
6694 B1
712.4 B1
SR: No working ⇒ B0 M0 B1 B1 4


(c) (i) b = 6694/712.4 = 9.3964… M1 A1
(their Sxy and Sxx) AWRT 9.40
a = 2465/10 − (6694/712.4) × 256/10 = 5.95199… M1
Using their values
∴ y = 5.95 + 9.40x A1 ft
3 s.f.

(ii) Line on graph B1 5
By eye. Not through origin. Accept broken scales

(d) Salary increases by £940 for every 1 point performance increase B1 ft 1


(e) x = 35 ⇒ y = 334.95 M1
Evidence – calculation or graph
Salary is £33,495 A1 2
33,000 – 34,000
[16]


14. (a) m is explanatory variable B1 1

(b) [Scatter diagram with m on the horizontal axis]
scales and labels B1
points B2
(6,7 points) B1 3
Line M1 A1


(c) Σm = 98; Σp = 156; Σm² = 1348; Σmp = 2119
Smp = 2119 − (98 × 156)/8 = 208 M1 A1
Smm = 1348 − 98²/8 = 147.5 A1
∴ b = Smp/Smm = 208/147.5 = 1.410169 (awrt 1.41) M1 A1
a = 156/8 − (1.410169…) × 98/8 = 2.225429 (awrt 2.23) M1 A1
∴ p = 2.23 + 1.41m A1 ft 8

(d) Line on graph M1 A1 2


(e) p = 2.23 + 1.41 × 15 = 23.38 M1 A1 2
[16]

15. (a)
x 20 26 32 34 37 44 48 50 53 58 B1
y 24 38 42 44 43 52 59 66 70 79

Change in cost of advertising influences number of new car sales B1


Graph: Scale and labels B1
Points all correct B2 5

(b) Sxy = 22611 – (402 × 517)/10 = 1827.6 M1 A1
Sxx = 17538 – 402²/10 = 1377.6 A1
b = Sxy/Sxx = 1827.6/1377.6 = 1.326655… M1 A1
a = 517/10 – (1.326655…) × 402/10 = –1.63153… B1
∴ y = –1.63 + 1.33x B1 ft 7

(c) (c – 4000)/10 = –1.63 + 1.33(p – 100) M1 A1 ft
c = 2653.7 + 13.3p A1 3

(d) No. sold if no money spent on advertising B1


p = 0 is well outside valid range – meaningless B1 2


(e) 2 × 13.3 = 27 extra cars sold B1


Only valid in range of data for 1990s B1 2
[19]
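
The decoding in part (c) can be cross-checked by fitting the line of c on p directly to the uncoded table. The sketch below does both and compares them; fit_line is a hypothetical helper, and the comparison assumes the coding x = p – 100, y = (c – 4000)/10 stated in the question.

```python
# Uncoded data from question 15.
c = [4240, 4380, 4420, 4440, 4430, 4520, 4590, 4660, 4700, 4790]
p = [120, 126, 132, 134, 137, 144, 148, 150, 153, 158]


def fit_line(xs, ys):
    """Least squares line of ys on xs; returns (intercept, gradient)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    s_xx = sum(x * x for x in xs) - n * mean_x ** 2
    gradient = s_xy / s_xx
    return mean_y - gradient * mean_x, gradient


# Route 1: fit the coded variables, then decode.
x = [pi - 100 for pi in p]
y = [(ci - 4000) / 10 for ci in c]
a_xy, b_xy = fit_line(x, y)
b_decoded = 10 * b_xy
a_decoded = 4000 + 10 * a_xy - 10 * b_xy * 100

# Route 2: fit c on p directly.
a_direct, b_direct = fit_line(p, c)

print(f"decoded: c = {a_decoded:.1f} + {b_decoded:.2f}p")
print(f"direct:  c = {a_direct:.1f} + {b_direct:.2f}p")
```

With unrounded values both routes give an intercept near 2657; the scheme's 2653.7 comes from substituting the 3 s.f. values –1.63 and 1.33, which illustrates how sensitive the decoded intercept is to rounding b.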

16. (a) [Scatter diagram: Temp y (°C) against Speed x (mph)]
Scales & labels B1
Points B2, 1, 0 3

(b) Points lie reasonably close to a straight line B1 1


(c) b = (8 × 20615 − 260 × 589)/(8 × 9500 − (260)²) = 11780/8400 = 1.40238…
(accept awrt 1.40) M1 A1
a = 589/8 − (1.40238…) × (260/8) = 28.0476175…
(accept awrt 28.0) M1 A1 4
∴ y = 28.0 + 1.40x

(d) a ⇒ surrounding air temperature when tyre is stationary B1
b ⇒ for every extra mph, temperature rises by 1.40 °C B1 2

(e) y = 28.0 + 1.40 × 50 = 98 B1
Regression line is only a line of best fit and does not necessarily pass
through all points B1 2

(f) 12 mph – reasonable to use line; 12 is just below lowest x-value B1; B1
85 mph – not reasonable to use line; 85 is well outside range
of values B1; B1 4
[16]
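
The scheme for question 16(c) uses the form b = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) rather than Sxy/Sxx. The two are the same quantity with numerator and denominator multiplied by n, as the following illustrative sketch confirms numerically; the variable names are assumptions for this example.

```python
# Data from question 16: rig speed x (mph) and shoulder temperature y (°C).
x = [15, 20, 25, 30, 35, 40, 45, 50]
y = [53, 55, 63, 65, 78, 83, 91, 101]
n = len(x)

sum_x, sum_y = sum(x), sum(y)                    # 260 and 589
sum_xx = sum(v * v for v in x)                   # 9500
sum_xy = sum(u * v for u, v in zip(x, y))        # 20615

# Form used in the mark scheme for this question.
b_scheme = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)

# Form used elsewhere in this worksheet.
S_xy = sum_xy - sum_x * sum_y / n
S_xx = sum_xx - sum_x ** 2 / n
b_s = S_xy / S_xx

a = sum_y / n - b_scheme * sum_x / n
print(f"b = {b_scheme:.5f} (both forms agree: {abs(b_scheme - b_s) < 1e-12}), a = {a:.4f}")
```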


1. The vast majority of candidates produced accurate scatter diagrams and on the rare occasion that
there was a point missing it was predominantly point D. Explaining exactly why a linear
regression model was appropriate proved to be difficult for candidates overall. Most candidates
seemed to have the general idea but did not express this in the required terms and consequently
very few earned this mark. Comments tended to be much more general about why linear
regression is carried out and most talked about correlation being high without explaining that
the points lie close to a line.
On the whole the correct formulae were used in calculations of Sdd and Sfd, with most candidates
earning the method mark at the very least. The same was true in the calculations of b and a
overall, although a common mistake was to calculate Sff and go on to use that in the calculation
of b. Premature approximation cost many candidates accuracy marks. Interpretations of the
value of b were considerably varied, with relatively few candidates gaining this mark and some
opted to omit this part altogether. Most candidates failed to relate their value to the context of
the question and often tended to discuss b merely in terms of being the gradient. As a
consequence, despite having the right kind of idea and correctly understanding the concept of
the gradient, frequently candidates failed to gain this mark due to missing out the relevant units,
mixing up the units or not quoting the actual value of b.
Very few candidates were able to formulate the correct equation with the correct units in part
(f), and the majority found this particularly challenging, either omitting this part or resorting to
evaluating the lines at the data points rather than equating and solving the equations. Often no
clear strategy was apparent and a common mistake was to equate their equation to 5. There was
clearly confusion over t and d and even out of those who were able to solve the required
equation or inequality, not many found the value of t or range of t in km, as most tended to give
their answer in terms of d. Occasionally the intersection point was evaluated using their graph
after the lines had been plotted.
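
The unit conversion that caused difficulty in part (f) can be sketched as follows, using the data table from question 1. This is one possible route under stated assumptions, not the mark scheme's wording: the variable names are illustrative, and the regression line is assumed to describe the first agent's fares at the break-even distance.

```python
# Data from the table in question 1: distance in units of 100 km, fare in pounds.
dist = [2.2, 4.0, 6.0, 2.5, 8.0, 5.0]
fare = [18, 20, 25, 23, 32, 28]
n = len(dist)

# S_dd = Σd² − (Σd)²/n and S_fd = Σfd − ΣfΣd/n
s_dd = sum(d * d for d in dist) - sum(dist) ** 2 / n
s_fd = sum(d * f for d, f in zip(dist, fare)) - sum(dist) * sum(fare) / n

# Regression line of f on d: f = a + b d (kept unrounded)
b = s_fd / s_dd
a = sum(fare) / n - b * sum(dist) / n

# Rival charges 5p per km = £0.05 per km = £5 per unit of d (100 km).
rival_rate_per_d = 0.05 * 100

# First agent is cheaper when a + b d < 5 d, i.e. d > a / (5 − b).
d_break = a / (rival_rate_per_d - b)
t_break = 100 * d_break  # convert d (in 100 km) back to t (in km)

print(f"S_dd = {s_dd:.4f}, S_fd = {s_fd:.4f}")
print(f"f = {a:.2f} + {b:.2f} d")
print(f"First agent cheaper for t greater than about {t_break:.0f} km")
```

The key step is expressing both fares in the same units before equating them: either convert the rival's rate to pounds per 100 km, as here, or substitute d = t/100 into the regression line.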

2. This was a high scoring question for most candidates. The calculations in parts (a) and (b) were
answered very well with very few failing to use the formulae correctly. Part (c) received a good
number of correct responses but many still failed to interpret their value and simply described
the correlation as strongly positive. The scatter diagram was usually plotted correctly and most
knew how to calculate the equation of the regression line although some used Spp instead of Stt
and some gave their final equation in terms of y and x instead of p and t. Plotting the line in part
(f) proved quite challenging for many candidates and a number with the correct equation did not
have the gradient correct. Part (g) was usually well done but some chose to use their graph
rather than their equation of the line and lost the final accuracy mark.

3. There were some good responses to this question, but some candidates calculated the slope as
59.99/120.1, although 3/5 marks were obtained if they went on to produce the equation as w =
6.8 + 0.50l provided a minimum of 2 significant figures were used. Candidates should be able to
identify the independent and dependent variables from a contextual question. The accuracy
mark for the intercept was lost if they used the rounded value of 1.8 for the slope in its
calculation. Many candidates did not consider 60 mm to be far enough from the data range to
count as extrapolation, showing that they did not go back and read the question carefully
enough or consider the range of values given.


4. This proved to be a straightforward starter for most candidates who were able to tackle part (a)
confidently, usually scoring full marks. Part (b) was answered well too; the correct formulae
were selected and answers were usually given to 3 sf or better. Some candidates lost the final
mark here for failing to give the full equation. Part (c) though was not answered well. There
were plenty of comments about the gradient being positive or there being positive correlation or
even skewness. Few realised that the instruction to “interpret” wanted an answer in context and
comments conveying the idea that every extra hour spent on the programme yields an extra 9.5
marks were rare. Part (d) was straightforward again but some did not use their regression
equation to find the estimate but rather tried to interpolate between the values of 3 and 3.5 given
in the table. Part (e) had a mixed response. Many good candidates rejected Lee’s comment on
the basis that 8 hours was outside the range of the data and they secured the mark. Other, less
successful, candidates simply calculated the value and then agreed with Lee or they rejected his
claim on some other basis such as the difficulty of revising for 8 hours or 60 marks might take
him above the total score on the paper.

5. This was done well by all but the weakest students with most using sufficient accuracy to score
highly. Many candidates demonstrated an understanding of the use of the formulae to achieve
full marks in part (a) and part (b). By far the main reason for loss of marks was premature
approximation. Part (c) and part (d) were done well by good candidates. Only the more able
candidates had a correct reason why t was the explanatory variable. Many called v the
explanatory variable but gave a correct reason for t. The written parts were not universally done
correctly, although the ability of students to deal with this topic has improved considerably in
recent examinations. Rounding once again caused issues in part (e), but usually did not have an
effect on part (f).

6. In part (a) calculating Σl instead of the required Σy was the most common reason for losing
marks. In part (b) premature approximation was frequent and caused a loss of marks in other
parts of the question. In part (c) substituting t=40 was usually attempted but some then
neglected to add on the 2460. Candidates are now very well primed to say that a certain value is
out of range and hence the result is not reliable.

7. Graphs were well done and candidates are finally labelling axes, but poor choice of scale for the
x-axis meant some struggled to plot the graph accurately. For a standard question part (b) was
disappointing with many answers referring to correlation but not to a straight line or line of best
fit. Part (c) was generally well answered with the inevitable loss of the last mark through lack of
accuracy by using 3.9 or not reading the question for the 2 decimal places required for the
answers. A significant minority also thought that b represented the product moment correlation
coefficient. Responses to part (d) usually missed the context of the question and in part (f) the
proximity to the range of values of x was often omitted.


8. Candidates were well prepared for this question. The major problems arose as a result of
rounding. The most surprising was rounding to 1 significant figure, which came up far
too frequently. It should be established now that there is a need to keep values for a and b
un-rounded when ‘decoding’ the line but to express answers to 3 significant figures in the final
stages.
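
The cost of premature approximation can be illustrated with a small sketch. The data for question 8 are not reproduced in this section, so the summary totals from the working for question 16(c) are used instead (n = 8, Σx = 260, Σy = 589, Σxy = 20615, Σx² = 9500); the variable names are illustrative.

```python
# Summary totals borrowed from question 16(c) purely for illustration.
n, sum_x, sum_y = 8, 260, 589
sum_xy, sum_x2 = 20615, 9500

# Gradient kept at full precision.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept from the unrounded gradient.
a_full = sum_y / n - b * sum_x / n

# Intercept from a gradient rounded too early (1.40).
b_rounded = round(b, 2)
a_premature = sum_y / n - b_rounded * sum_x / n

print(f"unrounded b:  a = {a_full:.1f}")       # a = 28.0
print(f"b rounded:    a = {a_premature:.1f}")  # a = 28.1
```

Rounding b to 1.40 before finding a shifts the intercept from 28.0 to 28.1 at 3 significant figures, which is enough to lose an accuracy mark that requires awrt 28.0.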

9. Most candidates can plot and interpret scatter diagrams and use the formulae given in the
formula book. A significant number of candidates still cannot correctly calculate the standard
deviation to the required accuracy. A significant minority worked out the standard deviation of
the x-values by mistake and, of those who worked out the correct standard deviation, many used
a premature approximation of 61.7 for the mean, losing the accuracy mark.

10. Most candidates were able to score well on this question. The values of both b and a were
usually found accurately with most candidates giving the equation of the regression line of y on
x to the required degree of accuracy. The value of y when x = 45 rarely caused any problems.

11. This question was familiar to most candidates and many of them answered it very well. This
being said, too many used scales that were not sensible for the scatter diagram and far too many
ignored the instruction to ‘find the exact value’. The interpretation of the correlation coefficient
was rarely given in terms of the context of the question and many candidates did not give the
values of a and b to 3 significant figures in spite of previous advice.

12. Apart from arithmetic errors the first three parts of this question were well answered and many
candidates gained most of the marks. It was good to see that many more of the regression
equations were calculated with coefficients given to 3 significant figures. In the final part of the
question, whilst there were many good solutions, some candidates did not state whether or not
they would use the equation and others did not appreciate the context of the question.

13. Although the data in this question did not lend itself to easily chosen scales, most candidates did
manage to produce a reasonable scatter diagram and eventually they were able to draw their
regression line on it. Most candidates answered parts (b) and (c) well although some of them did
not give their values of a and b to 3 significant figures as stated in the question. Whilst many
candidates knew what was required in parts (d) and (e) they were unable to handle the units.


14. Many candidates appeared not to have sufficient time to complete this question. Not all
candidates recognised the explanatory variable, leading some of them to find the wrong
regression line. Apart from the use of silly scales the scatter diagram was often correctly drawn
with many candidates going on to find correct values for the regression coefficients. Accuracy
was much better handled in this question than in similar questions on previous papers. Too
many candidates gave their final answer in terms of x and y rather than m and p.

15. Overall candidates responded well to this question. They knew how to work out the values of a
and b in part (b) but their accuracy often let them down. They did not work to a sufficient
degree of accuracy and a value of –1.77 was often seen instead of –1.63. Scatter diagrams were
often correctly drawn but the scales used by many candidates were often not sensible. The back
substitution in part (c) and the prediction in part (e) were beyond many of the candidates.

16. No Report available for this question.
