Chapter 23 Correlation and Linear Regression Tutorial Solutions With Comments

This document discusses linear regression and correlation. It provides examples of using scatter plots and calculating correlation coefficients to determine if a linear model is appropriate for relationships between two variables. It also demonstrates finding constants and using linear models to estimate values.

Uploaded by

NABIH FIKRI NURHUDA 22S502


Chapter 23 Correlation and Linear Regression TMJC 2022

H2 Mathematics (9758)
Chapter 23 Correlation and Linear Regression
Tutorial Questions

1 9758 Specimen Paper/II/6

Giant pumpkins are often irregular in shape. In order to account for the different shapes
of pumpkins, growers of giant pumpkins measure the size of a pumpkin by a combination
of three measurements, called the ‘over the top’ length. Pumpkin growers keep records
so that they can estimate the mass of giant pumpkins while they are still growing. The
over the top lengths (d m) and the masses (m kg) of a random sample of 7 giant pumpkins
are as follows.

d 2.31 2.9 4.05 5.5 6.7 7.92 9.17

m 11 14 47 104 170 282 449

(i) Draw a scatter diagram of these data, and explain how you know from your diagram
that the relationship between m and d should not be modelled by an equation of the
form y = ax + b . [2]

(ii) Which of the formulae $m = ed^2 + f$ and $m = gd^3 + h$, where e, f, g and h are
constants, is the better model for the relationship between m and d? Explain fully
how you decided, and find the constants for the better formula. [5]

(iii) Use the formula you chose from part (ii) to estimate the mass of a giant pumpkin
with
(a) over the top length 6 m,
(b) over the top length 12 m.
Explain which of your two estimates is more reliable. [3]

Q1 Solution
(i) [Scatter diagram of m against d: d from 2.31 to 9.17 on the horizontal axis, m from 11 to 449 on the vertical axis.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.

From the diagram, as d increases, m increases at an increasing rate. Therefore, a linear
equation of the form y = ax + b is not appropriate.

From a scatter diagram, a linear model is appropriate when

• Points lie close to a straight line in the scatter diagram
• The trend shows that as d increases, m increases at a constant rate


(ii) For $m = ed^2 + f$, r = 0.98889.

For $m = gd^3 + h$, r = 0.99951.

On the GC, input into the lists with $L_3 = (L_1)^2$ and $L_4 = (L_1)^3$.

For $m = ed^2 + f$, input with
Xlist: $L_3$ (which is $d^2$)
Ylist: $L_2$ (which is m)

For $m = gd^3 + h$, input with
Xlist: $L_4$ (which is $d^3$)
Ylist: $L_2$ (which is m)

$m = gd^3 + h$ is the better model since |r| = 0.99951 is closer to 1.

g = 0.57165 ≈ 0.572 (3 s.f.)

h = 3.7431 ≈ 3.74 (3 s.f.)

(iii) From GC,

(a) mass = 127 (3s.f.)
(b) mass = 992 (3s.f.)

For (b), d = 12 is outside the data range of d and thus the linear relationship between m and $d^3$ might
no longer hold. Since d = 6 lies within the data range of d and r = 0.99951 is close to
1, the estimate for (a) is more reliable.
An estimate is reliable when
• |r| is close to 1
• Interpolation (the value that we substitute in is within the data range)

An estimate is not reliable when

• Extrapolation (the value that we substitute in is outside of the data range so the
linear relation between the 2 variables may no longer hold)
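
The comparison in Q1 (ii) and the estimates in (iii) can be checked without a GC. The following is a minimal Python sketch, not the method the solution uses: `fit_line` is a hypothetical helper written for this illustration, and it simply applies the standard least-squares and product moment correlation formulae to the transformed data.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

d = [2.31, 2.9, 4.05, 5.5, 6.7, 7.92, 9.17]
m = [11, 14, 47, 104, 170, 282, 449]

# Regress m on d^2 and on d^3, mirroring the GC lists L3 and L4.
e, f, r_quadratic = fit_line([x ** 2 for x in d], m)
g, h, r_cubic = fit_line([x ** 3 for x in d], m)

# Estimates from the cubic model: d = 6 is interpolation, d = 12 is extrapolation.
mass_6 = g * 6 ** 3 + h
mass_12 = g * 12 ** 3 + h
print(round(r_quadratic, 5), round(r_cubic, 5), round(g, 3), round(h, 2))
print(round(mass_6), round(mass_12))
```

The code produces both estimates in the same way, so the difference in reliability comes only from whether d lies inside the data range.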


2 N2008/II/8
A certain metal discolours when exposed to air. To protect the metal against discolouring,
it is treated with a chemical. In an experiment, different quantities, x ml, of the chemical
were applied to standard samples of the metal, and the times, t hours, for the metal to
discolour were measured. The results are given in the table.
x 1.2 2.0 2.7 3.8 4.8 5.6 6.9
t 2.2 4.5 5.8 7.3 7.6 9.0 9.9
(i) Calculate the product moment correlation coefficient between x and t, and explain
whether your answer suggests that a linear model is appropriate. [3]
(ii) Draw a scatter diagram for the data. [1]
One of the values of t appears to be incorrect.
(iii) Indicate the corresponding point on your diagram by labeling it P, and explain why
the scatter diagram for the remaining points may be consistent with a model of the
form t = a + b ln x . [2]
(iv) Omitting P, calculate least squares estimates of a and b for the model t = a + b ln x .
[2]
(v) Assume that the value of x at P is correct. Estimate the value of t for this value of
x. [1]
(vi) Comment on the use of the model in part (iv) in predicting the value of t when
x = 8.0 . [1]

2 Solution
(i) Using the GC, the product moment correlation coefficient is r ≈ 0.970 (3 s.f.). Since
the value of r is close to 1, which suggests a strong positive linear correlation, a linear
model seems appropriate.

(ii) [Scatter diagram of t against x: x from 1.2 to 6.9 on the horizontal axis, t from 2.2 to 9.9 on the vertical axis, with the point P(4.8, 7.6) labelled.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.

(iii) With the point P removed, as x increases, t increases but at a decreasing
rate. Hence the remaining points are consistent with a model of the form t = a + b ln x.


(iv) Delete P(4.8, 7.6) from the lists first, then input with $L_3 = \ln(L_1)$.

t = 1.4247 + 4.3966 ln x

Therefore a = 1.42, b = 4.40 (3 s.f.)

(v) When x = 4.8, t = 1.4247 + 4.3966 ln 4.8 = 8.32 (3 s.f.)

(vi) Since x = 8.0 is outside the data range of x, linear relation between t and ln x may no
longer hold. Thus the predicted value of t is unreliable.

An estimate is reliable when

• |r| is close to 1
• Interpolation (the value that we substitute in is within the data range)

An estimate is not reliable when

• Extrapolation (the value that we substitute in is outside of the data range so the
linear relation between the 2 variables may no longer hold)
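
The least-squares estimates in Q2 (iv) and the estimate in (v) can be reproduced with a short Python sketch. It assumes the point P(4.8, 7.6) has already been dropped from the data; `fit_line` is a hypothetical helper for this illustration, not a GC function.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# Data with P(4.8, 7.6) omitted.
x = [1.2, 2.0, 2.7, 3.8, 5.6, 6.9]
t = [2.2, 4.5, 5.8, 7.3, 9.0, 9.9]

# Regress t on ln x, so the slope is b and the intercept is a in t = a + b ln x.
b, a, r = fit_line([math.log(v) for v in x], t)

# Estimate t at the x-value of the omitted point.
t_at_P = a + b * math.log(4.8)
print(round(a, 4), round(b, 4), round(t_at_P, 2))
```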


3 N2009/II/6
The table gives the world record time, in seconds above 3 minutes 30 seconds, for running
1 mile as at 1st January in various years.
Year, x 1930 1940 1950 1960 1970 1980 1990 2000
Time, t 40.4 36.4 31.3 24.5 21.1 19.0 16.3 13.1
(i) Draw a scatter diagram to illustrate the data. [2]
(ii) Comment on whether a linear model would be appropriate, referring both to the
scatter diagram and the context of the question. [2]
(iii) Explain why in this context a quadratic model would probably not be appropriate
for long-term predictions. [1]
(iv) Fit a model of the form ln t = a + bx to the data, and use it to predict the world
record time as at 1st January 2010. Comment on the reliability of your prediction.
[3]
3 Solution
(i) [Scatter diagram of t against x: x from 1930 to 2000 on the horizontal axis, t from 13.1 to 40.4 on the vertical axis.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.

(ii) From the data gathered from 1930 to 2000, a linear model is appropriate from the
scatter diagram since most of the data points lie close to a straight line.
However, in the context of the question, a linear model may not be appropriate in the
long run since human capacity will at some point in time reach a plateau.

(iii) A quadratic model is not suitable for long-term predictions, as a quadratic model with
a minimum turning point means that there will be a point in time where the value of t
(record time) increases as x (years) increases. But t (record time) can only decrease or
remain the same as the years go by.

Note: A quadratic curve based on the current

shape of the scatter diagram would mean
that we are expecting a minimum turning
point along the way and the data points
will exhibit an increasing trend after the
turning point. However, in this context,
we are recording world record time, so it
is impossible for us to record a time that
is longer than the previous years.


(iv) Using GC, ln t = 34.853 − 0.016128x, i.e. ln t = 34.9 − 0.0161x (3 s.f.)

When x = 2010, ln t = 34.853 − 0.016128(2010)
ln t = 2.4359
t = 11.4 (3 s.f.)
Thus, the predicted world record time is 3 minutes 41.4 seconds.

Since x = 2010 falls outside the data range of x, linear relation between ln t and x may
no longer hold. Thus the prediction is not reliable.
An estimate is reliable when
• |r| is close to 1
• Interpolation (the value that we substitute in is within the data range)

An estimate is not reliable when

• Extrapolation (the value that we substitute in is outside of the data range so the
linear relation between the 2 variables may no longer hold)
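
The fit of ln t = a + bx in Q3 (iv) can be sketched in Python as follows. The helper `fit_line` is a hypothetical name introduced only for this illustration; it applies the usual least-squares formulae to x and ln t.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

year = [1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000]
time = [40.4, 36.4, 31.3, 24.5, 21.1, 19.0, 16.3, 13.1]  # seconds above 3 min 30 s

# Regress ln t on x: slope is b, intercept is a.
b, a, r = fit_line(year, [math.log(t) for t in time])

# Undo the log to get the predicted excess time for 2010 (an extrapolation).
t_2010 = math.exp(a + b * 2010)
print(round(a, 3), round(b, 6), round(t_2010, 1))
```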


4 N2015/II/10
In an experiment the following information was gathered about air pressure P, measured
in inches of mercury, at different heights above sea-level h, measured in feet.
h 2000 5000 10 000 15 000 20 000 25 000 30 000 35 000 40 000 45 000
P 27.8 24.9 20.6 16.9 13.8 11.1 8.89 7.04 5.52 4.28
(i) Draw a scatter diagram for these values, labelling the axes. [1]
(ii) Find, correct to 4 decimal places, the product moment correlation coefficient
between
(a) h and P,
(b) ln h and P,
(c) $\sqrt{h}$ and P. [3]
(iii) Using the most appropriate case from part (ii), find the equation which best models
air pressure at different heights. [3]
(iv) Given that 1 metre = 3.28 feet, re-write your equation from part (iii) so that it can
be used to estimate the air pressure when the height is given in metres. [2]
4 Solution
(i) [Scatter diagram of P against h: h from 2000 to 45000 on the horizontal axis, P from 4.28 to 27.8 on the vertical axis.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.

(ii)(a) r = −0.980731 ≈ −0.9807 (correct to 4 d.p.)
(ii)(b) r = −0.974800 ≈ −0.9748 (correct to 4 d.p.)
(ii)(c) r = −0.998637 ≈ −0.9986 (correct to 4 d.p.)

(iii) Since for part (c), r = −0.9986 is closest to −1 , and as h increases, P decreases at a
decreasing rate, part (c) is the most appropriate model.

Give 2 reasons when choosing the most appropriate model:

• Describe the trend of data points based on scatter diagram
• |r| value closest to 1 (or r value closest to -1 in this case since all are negative).

∴ $P = 34.789 - 0.14687\sqrt{h}$
$P = 34.8 - 0.147\sqrt{h}$ (correct to 3 s.f.)

The height above sea level, h, is the independent variable, hence we should find the regression line of P on $\sqrt{h}$. Hence, in GC,
Xlist (independent): $L_4$ ($\sqrt{h}$)
Ylist (dependent): $L_2$ (P)

(iv) $P = 34.789 - 0.14687\sqrt{h}$ (h is in feet)

Identify the relationship between the units. When the height is given as x metres, we have to multiply
x by 3.28 to change the units to feet. Hence, we replace h by 3.28x so that we are working with the
same units as the equation we found in (iii).

Since 1 metre = 3.28 feet, x metres = 3.28x feet.
$P = 34.789 - 0.14687\sqrt{3.28x}$
$P = 34.789 - 0.26599\sqrt{x}$

Change the variable from x back to h, since the height is denoted by h in the question.

$P = 34.789 - 0.26599\sqrt{h}$ (h is in metres)

$P = 34.8 - 0.266\sqrt{h}$ (correct to 3 s.f.) (h is in metres)
Alternative method: Convert all data values to metres and recompute. You will get
the same answer.
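
The three correlation coefficients in Q4 (ii) and the unit change in (iv) can be checked with a minimal Python sketch. It assumes case (c) regresses P on the square root of h; `fit_line` is a hypothetical helper for this illustration. The sketch also refits the data after converting the heights to metres, to show that both routes give the same line.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

h_ft = [2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000]
P = [27.8, 24.9, 20.6, 16.9, 13.8, 11.1, 8.89, 7.04, 5.52, 4.28]

# Correlation of P with h, ln h and sqrt(h) respectively.
_, _, r_h = fit_line(h_ft, P)
_, _, r_ln = fit_line([math.log(h) for h in h_ft], P)
slope_ft, intercept_ft, r_sqrt = fit_line([math.sqrt(h) for h in h_ft], P)

# Route 1: substitute h_ft = 3.28 * h_m, so the slope picks up a factor sqrt(3.28).
slope_m = slope_ft * math.sqrt(3.28)

# Route 2: convert every height to metres and refit from scratch.
h_m = [h / 3.28 for h in h_ft]
slope_refit, intercept_refit, _ = fit_line([math.sqrt(h) for h in h_m], P)
print(round(r_h, 4), round(r_ln, 4), round(r_sqrt, 4))
print(round(slope_m, 5), round(slope_refit, 5))
```

The intercept is unchanged by the conversion because rescaling the predictor only rescales the slope.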


5 N2012/II/8
Amy is revising for a mathematics examination and takes a different practice paper each
week. Her marks, y% in week x, are as follows.
Week x 1 2 3 4 5 6
Percentage mark y 38 63 67 75 71 82
(i) Draw a scatter diagram showing these marks. [1]
(ii) Suggest a possible reason why one of the marks does not seem to follow the trend.
[1]
(iii) It is desired to predict Amy’s marks on future papers. Explain why, in this context,
neither a linear nor a quadratic model is likely to be appropriate. [2]

It is decided to fit a model of the form ln ( L − y ) = a + bx , where L is a suitable constant.

The product moment correlation coefficient between x and ln ( L − y ) is denoted by r.
The following table gives the values of r for some possible values of L.
| L | 91 | 92 | 93 |
| r |    | −0.929944 | −0.929918 |
(iv) Calculate the value of r for L = 91, giving your answer correct to 6 decimal places.
[1]
(v) Use the table and your answer to part (iv) to suggest with a reason which of 91, 92
or 93 is the most appropriate value for L. [1]
(vi) Using this value of L, calculate the values of a and b, and use them to predict the
week in which Amy will obtain her first mark of at least 90%. [4]
(vii) Give an interpretation, in context, of the value of L. [1]

5 Solution
(i) [Scatter diagram of y against x: x from 1 to 6 on the horizontal axis, y from 38 to 82 on the vertical axis.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.
(ii) The irregularity occurred in Week 1. That practice paper may be more difficult than
the other papers.
Note: Another possible reason could be that Amy was not prepared academically
for the practice paper in Week 1.
(iii) The marks cannot exceed 100%, and so a linear model, which models an infinite
upward/downward trend of data, is not appropriate.
The marks are likely to plateau off or stay constant as the weeks go by, rather than
in the case of a quadratic model which is expected to fit data with an increase and
then a decrease (or the other way round) trend. Thus, a quadratic model is also not
appropriate.
(iv) For L = 91, r = −0.929744 (6 decimal places)


(v) Since |r| = 0.929944 is closest to 1 for L = 92, this is the most appropriate value for
L.
Concept: r measures the strength of linear relationship.

(vi) ln(92 − y) = a + bx
On the GC, use Xlist for the variable x and Ylist for the variable ln(92 − y).

From GC, a = 4.10, b = −0.280 (3 s.f. for final answer)

ln(92 − y) = 4.1045 − 0.27960x (5 s.f. for intermediate working)
Thus, ln(92 − 90) = 4.1045 − 0.27960x
⇒ x = 12.2
Amy will get at least 90% in Week 13.

Remember to answer the question. Note that since the mark
increases over the weeks, we should round up to Week 13.

(vii) L is the percentage mark she gets if she continues practising indefinitely.
ln(L − y) = a + bx
$y = L - e^{a+bx}$
Since b < 0, $e^{a+bx} \to 0$ as $x \to \infty$, so
y → L
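
Parts (iv) to (vi) of Q5 can be traced with a minimal Python sketch. It computes r for each candidate L, picks the one with |r| closest to 1, fits ln(L − y) = a + bx, and solves for the first week with a mark of at least 90%. `fit_line` is a hypothetical helper written for this illustration.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

week = [1, 2, 3, 4, 5, 6]
mark = [38, 63, 67, 75, 71, 82]

# r between x and ln(L - y) for each candidate L.
r_by_L = {L: fit_line(week, [math.log(L - y) for y in mark])[2] for L in (91, 92, 93)}
best_L = max(r_by_L, key=lambda L: abs(r_by_L[L]))

# Fit with the chosen L: slope is b, intercept is a.
b, a, _ = fit_line(week, [math.log(best_L - y) for y in mark])

# Solve ln(L - 90) = a + b x, then round up because marks rise over the weeks.
x_90 = (math.log(best_L - 90) - a) / b
first_week = math.ceil(x_90)
print(r_by_L, best_L, round(a, 4), round(b, 5), round(x_90, 1), first_week)
```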


6 N2013/II/10
(i) Sketch a scatter diagram that might be expected when x and y are related
approximately as given in each of the cases (A), (B) and (C) below. In each case
your diagram should include 6 points, approximately equally spaced with respect
to x , and with all x- and y- values positive. The letters a, b, c, d, e and f represent
constants.
(A) $y = a + bx^2$, where a is positive and b is negative,
(B) y = c + d ln x, where c is positive and d is negative,
(C) $y = e + \frac{f}{x}$, where e is positive and f is negative. [3]
A motoring website gives the following information about the distance travelled, y km,
by a certain type of car at different speeds, x $\text{km h}^{-1}$, on a fixed amount of fuel.
Speed, x 88 96 104 112 120 128
Distance, y 148 147 144 138 126 107
(ii) Draw the scatter diagram for these values, labelling the axes. [1]
(iii) Explain which of the three cases in part (i) is the most appropriate for modelling
these values, and calculate the product moment correlation coefficient for this case.
[2]
(iv) It is required to estimate the distance travelled at a speed of 110 $\text{km h}^{-1}$. Use the
case that you identified in part (iii) to find the equation of a suitable regression line,
and use your equation to find the required estimate. [3]
6 Solution
(i)
Ensure that there are:
• 6 points
• Equally spaced with respect to x
• Positive x and y values

(A) $y = a + bx^2$, where a is positive and b is negative

When unsure, use the GC to try and sketch a graph that has the required
shape. One example is $y = 1 - x^2$. We are only interested in the first quadrant
for positive x and y values.

(B) y = c + d ln x, where c is positive and d is negative

When unsure, use the GC to try and sketch a graph that has the required
shape. One example is y = 5 − 2 ln x. We are only interested in the first quadrant
for positive x and y values.

(C) $y = e + \frac{f}{x}$, where e is positive and f is negative

When unsure, use the GC to try and sketch a graph that has the required
shape. One example is $y = 5 - \frac{2}{x}$. We are only interested in the first quadrant
for positive x and y values.

(ii) [Scatter diagram of y against x: x from 88 to 128 on the horizontal axis, y from 107 to 148 on the vertical axis.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.

(iii) From the scatter diagram, as x increases, y decreases at an increasing rate, thus
model (A) $y = a + bx^2$ is the most appropriate model.
r = −0.939 (between $x^2$ and y)
Describe the trend of data points from the scatter diagram. Since all three
models have different general shapes, choose the one that matches the scatter
diagram the most without the need to compare their correlation coefficient.


(iv) Using GC,

$y = 189.75 - 0.0046198x^2$
⇒ $y = 190 - 0.00462x^2$ (to 3 s.f.)

There is no controlled variable in this context. Since we are given the value of x, we
should use the regression line of y on $x^2$.

When x = 110, $y = 189.75 - 0.0046198(110)^2$

= 134 (to 3 s.f.)
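
The regression of y on x² in Q6 (iii) and (iv) can be reproduced with a short Python sketch. `fit_line` is a hypothetical helper for this illustration; the point to note is that the speed must be squared both when fitting and when substituting x = 110.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

speed = [88, 96, 104, 112, 120, 128]
distance = [148, 147, 144, 138, 126, 107]

# Model (A): regress y on x^2, so the slope is b and the intercept is a.
b, a, r = fit_line([v ** 2 for v in speed], distance)

# Estimate at 110 km/h, which lies inside the data range.
y_110 = a + b * 110 ** 2
print(round(r, 3), round(a, 2), round(b, 7), round(y_110))
```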


7 N2016/II/8
A website about electric motors gives information about the percentage efficiency of
motors depending on their power, measured in horsepower. Xian has copied the following
table for a particular type of electric motor, but he has copied one of the efficiency values
wrongly.

Power, x 1 1.5 2 3 5 7.5 10 20 30 40 50

Efficiency, y% 72.5 82.5 84.0 87.4 87.5 88.5 89.5 90.2 91.0 91.7 92.4

(i) Plot a scatter diagram on graph paper for these values, labelling the axes, using a
scale of 2 cm to represent 10% efficiency on the y-axis and an appropriate scale for
the x-axis. On your diagram, circle the point that Xian has copied wrongly. [2]
For parts (ii), (iii) and (iv) of this question you should exclude the point for which Xian
has copied the efficiency value wrongly.
(ii) Explain from your scatter diagram why the relationship between x and y should not
be modelled by an equation of the form y = ax + b . [1]
(iii) Suppose that the relationship between x and y is modelled by an equation of the
form $y = \frac{c}{x} + d$, where c and d are constants. State with a reason whether each of c
and d is positive or negative. [2]
(iv) Find the product moment correlation coefficient and the constants c and d for the
model in part (iii). [3]
(v) Use the model $y = \frac{c}{x} + d$, with the values of c and d found in part (iv), to estimate
the efficiency value (y) that Xian has copied wrongly. Give two reasons why you
would expect this estimate to be reliable. [3]

7 Solution
(i) [Scatter diagram of y against x: x from 1 to 50 on the horizontal axis, y from 72.5 to 92.4 on the vertical axis. The circled point does not follow the curvilinear trend.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.
(ii) As x increases, y increases at a decreasing rate. Therefore y = ax + b is not a suitable
model.
(iii) Since y increases as x increases, c is negative.
Since efficiency is non-negative, d is positive.
You can try keying in positive and negative values of c and d to check.


(iv) From GC,
r = −0.980 (3 s.f.)
c = −17.5 (3 s.f.)
d = 91.8 (3 s.f.)

• Remove the outlier before keying in x into $L_1$ and y into $L_2$
• Key in $L_3 = 1/L_1$

(v) From GC, the estimated value = 85.9 (3 s.f.)
Since r = −0.980 is close to −1, which indicates a strong negative linear correlation
between y and $\frac{1}{x}$, and x = 3 is within the data range, the estimate is reliable.
An estimate is reliable when
• |r| is close to 1
• Interpolation (the value that we substitute in is within the data range)

An estimate is not reliable when

• Extrapolation (the value that we substitute in is outside of the data range so the
linear relation between the 2 variables may no longer hold)
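
Parts (iv) and (v) of Q7 can be checked with a minimal Python sketch. It assumes the wrongly copied point is the one at x = 3, as the solution's estimate implies, and regresses y on 1/x. `fit_line` is a hypothetical helper for this illustration.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

# Data with the point (3, 87.4) excluded.
power = [1, 1.5, 2, 5, 7.5, 10, 20, 30, 40, 50]
efficiency = [72.5, 82.5, 84.0, 87.5, 88.5, 89.5, 90.2, 91.0, 91.7, 92.4]

# Regress y on 1/x: the slope is c and the intercept is d in y = c/x + d.
c, d, r = fit_line([1 / v for v in power], efficiency)

# Estimate the efficiency at the excluded x-value.
y_at_3 = c / 3 + d
print(round(r, 3), round(c, 1), round(d, 1), round(y_at_3, 1))
```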


8 N2017/II/8
(a) Draw separate scatter diagrams, each with 8 points, all in the first quadrant, which
represent the situation where the product moment correlation coefficient between
variables x and y is
(i) −1 ,
(ii) 0 ,
(iii) between 0.5 and 0.9. [3]

(b) An investigation into the effect of a fertiliser on yields of corn found that the amount
of fertiliser applied, x, resulted in the average yields of corn, y, given below, where
x and y are measured in suitable units.
x 0 40 80 120 160 200
y 70 104 118 119 126 129
(i) Draw a scatter diagram for these values. State which of the following
equations, where a and b are positive constants, provides the most accurate
model of the relationship between x and y.
(A) $y = ax^2 + b$ (B) $y = \frac{a}{x^2} + b$
(C) $y = a\ln 2x + b$ (D) $y = a\sqrt{x} + b$ [2]
(ii) Using the model you chose in part (i), write down the equation for the
relationship between x and y, giving the numerical values of the coefficients.
State the product moment correlation coefficient for this model. [3]
(iii) Give two reasons why it would be reasonable to use your model to estimate
the value of y when x = 189. [2]

8 Solution
(a)(i) [Scatter diagram not reproduced.]
Do not draw a regression line on your scatter diagram. The points
should look obviously collinear.

(a)(ii) [Scatter diagram not reproduced.]

(a)(iii) [Scatter diagram not reproduced.]

(b)(i) [Scatter diagram not reproduced.]
Keeping in mind that a and b are positive, model (D) provides the most accurate model of the relationship between x and y.

(b)(ii) $y = 4.18211387\sqrt{x} + 74.04787$
$= 4.18\sqrt{x} + 74.0$ (3 s.f.)
r = 0.981 (3 s.f.)

(b)(iii) Since x = 189 is within the data range of x and the value of r = 0.981 is close to 1,
implying a strong positive linear correlation, the linear relation between $\sqrt{x}$ and y
holds. Thus it is reasonable to use model (D) to estimate the value of y when x = 189.
An estimate is reliable when
• |r| is close to 1
• Interpolation (the value that we substitute in is within the data range)

An estimate is not reliable when

• Extrapolation (the value that we substitute in is outside of the data range so the
linear relation between the 2 variables may no longer hold)
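
The coefficients quoted for model (D) in Q8 (b)(ii) can be reproduced with a short Python sketch that regresses y on the square root of x. `fit_line` is a hypothetical helper for this illustration.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

fertiliser = [0, 40, 80, 120, 160, 200]
corn_yield = [70, 104, 118, 119, 126, 129]

# Model (D): regress y on sqrt(x); the slope is a and the intercept is b.
a, b, r = fit_line([math.sqrt(v) for v in fertiliser], corn_yield)

# x = 189 lies inside the data range, so this is an interpolation.
y_189 = a * math.sqrt(189) + b
print(round(a, 5), round(b, 4), round(r, 3), round(y_189, 1))
```

Models (B) and (C) cannot even be evaluated at x = 0, which is one of the data values, so they could not be fitted this way.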

9 2011 MJC/II/11
A random sample of nine pairs of values of x and y are given in the table.

x 2.5 2 3 3.5 5 4 5.3 7.5 6

y 3.20 3.40 3.0 2.86 2.61 2.75 2.57 k 2.55
(i) The equation of the regression line of y on x is y = −0.175 x + 3.57.
Show that k = 2.4. [3]
(ii) Draw a scatter diagram for this set of data and obtain the product moment
correlation coefficient. Comment on the suitability of the linear model. [4]
(iii) Determine which of the following models is more appropriate:
A. ln y = a + bx
B. $y = a + \frac{b}{x}$
where a and b are constants. [2]
(iv) It is required to estimate the value of y for which x = 8. Find the equation of a
suitable regression line, and use it to find the required estimate. Comment on the
reliability of your estimation. [3]
9 Solution
(i) $\bar{x} = \frac{2.5 + 2 + 3 + 3.5 + 5 + 4 + 5.3 + 7.5 + 6}{9} = \frac{38.8}{9}$
$\bar{y} = \frac{3.2 + 3.4 + 3 + 2.86 + 2.61 + 2.75 + 2.57 + k + 2.55}{9} = \frac{22.94 + k}{9}$
The only point you can be certain is on the regression line is $(\bar{x}, \bar{y})$.
Substituting into $y = -0.175x + 3.57$:
$\bar{y} = -0.175\bar{x} + 3.57$
$\frac{22.94 + k}{9} = -0.175\left(\frac{38.8}{9}\right) + 3.57$
$22.94 + k = \frac{1267}{50}$
⇒ k = 2.4 (shown)
(ii) [Scatter diagram of y against x: x from 2.0 to 7.5 on the horizontal axis, y from 2.4 to 3.40 on the vertical axis.]

When plotting a scatter diagram, you must:
• Label the axes,
• Indicate min and max values on each of the axes,
• Show the relative positions of the points clearly,
• Check that all points are drawn.
Using GC: r = −0.943 ( 3 s.f.)
Even though r is close to – 1, from the scatter diagram, as x increases, y decreases at
a decreasing rate. Therefore, a linear model may not be suitable.
(iii) For Model A, r = −0.958 (3 s.f.)

For Model B, r = 0.997 (3 s.f.)

Since |r| is closer to 1 for Model B, it is more appropriate.

(iv) Equation of regression line for model B is
$y = \frac{2.75}{x} + 2.07$
When x = 8,
$y = \frac{2.7480}{8} + 2.0651 = 2.41$ (3 s.f.)
Although r = 0.997 suggests a strong positive linear correlation between y and $\frac{1}{x}$, x = 8 falls outside the
data range of x and therefore the estimate of y is unreliable, as the linear relation
between y and $\frac{1}{x}$ may no longer hold.
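
The two computations in Q9, recovering k from the mean point and fitting model B, can be sketched in Python as follows. `fit_line` is a hypothetical helper for this illustration, and the value of k is rounded to 1 decimal place before refitting because the given regression coefficients are themselves rounded.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

x = [2.5, 2, 3, 3.5, 5, 4, 5.3, 7.5, 6]
y_known = [3.20, 3.40, 3.0, 2.86, 2.61, 2.75, 2.57, 2.55]  # every y except k

# Part (i): the mean point lies on y = -0.175x + 3.57, so
# (sum of known y + k) / n = -0.175 * x_bar + 3.57.
n = len(x)
x_bar = sum(x) / n
k = n * (-0.175 * x_bar + 3.57) - sum(y_known)

# Part (iv): model B, regress y on 1/x; slope is b, intercept is a.
y = [3.20, 3.40, 3.0, 2.86, 2.61, 2.75, 2.57, round(k, 1), 2.55]
b, a, r_B = fit_line([1 / v for v in x], y)
y_at_8 = a + b / 8  # x = 8 is outside the data range (extrapolation)
print(round(k, 2), round(a, 4), round(b, 4), round(r_B, 3), round(y_at_8, 2))
```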


10 A car is travelling along a stretch of road with speed v km/h when the brakes are applied.
The car comes to rest after travelling a further distance of s metres. The values of s for 8
different values of v are given in the table, correct to 2 decimal places.

v 25 30 35 40 45 50 55 60
s 2.83 4.63 4.84 5.29 9.73 10.30 14.82 15.21

(i) Calculate the product moment correlation coefficient between v and $\sqrt{s}$. What
does this indicate about the scatter diagram of the points $(v, \sqrt{s})$?
(ii) It is given that the product moment correlation coefficient between v and s is 0.965,
correct to 3 decimal places. State why the regression line of $\sqrt{s}$ on v is more
suitable than the regression line of s on v, and find the equation of the regression
line of $\sqrt{s}$ on v.

(iii) Consider the equation of the regression line of $\sqrt{s}$ on v. In the context of the
question,
(a) comment on the value of the constant term,

(b) interpret the slope of the regression line.

(iv) Would you be willing to use this model to predict the further distance travelled if
the speed is 70 km/h? Explain your answer with reason(s).

10 Solution
(i) r = 0.97496 = 0.975 (3 s.f.)

This r = 0.975 indicates that most of the points $(v, \sqrt{s})$ lie close to a straight line.
(ii) Since 0.975 is closer to 1 compared with 0.965, the regression line of $\sqrt{s}$ on v is more
suitable than the regression line of s on v.

Equation of regression line of $\sqrt{s}$ on v is

$\sqrt{s} = 0.066336v - 0.017748$
i.e. $\sqrt{s} = 0.0663v - 0.0177$ (3 s.f.)

(iii)(a) When v = 0, $\sqrt{s} = -0.0177 \neq 0$, which suggests that there is an error in the data. It is unrealistic
to use the model for v = 0 (or close to 0), as a negative value of $\sqrt{s}$ does not correspond to any real distance s.
In fact, we should not use the regression line to model beyond the range of v.
OR
The value of the constant term represents the square root of the distance the car travels if its speed is 0, so
the constant term should be 0. The value of −0.0177 is close to 0, which shows the
model is quite accurate for the range of v given. (Again, we should not use the model for
values of v outside the range, let alone for v close to 0.)
(iii)(b) For each 1 km/h increase in the speed, the square root of the distance travelled will
increase by 0.0663 $\text{m}^{1/2}$.
(iv) No. 70 km/h lies outside the given data range of v and therefore the estimate of s is
unreliable, as the linear relation between $\sqrt{s}$ and v may no longer hold.
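
The comparison in Q10 between regressing s on v and regressing the square root of s on v can be checked with a minimal Python sketch. `fit_line` is a hypothetical helper for this illustration, and the sketch assumes the transformed variable is the square root of s, consistent with the interpretation of the slope in (iii)(b).

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

v = [25, 30, 35, 40, 45, 50, 55, 60]
s = [2.83, 4.63, 4.84, 5.29, 9.73, 10.30, 14.82, 15.21]

# Correlation of v with s itself, and with sqrt(s).
_, _, r_plain = fit_line(v, s)
slope, intercept, r_sqrt = fit_line(v, [math.sqrt(d) for d in s])

# The fitted constant term is slightly negative, so sqrt(s) < 0 at v = 0.
sqrt_s_at_zero = intercept
print(round(r_plain, 3), round(r_sqrt, 3), round(slope, 6), round(intercept, 6))
```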


11 The table shows the number y (in millions) of cell-phone subscribers in a country from
2001 to 2010, where t represents number of years from 2000.

t 1 2 3 4 5 6 7 8 9 10
y 1.6 2.7 4.4 6.4 8.9 13.1 19.3 28.2 38.2 48.7

The relationship between y and t is given by the formula $y = ab^t$, where a and b are
constants.
(i) Using the substitution I = ln y , show that the relation between I and t is linear.
(ii) Find the equation of the estimated regression line of I on t and hence give estimates
for a and b.
(iii) Find the product moment correlation coefficient between I and t.
(iv) Predict the number of cell-phone subscribers in the year 2015. Comment on the
reliability of your prediction.
(v) It is required to estimate the value of t for which I = 1.5. Explain which of the
regression lines I on t or t on I, should be used. Use the equation of your choice to
find the value of t when I = 1.5.
11 Solution
(i) Apply ln to both sides to linearise:
$\ln y = \ln(ab^t)$
$\ln y = \ln a + \ln(b^t)$
$\ln y = \ln a + t\ln b$
With I = ln y, this is I = ln a + (ln b)t, which is linear in t.
(ii) From GC, I = 0.377423t + 0.26295183
Thus, comparing with ln y = t ln b + ln a,
ln a = 0.26295183 ⇒ $a = e^{0.26295183}$ = 1.300764 = 1.30 (to 3 s.f.)
ln b = 0.377423 ⇒ $b = e^{0.377423}$ = 1.4585 = 1.46 (to 3 s.f.)
(iii) r = 0.996741 = 0.997 (to 3 s.f.)
(iv) When t = 15,
I = 0.377423(15) + 0.26295183
I = 5.92429683
y = 374.0153
≈ 374 million
Since t = 15 falls outside the data range of t, the prediction of y is unreliable as the linear
relation between ln y and t may no longer hold.

(v) Since t is the independent variable, use the line I on t.

When I = 1.5 (i.e. $y = e^{1.5} = 4.48$),
1.5 = 0.377423t + 0.26295183
t = 3.2776 = 3.28 (to 3 s.f.)
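
The linearisation in Q11 can be traced end to end with a short Python sketch: fit I = ln y against t, exponentiate the coefficients to recover a and b, then use the line of I on t for both the 2015 prediction and the value of t at I = 1.5. `fit_line` is a hypothetical helper for this illustration.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

t = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.6, 2.7, 4.4, 6.4, 8.9, 13.1, 19.3, 28.2, 38.2, 48.7]

# Regress I = ln y on t: slope is ln b, intercept is ln a.
ln_b, ln_a, r = fit_line(t, [math.log(v) for v in y])
a, b = math.exp(ln_a), math.exp(ln_b)

# Prediction for 2015 (t = 15, an extrapolation) and t at I = 1.5.
y_2015 = a * b ** 15
t_at_I = (1.5 - ln_a) / ln_b
print(round(a, 4), round(b, 4), round(r, 3), round(y_2015), round(t_at_I, 2))
```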

5 pages
Correlation and Regression
No ratings yet
Correlation and Regression
5 pages
Unit 3 Assignment DIRECTIONS R spr18
No ratings yet
Unit 3 Assignment DIRECTIONS R spr18
28 pages

1 9758 Specimen Paper/II/6

d 2.31 2.9 4.05 5.5 6.7 7.92 9.17

(ii) Which of the formulae m = ed² + f and m = gd³ + h, where e, f, g and h are

From a scatter diagram, a linear model is appropriate when

(ii) For m = ed² + f, r = 0.98889.

Input into List with

Xlist: L3 (which is d²) Input with

m = gd³ + h is the better model since |r| = 0.99951 is closer to 1.

g = 0.57165 ≈ 0.572 (3 s.f.)
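The GC comparison in part (ii) can be checked with a short script. The sketch below is only an illustration, not the method the solutions prescribe: it regresses m on d² and on d³ using the seven data pairs from the question, and `linear_fit` is a helper name introduced here for the purpose.

```python
from math import sqrt

def linear_fit(xs, ys):
    """Least-squares line ys = slope*xs + intercept, with the product moment correlation r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Data from the question: over the top length d (m) and mass m (kg)
d = [2.31, 2.9, 4.05, 5.5, 6.7, 7.92, 9.17]
m = [11, 14, 47, 104, 170, 282, 449]

# Model 1: m = e*d^2 + f (regress m on d^2)
e, f, r_sq = linear_fit([x ** 2 for x in d], m)
# Model 2: m = g*d^3 + h (regress m on d^3)
g, h, r_cube = linear_fit([x ** 3 for x in d], m)

print(f"d^2 model: r = {r_sq:.5f}")
print(f"d^3 model: r = {r_cube:.5f}, g = {g:.5f}, h = {h:.5f}")
```

The model whose |r| is closer to 1 is preferred, which is the same criterion the solution applies to the GC output.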

(iii) From GC,

An estimate is not reliable when

(ii) When plotting a scatter diagram,

(iv) t = 1.4247 + 4.3966(ln x)

Delete P(4.8, 7.6)

(v) When x = 4.8, t = 1.4247 + 4.3966(ln 4.8) = 8.32 (3 s.f.)
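The substitution in part (v) can be verified numerically. This is a minimal sketch that uses the regression line quoted in part (iv); the function name `estimate_t` is an illustrative choice, not part of the original solution.

```python
from math import log

def estimate_t(x):
    """Regression line from part (iv): t = 1.4247 + 4.3966 ln x."""
    return 1.4247 + 4.3966 * log(x)  # math.log is the natural logarithm

t = estimate_t(4.8)
print(f"t = {t:.3g}")  # rounds to 8.32 (3 s.f.)
```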

An estimate is reliable when

An estimate is not reliable when

Note: A quadratic curve based on the current

(iv) Using GC, ln t = 34.853 − 0.016128x ⇒ ln t = 34.9 − 0.0161x (3 s.f.)

When x = 2010, ln t = 34.853 − 0.016128 ( 2010 )
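Because the regression is on ln t rather than t, the estimate needs one extra step: evaluate ln t, then exponentiate. The sketch below carries out that arithmetic with the unrounded coefficients from the GC output; it shows the calculation only and does not comment on whether the estimate is reliable.

```python
from math import exp

x = 2010
ln_t = 34.853 - 0.016128 * x  # value of ln t from the regression line
t = exp(ln_t)                 # undo the logarithm to recover t

print(f"ln t = {ln_t:.5f}")
print(f"t = {t:.3g}")
```

Using the unrounded coefficients (34.853 and 0.016128) rather than the 3 s.f. versions avoids compounding rounding error in the exponent.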

An estimate is not reliable when

(ii)(c) r = −0.998637 ≈ −0.9986 (correct to 4 d.p.)

Give 2 reasons when choosing the most appropriate model:

Sea level h is the independent variable,

Identify the relationship between the units.

P = 34.789 − 0.26599 h (h is in metres)

It is decided to fit a model of the form ln(L − y) = a + bx, where L is a suitable constant.

(vi) ln ( 92 − y ) = a + bx Xlist for variable x
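The idea behind fitting ln(92 − y) = a + bx is to transform y first and then run an ordinary linear regression of the transformed values on x. The sketch below demonstrates the mechanics only. The question's data are not reproduced here, so the x values and the constants `a_true` and `b_true` are hypothetical, chosen so that the fit can be checked against known values.

```python
from math import exp, log

L = 92                       # constant from the model in part (vi)
a_true, b_true = 4.0, -0.3   # hypothetical values, for illustration only
xs = [1, 2, 3, 4, 5, 6]      # hypothetical x values
ys = [L - exp(a_true + b_true * x) for x in xs]  # synthetic y that follows the model exactly

# Step 1: transform y into z = ln(L - y), the quantity placed in the Ylist
zs = [log(L - y) for y in ys]

# Step 2: least-squares regression of z on x
n = len(xs)
x_bar = sum(xs) / n
z_bar = sum(zs) / n
b = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs)) / sum((x - x_bar) ** 2 for x in xs)
a = z_bar - b * x_bar

print(f"a = {a:.4f}, b = {b:.4f}")
```

With real data the points will not lie exactly on the line, so a and b would be estimates rather than exact recoveries.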

3s.f. for final answer

(A) y = a + bx², where a is positive and b is negative

When unsure, use the GC to try and

We are only interested in first quadrant

(B) y = c + d ln x , where c is positive and d is negative

When unsure, use the GC to try and

We are only interested in first quadrant

(iv) Using GC,

when x = 110, y = 189.75 − 0.0046198(110)²

= 134 (to 3 s.f.)
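The stated answer of 134 is reproduced only when x is squared, which is consistent with the model (A) form y = a + bx². The sketch below assumes that form and checks the arithmetic; `estimate_y` is an illustrative name.

```python
def estimate_y(x):
    """Assumed model (A): y = 189.75 - 0.0046198 x^2, coefficients from the GC output."""
    return 189.75 - 0.0046198 * x ** 2

y = estimate_y(110)
print(f"y = {y:.3g}")  # rounds to 134 (3 s.f.)
```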

Power, x 1 1.5 2 3 5 7.5 10 20 30 40 50

From GC, • Remove the outlier before

An estimate is not reliable when

Keeping in mind a &

An estimate is not reliable when

x 2.5 2 3 3.5 5 4 5.3 7.5 6

For Model B, r = 0.997 (3 s.f.)

Equation of regression line for model B is

(b) interpret the slope of the regression line.

Equation of regression line of s on v is

(v) Since t is the independent variable, use line I on t.
