4.2 Correlation & Regression
© 2015−2024 Save My Exams, Ltd. · Revision Notes, Topic Questions, Past Papers
Head to www.savemyexams.com for more awesome resources
Examiner Tip
If you use scatter diagrams in your Internal Assessment then be aware that finding outliers for
bivariate data is different to finding outliers for univariate data
(x, y) could be an outlier for the bivariate data even if x and y are not outliers for their separate
univariate data
Correlation
What is correlation?
Correlation is how the two variables change in relation to each other
Correlation could be the result of a causal relationship but this is not always the case
Linear correlation is when the changes are proportional to each other
Perfect linear correlation means that the bivariate data will all lie on a straight line on a scatter diagram
When describing correlation mention
The type of the correlation
Positive correlation is when an increase in one variable tends to be accompanied by an increase in the other variable
Negative correlation is when an increase in one variable tends to be accompanied by a decrease in the other
variable
No linear correlation is when the data points don’t appear to follow a trend
The strength of the correlation
Strong linear correlation is when the data points lie close to a straight line
Weak linear correlation is when the data points are not close to a straight line
If there is strong linear correlation you can draw a line of best fit (by eye)
The line of best fit will pass through the mean point (x̄, ȳ)
Worked example
A teacher is interested in the relationship between the number of hours her students spend on a phone
per day and the number of hours they spend on a computer. She takes a sample of nine students and
records the results in the table below.
Hours spent on a phone per day      7.6  7.0  8.9  3.0  3.0  7.5  2.1  1.3  5.8
Hours spent on a computer per day   1.7  1.1  0.7  5.8  5.2  1.7  6.9  7.1  3.3
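One way to explore the data in this table without a scatter diagram is to compute the mean point and the sign of the sum of products of deviations, which matches the sign of the linear correlation. The Python sketch below is illustrative only; the variable names (`phone`, `computer`, `s_xy`, `kind`) are assumptions made for this example, and in the exam a GDC would do this work.

```python
# Data from the table above (hours per day for nine students).
phone = [7.6, 7.0, 8.9, 3.0, 3.0, 7.5, 2.1, 1.3, 5.8]
computer = [1.7, 1.1, 0.7, 5.8, 5.2, 1.7, 6.9, 7.1, 3.3]

n = len(phone)
mean_x = sum(phone) / n      # x-coordinate of the mean point
mean_y = sum(computer) / n   # y-coordinate of the mean point

# Sum of products of deviations from the means.
# Its sign matches the sign of the linear correlation.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(phone, computer))

if s_xy > 0:
    kind = "positive"
elif s_xy < 0:
    kind = "negative"
else:
    kind = "no linear"

print(f"Mean point: ({mean_x:.2f}, {mean_y:.2f})")
print(f"Type of correlation suggested: {kind}")
```

The sign only gives the type of correlation; judging its strength still needs a scatter diagram or the PMCC.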
The regression line can be used to decide what type of correlation there is if there is no scatter diagram
If the gradient is positive then the data set has positive correlation
If the gradient is negative then the data set has negative correlation
The regression line can also be used to predict the value of a dependent variable from an independent
variable
The equation for the y on x line should only be used to make predictions for y
Using a y on x line to predict x is not always reliable
The equation for the x on y line should only be used to make predictions for x
Using an x on y line to predict y is not always reliable
Making a prediction within the range of the given data is called interpolation
This is usually reliable
The stronger the correlation the more reliable the prediction
Making a prediction outside of the range of the given data is called extrapolation
This is much less reliable
The prediction will be more reliable if the number of data values in the original sample set is bigger
The y on x and x on y regression lines intersect at the mean point (x̄, ȳ)
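The points above can be checked numerically. The following Python sketch assumes the standard least-squares formulas for a regression line (the notes rely on a GDC and do not state them); the helper names `fit_line` and `kind_of_prediction` and the data values are made up for illustration.

```python
def fit_line(u, v):
    """Least-squares line v = m*u + k, used to predict v from u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    s_uu = sum((p - mean_u) ** 2 for p in u)
    m = s_uv / s_uu
    k = mean_v - m * mean_u
    return m, k


def kind_of_prediction(value, data):
    """Label a prediction by whether the input lies inside the data range."""
    if min(data) <= value <= max(data):
        return "interpolation"
    return "extrapolation"


# Made-up bivariate data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)  # y on x line: y = a*x + b, use it to predict y
c, d = fit_line(y, x)  # x on y line: x = c*y + d, use it to predict x

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Both lines pass through the mean point (up to floating-point error).
print(abs(a * mean_x + b - mean_y) < 1e-9)
print(abs(c * mean_y + d - mean_x) < 1e-9)

print(kind_of_prediction(3.5, x))  # inside the range of x values
print(kind_of_prediction(9, x))    # outside the range of x values
```

Note that the two lines are fitted separately: swapping the roles of the variables gives the x on y line, which is generally not a rearrangement of the y on x line.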
Examiner Tip
Once you calculate the values of a and b store them in your GDC
This means you can use the full display values rather than the rounded values when using the
linear regression equation to predict values
This avoids rounding errors
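To see why this matters, the short Python sketch below compares a prediction made with full-precision coefficients against one made with the same coefficients rounded to 3 significant figures. The coefficient values and the input `x = 70` are illustrative assumptions, standing in for the full display values on a GDC.

```python
# Illustrative coefficients for a line y = a*x + b (assumed values).
a_full = 0.943579766
b_full = -18.05398833

# The same coefficients rounded to 3 significant figures.
a_rounded = float(f"{a_full:.3g}")
b_rounded = float(f"{b_full:.3g}")

x = 70
y_full = a_full * x + b_full
y_rounded = a_rounded * x + b_rounded

print(f"Using full values:    {y_full:.4f}")
print(f"Using rounded values: {y_rounded:.4f}")
print(f"Difference:           {abs(y_full - y_rounded):.4f}")
```

The difference is small here, but it grows with the size of x and can change the final rounded answer.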
Worked example
The table below shows the scores of eight students for a maths test and an English test.
Maths (x)     7   18   37   52   61   68   75   82
English (y)   5    3    9   12   17   41   49   97
b) Write down the equation of the regression line of y on x, giving your answer in the form
y = ax + b where a and b are constants to be found.
c) Write down the equation of the regression line of x on y, giving your answer in the form
x = cy + d where c and d are constants to be found.
d) Use the appropriate regression line to predict the score on the maths test of a student who got
a score of 63 on the English test.
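Parts (b) to (d) are intended for a GDC. As a rough cross-check, the Python sketch below reproduces the calculation, assuming the standard least-squares formulas (not stated in these notes); the function name `regression_line` is hypothetical. Because part (d) asks for a maths score (x) from an English score (y), it uses the x on y line.

```python
# Scores from the table above.
maths = [7, 18, 37, 52, 61, 68, 75, 82]    # x
english = [5, 3, 9, 12, 17, 41, 49, 97]    # y


def regression_line(u, v):
    """Least-squares gradient and intercept for predicting v from u."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    s_uv = sum(p * q for p, q in zip(u, v)) - sum_u * sum_v / n
    s_uu = sum(p * p for p in u) - sum_u ** 2 / n
    gradient = s_uv / s_uu
    intercept = sum_v / n - gradient * sum_u / n
    return gradient, intercept


a, b = regression_line(maths, english)   # (b) y on x: y = a*x + b
c, d = regression_line(english, maths)   # (c) x on y: x = c*y + d

# (d) Predict x (maths) from y = 63 (English) with the x on y line.
predicted_maths = c * 63 + d

print(f"y on x: y = {a:.3f}x + ({b:.3f})")
print(f"x on y: x = {c:.3f}y + ({d:.3f})")
print(f"Predicted maths score: {predicted_maths:.1f}")
```

An English score of 63 lies within the range of the English scores in the table, so this prediction is an interpolation.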
PMCC
What is Pearson’s product-moment correlation coefficient?
Pearson’s product-moment correlation coefficient (PMCC) is a way of giving a numerical value to a
linear relationship of bivariate data
The PMCC of a sample is denoted by the letter r
r can take any value such that −1 ≤ r ≤ 1
A positive value of r describes positive correlation
A negative value of r describes negative correlation
r = 0 means there is no linear correlation
r = 1 means perfect positive linear correlation
r = −1 means perfect negative linear correlation
The closer r is to 1 or −1, the stronger the linear correlation
$$ r = \frac{S_{xy}}{S_x S_y} $$
$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$ is linked to the covariance
$S_x = \sqrt{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}$ and $S_y = \sqrt{\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2}$ are linked to the
variances
You do not need to learn this as using your GDC will be expected
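For readers who want to see the formula at work, here is a minimal Python sketch of it. It takes S_x and S_y as square roots of the sums shown, which is the standard definition and is what makes r lie between −1 and 1; the function name `pmcc` is hypothetical. The first data set is the maths and English scores from the worked example above, and the other two are made-up perfectly linear data.

```python
from math import sqrt


def pmcc(x, y):
    """Pearson's product-moment correlation coefficient r = S_xy / (S_x * S_y)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    s_x = sqrt(sum(p * p for p in x) - sum_x ** 2 / n)
    s_y = sqrt(sum(q * q for q in y) - sum_y ** 2 / n)
    return s_xy / (s_x * s_y)


maths = [7, 18, 37, 52, 61, 68, 75, 82]
english = [5, 3, 9, 12, 17, 41, 49, 97]
r = pmcc(maths, english)
print(f"r for the maths and English scores: {r:.3f}")

# Made-up data lying exactly on a straight line.
r_up = pmcc([1, 2, 3, 4], [2, 4, 6, 8])     # perfect positive correlation
r_down = pmcc([1, 2, 3, 4], [8, 6, 4, 2])   # perfect negative correlation
print(f"{r_up:.3f} {r_down:.3f}")
```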
When does the PMCC suggest there is a linear relationship?
Critical values of r indicate when the PMCC would suggest there is a linear relationship
In your exam you will be given critical values where appropriate
Critical values will depend on the size of the sample
If the absolute value of the PMCC is bigger than the critical value then this suggests a linear model is
appropriate
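The comparison itself is a single step, sketched below in Python. The function name `suggests_linear_model` and both numerical values are hypothetical; in the exam the critical value is given in the question.

```python
def suggests_linear_model(r, critical_value):
    """True if the absolute value of the PMCC exceeds the critical value."""
    return abs(r) > critical_value


# Hypothetical values for illustration only.
r = -0.82
critical_value = 0.582

if suggests_linear_model(r, critical_value):
    print("The PMCC suggests a linear model is appropriate")
else:
    print("The PMCC does not suggest a linear model is appropriate")
```

The absolute value is used so that strong negative correlation is treated the same way as strong positive correlation.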