11 Correlation and The Correlation Coefficient Notes
GRAPHICAL PRESENTATION:
Correlation is an indication of how two variables are related. We can measure the degree of correlation
between the two variables by calculating a correlation coefficient.
There are various types of scatter diagrams, which represent different degrees of correlation:
1. Positive correlation. When a scatter diagram shows an upward sloping line, it means that as the value of
one variable increases, the value of the other variable tends to increase as well.
Note: When there is no exact relationship between the points plotted on a scatter diagram and we can’t join all of
the points up to make a straight line, we can draw a line of best fit through the points that we have plotted by
estimating the general direction of the data.
2. Negative correlation. When a scatter diagram shows a downward sloping line, it means that as the value
of one variable increases, the value of the other variable tends to decrease.
[Figures: perfect negative correlation; partial negative correlation]
3. No correlation. This is when we can’t see any possible relationship between the points plotted.
[Figure: no correlation]
STATISTICAL TECHNIQUE:
The correlation coefficient (r) measures the degree of correlation between variables and is calculated using
the following formula:
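$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}} $$
where n is the number of pairs of observations (x, y).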
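The formula can be turned into a short calculation. The following is a minimal sketch in Python, not part of the original notes; the function name pearson_r and the sample data are assumptions made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired observations, using the sums formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when all x values (or all y values) are identical
        raise ValueError("r is undefined when one variable has no variation")
    return numerator / denominator


# Made-up data in which y is exactly twice x, so the points lie on a straight line
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r)
```

The sketch keeps each sum as a separate variable so that it can be compared term by term with the formula.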
When r = 1, there is perfect positive correlation: all the points lie on an upward sloping straight line;
r = -1, there is perfect negative correlation: all the points lie on a downward sloping straight line;
0 < r < 1, there is partial positive correlation;
-1 < r < 0, there is partial negative correlation;
r = 0, there is no correlation.
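The ranges above can be expressed as a simple lookup. This is only an illustrative sketch; the function name describe_correlation is an assumption, and in practice a value of r computed from data may differ from exactly 1, -1 or 0 because of rounding.

```python
def describe_correlation(r):
    """Map a correlation coefficient to the description used in these notes."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 1:
        return "perfect positive correlation"
    if r == -1:
        return "perfect negative correlation"
    if r == 0:
        return "no correlation"
    if r > 0:
        return "partial positive correlation"
    return "partial negative correlation"


# Hypothetical values of r, chosen only to show each case
for value in (1, 0.6, 0, -0.3, -1):
    print(value, "->", describe_correlation(value))
```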
The coefficient of determination is a measure of how much the variation in one variable can be explained by the
variation in another variable. The coefficient of determination is simply the correlation coefficient (r) squared.
For example, when the coefficient of determination is equal to 1, it means that 100 percent of the variation in
the dependent variable can be explained by the variation in the independent variable.
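As a small illustration of squaring r, the sketch below uses a hypothetical correlation coefficient of 0.9; both the value and the function name coefficient_of_determination are assumptions, not figures from these notes.

```python
def coefficient_of_determination(r):
    """Return r squared, the share of variation explained."""
    return r ** 2


r = 0.9  # hypothetical correlation coefficient
r_squared = coefficient_of_determination(r)

# Show the result as a percentage of the variation explained
print(f"{r_squared:.0%} of the variation in the dependent variable is explained")
```

Because r is squared, a correlation coefficient of -0.9 would give the same coefficient of determination as 0.9.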
EXAMPLE:
The following information relates to the number of electricity units used and the total electricity charge of a
business:
Solution:
𝛴𝑥 = 460
𝛴𝑦 = 55.00
𝛴𝑥𝑦 = 6,390
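𝛴𝑥² = 54,200
𝛴𝑦² = 759.50
$$ r = \frac{4 \times 6{,}390 - 460 \times 55}{\sqrt{\left(4 \times 54{,}200 - 460^2\right)\left(4 \times 759.50 - 55^2\right)}} = \frac{260}{\sqrt{5{,}200 \times 13}} = \frac{260}{260} = 1 $$
Since r = 1, there is perfect positive correlation between the number of units used and the total charge.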
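The arithmetic in this solution can be checked step by step. The sketch below uses only the sums given in the solution and n = 4 as it appears in the calculation; the variable names are chosen for illustration.

```python
import math

# Sums taken from the solution; n = 4 is the number of observations used there
n = 4
sum_x = 460
sum_y = 55.00
sum_xy = 6390
sum_x2 = 54200
sum_y2 = 759.50

numerator = n * sum_xy - sum_x * sum_y   # top line of the formula
x_term = n * sum_x2 - sum_x ** 2         # first bracket under the square root
y_term = n * sum_y2 - sum_y ** 2         # second bracket under the square root

r = numerator / math.sqrt(x_term * y_term)
print(numerator, x_term, y_term, r)
```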