Ch9- Correlation Regression
QBM
Chapter 9: Correlation and Regression
Two continuous variables
• Correlation and regression are concerned
with the investigation of two continuous
variables.
• Some initial insight into the relationship between two
continuous variables can be obtained by plotting a scatter
diagram and simply looking at the resulting graph.
• Is the association between the two variables strong
enough to be useful?
• The 'goodness of fit' can be calculated
to see how well the line fits the data.
• Once defined by an equation, the
relationship can be used for
predictive purposes.
Example
'Ice cream Sales' for a particular firm of
manufacturers and 'Average Monthly Temperature'
are:
Month       Av. Temp (°C)   Sales (£'000)
January      4               73
February     4               57
March        7               81
April        8               94
May         12              110
June        15              124
July        16              134
August      17              139
September   14              124
October     11              103
November     7               81
December     5               80

From this data we need:
• Scatter diagram
• Correlation coefficient
• Regression line
• Goodness of fit
• Prediction
Scatter diagrams
We look for a linear relationship, with the
plotted bivariate points lying reasonably
close to the as yet unknown 'line of best fit'.
• Plot the independent variable, x, on the horizontal axis.
• Plot the dependent variable on the vertical, y, axis.
• (Minitab output shown)
• Looks promising: a …
[Figure: scatter diagram, 'Sales against Average Monthly Temperature'; vertical axis: Sales]
Calculation of the correlation coefficient
• Input the data to the calculator.
• It is best to use the calculator in 'A+BX' type or 'LR' mode,
as will be demonstrated in tutorials.
• (The method is in the specific calculator manual.)
• (Without 'A+BX' type or 'LR' mode, more complex formulae and
methods are needed; these are also in the textbook or handout.)
• Correlation coefficient, r, (output from calculator):
r = 0.9833
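For readers without an 'LR mode' calculator, the same value can be checked in a few lines of Python. This is a minimal sketch, not the method used in the lecture: it applies the standard formula r = Sxy / sqrt(Sxx × Syy) to the twelve data points of the example, and the names `temp`, `sales` and `pearson_r` are illustrative only.

```python
from math import sqrt

# Data from the ice cream example (January to December).
temp = [4, 4, 7, 8, 12, 15, 16, 17, 14, 11, 7, 5]                 # x: av. temp, °C
sales = [73, 57, 81, 94, 110, 124, 134, 139, 124, 103, 81, 80]    # y: sales, £'000


def pearson_r(x, y):
    """Pearson's correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)                      # spread of x
    syy = sum((yi - mean_y) ** 2 for yi in y)                      # spread of y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


r = pearson_r(temp, sales)
print(round(r, 4))  # 0.9833
```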
Is this correlation coefficient, 0.9833, significant?
Hypothesis test for a Pearson’s correlation coefficient
• H0: There is no association between ice-cream sales and
average monthly temperature.
• H1: There is an association between them.
• Critical Value:
• Pearson's correlation coefficient tables, 5%, 10 degrees of freedom (n − 2 = 12 − 2) = 0.576
• Test statistic: 0.983
• Conclusion: The test statistic exceeds the critical value
so we reject the Null Hypothesis, H0, and conclude that
there is a significant association between ice-cream sales
and average monthly temperature.
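The decision rule above can be sketched in Python. This is an illustration under stated assumptions rather than the lecture's own procedure: the two-tailed 5% critical value of Student's t for 10 degrees of freedom (2.228) is taken from standard t tables, and the critical r is derived from it with the usual identity r_crit = t / sqrt(t² + df). All variable names are hypothetical.

```python
from math import sqrt

n = 12        # number of months in the example
r = 0.9833    # correlation coefficient from the calculator
df = n - 2    # degrees of freedom for a test on r

# Assumption: two-tailed 5% critical t for 10 degrees of freedom,
# read from standard t tables (the slide quotes only the r value).
t_crit = 2.228

# Critical r implied by that t value: r_crit = t / sqrt(t^2 + df).
r_crit = t_crit / sqrt(t_crit ** 2 + df)

# Equivalent t statistic for the observed r.
t_stat = r * sqrt(df) / sqrt(1 - r ** 2)

# Reject H0 when the observed |r| exceeds the critical value.
reject_h0 = abs(r) > r_crit
print(round(r_crit, 3), reject_h0)  # 0.576 True
```

Comparing r with the tabulated critical r and comparing `t_stat` with `t_crit` are two forms of the same test, so they lead to the same conclusion.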
Regression equation (y = a + bx)
• There is a significant relationship between the two
variables, so the next step is to define it as a
regression equation.
• The regression line is described, in general, as
the straight line of ‘best fit’ with the equation:
• y = a + bx
• where x and y are the independent and dependent
variables, a the intercept on the y-axis, and b the
slope of the line.
• For these data: a = 45.5, b = 5.45
• Giving the regression equation:
• y = 45.5 + 5.45x
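The values of a and b can be reproduced without a calculator. The sketch below assumes the ordinary least-squares formulas b = Sxy / Sxx and a = ȳ − b·x̄, which is what a calculator's linear regression mode normally computes; `fit_line` is an illustrative name, not part of any library.

```python
# Data from the ice cream example (January to December).
temp = [4, 4, 7, 8, 12, 15, 16, 17, 14, 11, 7, 5]                 # x: av. temp, °C
sales = [73, 57, 81, 94, 110, 124, 134, 139, 124, 103, 81, 80]    # y: sales, £'000


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    return a, b


a, b = fit_line(temp, sales)
print(round(a, 1), round(b, 2))  # 45.5 5.45
```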
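Draw this line on the scatter diagram:
• Points on the line are found from the regression equation.
• E.g. if x = 15: y = 45.5 + 5.45 × 15 = 127.2
• For a given x, the corresponding value of y can be found directly from
the calculator [ŷ].
[Figure: scatter diagram of Sales with the regression line drawn; horizontal axis ticks at 5, 10, 15]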
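The calculator's ŷ function amounts to substituting x into the fitted equation. A minimal sketch, using the rounded coefficients a = 45.5 and b = 5.45 from the slides; the function name `predict_sales` and the range check are illustrative additions, the latter reflecting the general caution that a fitted line is only supported by the data between the observed temperatures (4 °C to 17 °C here).

```python
A = 45.5    # intercept from the slides (rounded)
B = 5.45    # slope from the slides (rounded)
X_MIN, X_MAX = 4, 17    # observed temperature range in the example data


def predict_sales(temp_c):
    """Predicted sales (£'000) at an average monthly temperature in °C."""
    return A + B * temp_c


x = 15
y_hat = predict_sales(x)
within_data_range = X_MIN <= x <= X_MAX   # False would mean extrapolation

print(y_hat, within_data_range)
```

At x = 15 this gives 45.5 + 81.75 = 127.25, i.e. about £127,000, matching the worked figure on the slide to the accuracy of the rounded coefficients.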
Goodness of Fit
• How well does this line fit the data?
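One common measure of goodness of fit, assumed here because the slide's own calculation is not shown, is the coefficient of determination r²: the proportion of the variation in y explained by the line. The sketch below computes it as 1 − SSE/SST for the ice cream data; for a simple linear regression this equals the square of the correlation coefficient. Variable names are illustrative.

```python
# Data from the ice cream example (January to December).
temp = [4, 4, 7, 8, 12, 15, 16, 17, 14, 11, 7, 5]                 # x: av. temp, °C
sales = [73, 57, 81, 94, 110, 124, 134, 139, 124, 103, 81, 80]    # y: sales, £'000

n = len(temp)
mean_x = sum(temp) / n
mean_y = sum(sales) / n

# Least-squares line y = a + b*x.
sxx = sum((x - mean_x) ** 2 for x in temp)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temp, sales))
b = sxy / sxx
a = mean_y - b * mean_x

# Unexplained variation (about the line) and total variation (about the mean).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(temp, sales))
sst = sum((y - mean_y) ** 2 for y in sales)

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.967
```

On this reading, roughly 97% of the variation in sales is accounted for by the linear relationship with temperature.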
Prediction of Sales
In this lecture we have concentrated
Next lecture:
Summary