Linear correlation and linear regression
By
Md. Siddikur Rahman, PhD
Associate Professor
Department of Statistics
Begum Rokeya University, Rangpur.
Scatter Plots and Correlation
A scatter plot (or scatter diagram)
is used to show the relationship
between two variables.
Correlation analysis is used to
measure the strength of the association
(linear relationship) between two
variables.
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating linear relationships and curvilinear relationships]
Scatter Plot Examples
(continued)
[Figure: scatter plots of y against x illustrating strong relationships and weak relationships]
Scatter Plot Examples
(continued)
[Figure: scatter plot of y against x illustrating no relationship]
Correlation Coefficient
Correlation measures the strength of the linear
association between two variables
[Figure: six scatter plots of Y against X, for r = -1, r = -0.6 and r = 0 (top row) and r = +1, r = +0.3 and r = 0 (bottom row)]
Calculating the Correlation Coefficient
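$$r = \frac{\operatorname{cov}(x,y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}} = \frac{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}\cdot\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}} = \frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\left[\sum(x-\bar{x})^2\right]\left[\sum(y-\bar{y})^2\right]}} = \frac{SP(x,y)}{\sqrt{SS(x)\,SS(y)}}$$
SP(x, y) = Σ(x − x̄)(y − ȳ) is the numerator of the covariance; SS(x) = Σ(x − x̄)² and SS(y) = Σ(y − ȳ)² are the numerators of the variances.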
Calculating the Correlation Coefficient
where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
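As a sketch of the definition above, the Python below computes r in its deviation form, SP(x, y)/√(SS(x)·SS(y)), and again as cov(x, y)/√(var(x)·var(y)) with n − 1 divisors, showing that the n − 1 factors cancel. The function names and the five-point data set are hypothetical, chosen only so the arithmetic can be checked by hand (SP = 6, SS(x) = 10, SS(y) = 6).

```python
import math

def correlation_deviation_form(x, y):
    """Sample correlation r = SP(x, y) / sqrt(SS(x) * SS(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # SP(x, y): sum of cross-products of deviations (numerator of the covariance)
    sp = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # SS(x), SS(y): sums of squared deviations (numerators of the variances)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)

def correlation_covariance_form(x, y):
    """Same r written as cov(x, y) / sqrt(var(x) * var(y)), all with n - 1 divisors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]
print(round(correlation_deviation_form(x, y), 4))   # 0.7746
print(round(correlation_covariance_form(x, y), 4))  # 0.7746
```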
Calculation Example
y (tree height)   x (trunk diameter)   xy     y²      x²
35                8                    280    1225    64
49                9                    441    2401    81
27                7                    189    729     49
33                6                    198    1089    36
60                13                   780    3600    169
21                7                    147    441     49
45                11                   495    2025    121
51                12                   612    2601    144
Totals: Σy = 321, Σx = 73, Σxy = 3142, Σy² = 14111, Σx² = 713
Calculation Example
(continued)
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886$$
[Figure: scatter plot of tree height, y (0 to 70), against trunk diameter, x (0 to 14)]
r = 0.886 → relatively strong positive linear association between x and y
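The computational formula can be checked with a short Python sketch that rebuilds the column totals from the tree data and plugs them into the shortcut formula; the variable names are illustrative only.

```python
import math

# Tree data from the calculation example: y = tree height, x = trunk diameter
y = [35, 49, 27, 33, 60, 21, 45, 51]
x = [8, 9, 7, 6, 13, 7, 11, 12]
n = len(x)

# Column totals of the worksheet
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
print(sum_y, sum_x, sum_xy, sum_y2, sum_x2)  # 321 73 3142 14111 713

# Shortcut (computational) formula for r
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))  # 0.886
```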
Excel Output
Excel Correlation Output
Tools / Data Analysis / Correlation…
Manually: =CORREL(array1, array2)
[Figure: Excel correlation output, correlation between Tree Height and Trunk Diameter]
Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc. Chap 14-13
Introduction to Regression Analysis
Regression analysis is used to:
◦ Predict the value of a dependent variable based on the
value of at least one independent variable
◦ Explain the impact of changes in an independent variable
on the dependent variable
Dependent variable: the variable we wish to explain
Independent variable: the variable used to explain
the dependent variable
Simple Linear Regression Model
$$y = \beta_0 + \beta_1 x + \varepsilon$$
[Figure: the population regression line y = β0 + β1x, with intercept β0 and slope β1; for a given xi, the random error εi is the vertical distance between the observed value of y and the predicted value of y on the line]
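To make the roles of β0, β1 and ε concrete, the sketch below generates observations from the model. The parameter values, the helper name `simulate_simple_linear_model` and the choice of normally distributed errors are assumptions for illustration only; the slide's model just says each y is the line value plus a random error.

```python
import random

def simulate_simple_linear_model(beta0, beta1, xs, sigma, seed=0):
    """Hypothetical helper: y_i = beta0 + beta1 * x_i + eps_i, eps_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    eps = [rng.gauss(0.0, sigma) for _ in xs]  # assumed normal errors with mean zero
    ys = [beta0 + beta1 * x + e for x, e in zip(xs, eps)]
    return ys, eps

# Hypothetical population parameters, chosen only for illustration
beta0, beta1, sigma = 2.0, 0.5, 1.0
xs = [1, 2, 3, 4, 5, 6]
ys, eps = simulate_simple_linear_model(beta0, beta1, xs, sigma)
for x, y, e in zip(xs, ys, eps):
    # observed y = point on the population line + random error
    print(f"x={x}  line={beta0 + beta1 * x:.2f}  error={e:+.2f}  y={y:.2f}")
```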
Simple Linear Regression Equation
(Prediction Line)
The simple linear regression equation provides an estimate of the
population regression line
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
where:
Ŷi = estimated (or predicted) Y value for observation i
β̂0 = estimate of the regression intercept
β̂1 = estimate of the regression slope
Xi = value of X for observation i
The individual random error terms ei have a mean of zero
Department of Statistics, ITS
Surabaya Slide-19
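A checkable property related to the statement about error terms: when a line with an intercept is fitted by least squares, the sample residuals e_i = y_i − ŷ_i sum to zero (up to floating-point rounding), so their mean is zero. The sketch below fits the age and % fat data from the example that follows using S_xy/S_xx; the function name `least_squares_fit` is hypothetical.

```python
# Age (x) and % fat (y) data from the example that follows (n = 18)
age = [23, 23, 27, 27, 39, 41, 45, 49, 50, 53, 53, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 27.9, 7.8, 17.8, 31.4, 25.9, 27.4, 25.2, 31.1,
       34.7, 42, 29.1, 32.5, 30.3, 33, 33.8, 41.1, 34.5]

def least_squares_fit(x, y):
    """Hypothetical helper: return (b0, b1) for the least-squares line y-hat = b0 + b1 x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares_fit(age, fat)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(age, fat)]
print(f"sum of residuals = {sum(residuals):.1e}")  # zero apart from rounding error
```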
Interpretation of the Slope and the Intercept
β̂0 is the estimated average value of y
when the value of x is zero
β̂1 is the estimated change in the average
value of y as a result of a one-unit change
in x
Example
The following data was collected in a study of
age and fatness in humans.
Age 23 23 27 27 39 41 45 49 50
% Fat 9.5 27.9 7.8 17.8 31.4 25.9 27.4 25.2 31.1
Age 53 53 54 56 57 58 58 60 61
% Fat 34.7 42 29.1 32.5 30.3 33 33.8 41.1 34.5
x (age)   y (% fat)   x²      xy
23        9.5         529     218.5
23        27.9        529     641.7
27        7.8         729     210.6
27        17.8        729     480.6
39        31.4        1521    1224.6
41        25.9        1681    1061.9
45        27.4        2025    1233
49        25.2        2401    1234.8
50        31.1        2500    1555
53        34.7        2809    1839.1
53        42          2809    2226
54        29.1        2916    1571.4
56        32.5        3136    1820
57        30.3        3249    1727.1
58        33          3364    1914
58        33.8        3364    1960.4
60        41.1        3600    2466
61        34.5        3721    2104.5
Totals: Σx = 834, Σy = 515, Σx² = 41612, Σxy = 25489.2
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Example
n = 18, Σx = 834, Σy = 515, Σx² = 41612, Σxy = 25489.2
$$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 41612 - \frac{834^2}{18} = 2970$$
$$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 25489.2 - \frac{(834)(515)}{18} = 1627.53$$
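The two sums of squares can be reproduced in Python from the raw age and % fat values. This sketch uses the same shortcut formulas as the slide; the variable names are illustrative.

```python
age = [23, 23, 27, 27, 39, 41, 45, 49, 50, 53, 53, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 27.9, 7.8, 17.8, 31.4, 25.9, 27.4, 25.2, 31.1,
       34.7, 42, 29.1, 32.5, 30.3, 33, 33.8, 41.1, 34.5]
n = len(age)

# Worksheet totals
sum_x, sum_y = sum(age), sum(fat)
sum_x2 = sum(x * x for x in age)
sum_xy = sum(x * y for x, y in zip(age, fat))
print(n, sum_x, round(sum_y, 1), sum_x2, round(sum_xy, 1))  # 18 834 515.0 41612 25489.2

# Shortcut formulas for the corrected sums
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
print(round(s_xx, 2), round(s_xy, 2))  # 2970.0 1627.53
```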
Example
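$$\hat{\beta}_1 = b = \frac{S_{xy}}{S_{xx}} = \frac{1627.53}{2970} = 0.54799$$
$$\hat{\beta}_0 = a = \bar{y} - b\bar{x} = \frac{515}{18} - 0.54799\left(\frac{834}{18}\right) = 3.2209$$
$$\hat{y} = 3.22 + 0.548x$$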
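Starting from the summary quantities alone, the slope and intercept follow in two lines. This Python sketch uses the rounded S_xy = 1627.53 from the slide, which is why it reproduces the slide's 0.54799 and 3.2209.

```python
n, sum_x, sum_y = 18, 834, 515
s_xx, s_xy = 2970, 1627.53  # values from the previous step

b = s_xy / s_xx                  # slope estimate (beta1-hat)
a = sum_y / n - b * sum_x / n    # intercept estimate (beta0-hat) = y-bar - b * x-bar
print(round(b, 5), round(a, 4))  # 0.54799 3.2209
```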
To predict the average % fat for 45-year-old humans:
$$\hat{y} = 3.22 + 0.548(45) = 27.9$$
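A small prediction helper (the name `predict_fat` is hypothetical) reproduces the 27.9 figure and also illustrates the slope interpretation: moving from age 45 to 46 changes the predicted value by exactly b1 = 0.548.

```python
def predict_fat(age, b0=3.22, b1=0.548):
    """Predicted average % fat from the fitted line y-hat = 3.22 + 0.548 x."""
    return b0 + b1 * age

print(round(predict_fat(45), 1))  # 27.9
# Slope interpretation: one more year of age raises the predicted % fat by b1
print(round(predict_fat(46) - predict_fat(45), 3))  # 0.548
```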
Simple Linear Regression Example
ANOVA
              df   SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000
[Figure: scatter plot with fitted regression line, x axis Square Feet (0 to 3000), y axis 0 to 350; intercept = 98.248, slope = 0.10977]
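The MS and F columns of the ANOVA table follow from the SS and df columns (MS = SS/df, F = MSR/MSE). The Python check below uses only the table's numbers; the Significance F (p-value) would need an F-distribution routine, which this sketch does not attempt.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_reg, df_reg = 18934.9348, 1
ss_res, df_res = 13665.5652, 8

ss_total = ss_reg + ss_res  # SST = SSR + SSE
ms_reg = ss_reg / df_reg    # mean square for regression
ms_res = ss_res / df_res    # mean square error
f_stat = ms_reg / ms_res    # F statistic
print(round(ss_total, 4), round(ms_res, 4), round(f_stat, 4))
```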
$$R^2 = \frac{SSR}{SST}, \qquad 0 \le R^2 \le 1$$
Coefficient of Determination, R²
(continued)
Coefficient of determination
$$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}}$$
$$R^2 = r^2$$
where:
R² = Coefficient of determination
r = Simple correlation coefficient
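For simple linear regression, R² = SSR/SST coincides with the square of r. The Python sketch below checks this on the tree height and trunk diameter data, fitting the line by least squares; the slides do not fit this data set, so the fitted line here is an illustration only.

```python
import math

height = [35, 49, 27, 33, 60, 21, 45, 51]  # y
diameter = [8, 9, 7, 6, 13, 7, 11, 12]     # x
n = len(height)
x_bar, y_bar = sum(diameter) / n, sum(height) / n

s_xx = sum((x - x_bar) ** 2 for x in diameter)
s_yy = sum((y - y_bar) ** 2 for y in height)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(diameter, height))

# Least-squares line and its fitted values
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in diameter]

sst = s_yy                                   # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)  # sum of squares explained by regression
r_squared = ssr / sst
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r_squared, 4), round(r ** 2, 4))  # both about 0.7854
```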
Examples of Approximate R² Values
[Figure: scatter plots of y against x with R² = 1]
Examples of Approximate R² Values
(continued)
[Figure: scatter plots of y against x with 0 < R² < 1]
Examples of Approximate R² Values
(continued)
[Figure: scatter plot of y against x with R² = 0]
No linear relationship
between x and y: