Chapter-9-Simple Linear Regression & Correlation
CHAPTER 9
9. SIMPLE LINEAR REGRESSION AND CORRELATION
Linear regression and correlation analysis studies and measures the linear relationship
among two or more variables. When only two variables are involved, the analysis is
referred to as simple correlation and simple linear regression analysis; when there are
more than two variables, the terms multiple regression and partial correlation are used.
Correlation Analysis: deals with measuring the closeness of the relationship
described by the regression equation.
We say there is correlation if two series of items vary together, either directly or inversely.
The presence of correlation between two variables may be due to three reasons:
1. One variable being the cause of the other. The cause is called the “subject” or
“independent” variable, while the effect is called the “dependent” variable.
2. Both variables being the result of a common cause. That is, the correlation
that exists between the two variables is due to both of them being related to some third
force.
Lecture notes on Introduction to Statistics IX: simple linear regression and correlation
Example:
Let X1 = ESLCE result,
Y1 = rate of surviving in the University,
Y2 = rate of getting a scholarship.
Both X1 & Y1 and X1 & Y2 have a high positive correlation. Likewise, Y1 and Y2 are
positively correlated; they are not directly related, but they are related to each
other via X1.
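3. Pure chance. Two variables may show correlation even though they are not related
at all; such correlation is called spurious (nonsense) correlation.
Examples:
Price of teff in Addis Ababa and grades of students in the USA.
Weight of individuals in Ethiopia and income of individuals in Kenya.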
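The simple correlation coefficient between X and Y is
$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}} $$
and the short-cut formula is
$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$
or, equivalently,
$$ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}} $$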
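As a cross-check that the definitional and short-cut formulas give the same value, here is a minimal Python sketch. The function names `pearson_r` and `pearson_r_shortcut` are illustrative and do not come from the notes. The sketch assumes two equal-length lists of numbers and reuses the Accounting/Statistics scores from Example 1 later in this chapter.

```python
import math

def pearson_r(x, y):
    """Simple correlation coefficient via the definitional formula (deviations from the means)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_shortcut(x, y):
    """Same coefficient via the short-cut formula, using raw sums only."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Accounting (X) and Statistics (Y) scores of the 12 students in Example 1 of this chapter
x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]
print(round(pearson_r(x, y), 4), round(pearson_r_shortcut(x, y), 4))  # 0.9194 0.9194
```

The short-cut form needs only the five column totals, which is why the worked examples tabulate X², Y² and XY.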
Remark: r always lies between -1 and 1 inclusive, and it is symmetric: the correlation between X and Y equals the correlation between Y and X.
Interpretation of r
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)
Examples:
1. Calculate the simple correlation coefficient between the mid-semester and final exam scores of 10
students (both out of 50).
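$$ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}} = \frac{10331 - 10(31.2)(32.9)}{\sqrt{\left(9920 - 10(973.4)\right)\left(11003 - 10(1082.4)\right)}} = \frac{66.2}{182.5} \approx 0.363 $$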
This means that mid-semester and final exam scores have a weak positive correlation.
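The arithmetic of this example can be reproduced from the same summary figures. The helper name `r_from_summaries` in the sketch below is hypothetical. The sketch uses the exact squares 31.2² = 973.44 and 32.9² = 1082.41 instead of the rounded 973.4 and 1082.4. As a result, the denominator comes out near 182.2 rather than 182.5, but r still rounds to 0.363.

```python
import math

def r_from_summaries(n, sum_xy, x_bar, y_bar, sum_x2, sum_y2):
    """r = (ΣXY - n·x̄·ȳ) / sqrt[(ΣX² - n·x̄²)(ΣY² - n·ȳ²)]"""
    numerator = sum_xy - n * x_bar * y_bar
    denominator = math.sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))
    return numerator / denominator

# Summary figures for the 10 students in the worked example above
r = r_from_summaries(n=10, sum_xy=10331, x_bar=31.2, y_bar=32.9, sum_x2=9920, sum_y2=11003)
print(round(r, 3))  # 0.363
```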
Exercise: The following data were collected from a certain household on monthly
income (X) and consumption (Y) over the past 10 months. Compute the simple correlation
coefficient.
X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360
The above formula and procedure apply only to quantitative data. When we have
qualitative data such as efficiency, honesty or intelligence, which can be ranked but not
measured, we calculate Spearman’s rank correlation coefficient as follows:
Steps
i. Rank the different items in X and Y.
ii. Find the difference between the ranks in each pair, and denote it by Di.
iii. Use the following formula:
$$ r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} $$
where r_s = coefficient of rank correlation,
D = the difference between paired ranks,
n = the number of pairs.
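Steps i to iii can be sketched in Python as shown here. The names `ranks` and `spearman_rs` are illustrative. This minimal sketch assumes there are no tied values; ties would normally receive average ranks, and this sketch does not handle them. The direction of ranking is arbitrary. Ranking from largest to smallest gives the same r_s, provided both variables are ranked the same way, because every D only changes sign.

```python
def ranks(values):
    """Step i: rank the items 1..n from smallest to largest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rs(x, y):
    """Spearman's rank correlation r_s = 1 - 6ΣD²/(n(n² - 1)), for untied data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))  # step ii: squared rank differences
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))         # step iii

# Hypothetical scores, ranked internally before applying the formula
print(spearman_rs([10, 20, 30, 40], [1, 3, 2, 4]))
```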
Example:
Aster and Almaz were asked to rank 7 different types of lipstick. Determine whether there is
a correlation between the tastes of the two ladies.
Lipstick types A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7
Solution:
X (R1)   Y (R2)   D = R1 - R2   D²
2        1         1            1
1        3        -2            4
4        2         2            4
3        4        -1            1
5        5         0            0
7        6         1            1
6        7        -1            1
Total                          12

$$ r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} \approx 0.786 $$
Yes, there is a positive correlation between the two rankings.
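This calculation can be checked directly from the ranks in the table. Because the ladies already supplied ranks, no ranking step is needed. The following is a minimal sketch:

```python
aster = [2, 1, 4, 3, 5, 7, 6]  # Aster's ranks for lipsticks A to G
almaz = [1, 3, 2, 4, 5, 6, 7]  # Almaz's ranks for lipsticks A to G
n = len(aster)
d_squared = sum((r1 - r2) ** 2 for r1, r2 in zip(aster, almaz))  # ΣD²
rs = 1 - 6 * d_squared / (n * (n ** 2 - 1))
print(d_squared, round(rs, 3))  # 12 0.786
```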
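For the regression line of Y on X, $\hat{Y} = a + bX$, the least-squares estimates of the slope and intercept are
$$ b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X} $$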
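Here is a minimal Python sketch of these two formulas. The name `fit_y_on_x` is illustrative. The sketch uses the short-cut form for b. It is checked on made-up, exactly linear data Y = 3 + 2X, for which the fitted line should recover a = 3 and b = 2.

```python
def fit_y_on_x(x, y):
    """Least-squares line Ŷ = a + bX using the short-cut formula for b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar  # ΣXY - n·x̄·ȳ
    sxx = sum(xi * xi for xi in x) - n * x_bar ** 2                  # ΣX² - n·x̄²
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical, exactly linear data: Y = 3 + 2X
a, b = fit_y_on_x([1, 2, 3, 4], [5, 7, 9, 11])
print(a, b)  # 3.0 2.0
```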
Example 1: The following data show the scores of 12 students in Accounting and Statistics
examinations.
Student   Accounting (X)   Statistics (Y)      X²       Y²       XY
1               74               81           5476     6561     5994
2               93               86           8649     7396     7998
3               55               67           3025     4489     3685
4               41               35           1681     1225     1435
5               23               30            529      900      690
6               92              100           8464    10000     9200
7               64               55           4096     3025     3520
8               40               52           1600     2704     2080
9               71               76           5041     5776     5396
10              33               24           1089      576      792
11              30               48            900     2304     1440
12              71               87           5041     7569     6177
Total          687              741          45591    52525    48407
Mean         57.25            61.75
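a) The coefficient of correlation (r) is 0.92. This indicates that the two
variables are strongly positively correlated (Y tends to increase as X increases).
b) The estimated regression line of Y on X is $\hat{Y} = a + bX$, where
$$ b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{48407 - 12(57.25)(61.75)}{45591 - 12(57.25)^2} = \frac{5984.75}{6260.25} \approx 0.9560, \qquad a = \bar{Y} - b\bar{X} = 61.75 - b(57.25) \approx 7.0194 $$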
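$$ \hat{Y} = 7.0194 + 0.9560X $$
For a student who scored 85 in Accounting, the estimated Statistics score is $\hat{Y} = 7.0194 + 0.9560(85) \approx 88.28$.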
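The figures in parts (a) and (b) can be reproduced from the column totals of the table alone. The following short sketch uses only the totals and means shown above:

```python
import math

# Column totals from the Example 1 table (n = 12 students)
n = 12
sum_x, sum_y = 687, 741
sum_x2, sum_y2, sum_xy = 45591, 52525, 48407

x_bar, y_bar = sum_x / n, sum_y / n          # 57.25 and 61.75
sxy = sum_xy - n * x_bar * y_bar             # ΣXY - n·x̄·ȳ
sxx = sum_x2 - n * x_bar ** 2                # ΣX² - n·x̄²
syy = sum_y2 - n * y_bar ** 2                # ΣY² - n·ȳ²

r = sxy / math.sqrt(sxx * syy)               # part (a): correlation coefficient
b = sxy / sxx                                # part (b): slope
a = y_bar - b * x_bar                        # part (b): intercept
y_hat_85 = a + b * 85                        # predicted Statistics score for X = 85
print(round(r, 2), round(b, 4), round(a, 4), round(y_hat_85, 2))  # 0.92 0.956 7.0194 88.28
```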
Exercise: A car rental agency is interested in studying the relationship between the
distance driven in kilometers (Y) and the maintenance cost of its cars (X, in birr). The
following summarized information is given, based on a sample of size 5:
$$ \sum_{i=1}^{5} X_i^2 = 147{,}000{,}000, \qquad \sum_{i=1}^{5} Y_i^2 = 314 $$
- To measure how much of the variation in Y the regression equation explains, we
use the coefficient of determination ($r^2$):
$$ r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} $$
where r is the simple correlation coefficient.
- $r^2$ gives the proportion of the variation in Y explained by the regression of Y on X.
- $1 - r^2$ gives the unexplained proportion and is called the coefficient of indetermination.
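For a least-squares line with an intercept, the two expressions for r² agree: explained variation over total variation equals the square of r. The sketch below illustrates this. The function name `coefficient_of_determination` is illustrative. The function computes r² from the fitted values on the Example 1 scores, and the usage line prints both r² and 1 - r².

```python
def coefficient_of_determination(x, y):
    """r² = Σ(Ŷ - ȳ)² / Σ(Y - ȳ)² for the least-squares line of Y on X."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]                       # fitted values
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)     # variation explained by the regression
    total = sum((yi - y_bar) ** 2 for yi in y)             # total variation in Y
    return explained / total

x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # Accounting (Example 1)
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Statistics (Example 1)
r2 = coefficient_of_determination(x, y)
print(round(r2, 4), round(1 - r2, 4))  # 0.8453 0.1547
```

About 84.5% of the variation in Statistics scores is explained by the regression on Accounting scores. The remaining 15.5% is the coefficient of indetermination.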
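The sample covariance of X and Y is
$$ S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n - 1} $$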
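Both covariance forms can be compared in a short sketch. The function names are illustrative. The sketch also checks the standard identity r = S_XY / (S_X S_Y), where S_X and S_Y are sample standard deviations. These notes do not state that identity explicitly, but it follows directly from the definitions.

```python
import math

def sample_covariance(x, y):
    """S_XY = Σ(Xi - x̄)(Yi - ȳ) / (n - 1)"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_covariance_shortcut(x, y):
    """S_XY = (ΣXY - n·x̄·ȳ) / (n - 1)"""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (n - 1)

x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # Accounting (Example 1)
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Statistics (Example 1)
s_xy = sample_covariance(x, y)
s_x = math.sqrt(sample_covariance(x, x))   # sample standard deviation of X
s_y = math.sqrt(sample_covariance(y, y))   # sample standard deviation of Y
print(round(s_xy, 2), round(s_xy / (s_x * s_y), 4))  # 544.07 0.9194
```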
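The regression line of X on Y is
$$ \hat{X} = a_1 + b_1 Y $$
$$ b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}, \qquad a_1 = \bar{X} - b_1\bar{Y}, \qquad r = b_1 \frac{S_Y}{S_X} $$
where S_X and S_Y are the standard deviations of X and Y.
Here X is the dependent variable and Y is the independent variable.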
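To see the regression of X on Y and the relation r = b1·S_Y/S_X in action, here is a minimal sketch on the Example 1 scores. The names `fit_x_on_y` and `sample_sd` are illustrative, and the standard deviations use the n - 1 divisor. The value of r recovered from b1 should match the correlation coefficient obtained earlier from the direct formula.

```python
import math

def fit_x_on_y(x, y):
    """Least-squares line X̂ = a1 + b1·Y (X dependent, Y independent)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = ((sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar)
          / (sum(yi * yi for yi in y) - n * y_bar ** 2))
    a1 = x_bar - b1 * y_bar
    return a1, b1

def sample_sd(v):
    """Sample standard deviation with divisor n - 1."""
    n = len(v)
    m = sum(v) / n
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (n - 1))

x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # Accounting (Example 1)
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Statistics (Example 1)
a1, b1 = fit_x_on_y(x, y)
r = b1 * sample_sd(y) / sample_sd(x)   # r = b1·S_Y / S_X
print(round(a1, 4), round(b1, 4), round(r, 4))
```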
Solution
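We take one of the given equations as the regression of X on Y and the other as the regression of Y on X,
and calculate r from
$$ r^2 = b_{YX}\, b_{XY} = \frac{15}{4} \times \frac{3}{20} = \frac{9}{16} \in [0, 1] $$
Since r² must lie between 0 and 1, this assignment of the equations is valid. Hence $|r| = \sqrt{9/16} = 3/4$, and r takes the common sign of the two regression coefficients.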
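The check in this solution can be mirrored with exact fractions. The sketch assumes, as the garbled original suggests, that the two regression coefficients are 15/4 (Y on X) and 3/20 (X on Y). The problem statement giving the two equations is not reproduced in these notes.

```python
from fractions import Fraction
import math

# Assumed regression coefficients, read from the solution above
b_yx = Fraction(15, 4)   # slope of the regression of Y on X
b_xy = Fraction(3, 20)   # slope of the regression of X on Y

r_squared = b_yx * b_xy
# r² must lie in [0, 1]; a product above 1 would mean the equations were assigned the wrong way round
assert 0 <= r_squared <= 1
abs_r = math.sqrt(float(r_squared))   # |r|; r takes the common sign of b_yx and b_xy
print(r_squared, abs_r)  # 9/16 0.75
```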