
Lecture notes on Introduction to Statistics IX: simple linear regression and correlation

CHAPTER 9
9. SIMPLE LINEAR REGRESSION AND CORRELATION
Linear regression and correlation study and measure the linear relationship among two or more variables. When only two variables are involved, the analysis is referred to as simple correlation and simple linear regression analysis; when more than two variables are involved, the terms multiple regression and partial correlation are used.

Regression Analysis: is a statistical technique that can be used to develop a mathematical equation showing how variables are related.

Correlation Analysis: deals with measuring the closeness of the relationship described by the regression equation.
We say there is correlation if the two series of items vary together, directly or inversely.

Simple Correlation: Suppose we have two variables $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$.
- When higher values of X are associated with higher values of Y and lower values of X are associated with lower values of Y, then the correlation is said to be positive or direct.
Examples:
- Income and expenditure
- Number of hours spent in studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by a car.
- When higher values of X are associated with lower values of Y and lower values of X are associated with higher values of Y, then the correlation is said to be negative or inverse.
Examples:
- Demand and supply
- Income and the proportion of income spent on food.
The correlation between X and Y may be one of the following:
1. Perfect positive (r = 1)
2. Positive (r between 0 and 1)
3. No correlation (r = 0)
4. Negative (r between -1 and 0)
5. Perfect negative (r = -1)

The presence of correlation between two variables may be due to three reasons:
1. One variable being the cause of the other. The cause is called the "subject" or "independent" variable, while the effect is called the "dependent" variable.
2. Both variables being the result of a common cause. That is, the correlation between the two variables is due to their both being related to some third force.


Example:
Let X1 = ESLCE result
    Y1 = rate of surviving in the University
    Y2 = rate of getting a scholarship.

Both X1 & Y1 and X1 & Y2 have high positive correlation; likewise, Y1 & Y2 have positive correlation, but Y1 and Y2 are not directly related: they are related to each other via X1.

3. Chance: The correlation that arises by chance is called spurious correlation.

Examples:
- Price of teff in Addis Ababa and grades of students in the USA.
- Weight of individuals in Ethiopia and income of individuals in Kenya.

Therefore, while interpreting a correlation coefficient, it is necessary to see if there is any likelihood of a relationship existing between the variables under study.
The correlation coefficient between X and Y, denoted by r, is given by

$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} $$

and the shortcut formulas are

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} $$

$$ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}} $$
Remark: r always lies between -1 and 1 inclusive, and it is symmetric in X and Y.
Interpretation of r:
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)

Examples:

1. Calculate the simple correlation coefficient between the mid-semester and final exam scores of 10 students (both out of 50).


Student   Mid Sem. Exam (X)   Final Exam (Y)
1         31                  31
2         23                  29
3         41                  34
4         32                  35
5         29                  25
6         33                  35
7         28                  33
8         31                  42
9         31                  31
10        33                  34
Solution:

$$ n = 10, \quad \bar{X} = 31.2, \quad \bar{Y} = 32.9, \quad \bar{X}^2 = 973.44, \quad \bar{Y}^2 = 1082.41 $$
$$ \sum XY = 10331, \quad \sum X^2 = 9920, \quad \sum Y^2 = 11003 $$

$$ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}} = \frac{10331 - 10(31.2)(32.9)}{\sqrt{(9920 - 10(973.44))(11003 - 10(1082.41))}} = \frac{66.2}{182.2} = 0.363 $$
This means the mid-semester and final exam scores have a weak positive correlation.
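As a quick check, the shortcut formula can be coded in a few lines of Python (the `pearson_r` helper is a sketch of my own, not part of the notes):

```python
from math import sqrt

def pearson_r(x, y):
    """Shortcut formula: r = [nSumXY - SumX*SumY] / sqrt([nSumX^2 - (SumX)^2][nSumY^2 - (SumY)^2])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Mid-semester (X) and final (Y) exam scores of the 10 students above
mid = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]
final = [31, 29, 34, 35, 25, 35, 33, 42, 31, 34]
print(round(pearson_r(mid, final), 3))  # 0.363
```

The result matches the worked value of r above.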

Exercise The following data were collected from a certain household on the monthly
income (X) and consumption (Y) for the past 10 months. Compute the simple correlation
coefficient.

X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360

- The above formula and procedure are only applicable to quantitative data; when we have qualitative data like efficiency, honesty, intelligence, etc., we calculate what is called Spearman's rank correlation coefficient, as follows:

Steps:
i. Rank the items in X and Y.
ii. Find the difference of the ranks in each pair; denote them by Di.
iii. Use the following formula:


$$ r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} $$

Where:
r_s = coefficient of rank correlation
D = the difference between paired ranks
n = the number of pairs
Example:
Aster and Almaz were asked to rank 7 different types of lipsticks. See if there is correlation between the tastes of the two ladies.

Lipstick types A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7

Solution:

Aster (R1)   Almaz (R2)   D = R1 - R2   D²
2            1            1             1
1            3            -2            4
4            2            2             4
3            4            -1            1
5            5            0             0
7            6            1             1
6            7            -1            1
Total                                   12

$$ r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} = 0.786 $$

Yes, there is a positive correlation between the two rankings.
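The three steps above can be sketched in Python (the `spearman_rs` helper name is illustrative):

```python
def spearman_rs(rank_x, rank_y):
    """r_s = 1 - 6*SumD^2 / (n(n^2 - 1)), where D is the difference of paired ranks."""
    n = len(rank_x)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Aster's and Almaz's ranks for the 7 lipstick types A..G
aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]
print(round(spearman_rs(aster, almaz), 3))  # 0.786
```

Identical rankings give r_s = 1, and fully reversed rankings give r_s = -1.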

Simple Linear Regression


- Simple linear regression refers to the linear relationship between two variables.
- We usually denote the dependent variable by Y and the independent variable by X.
- A simple regression line is the line fitted to the points plotted in the scatter diagram, which describes the average relationship between the two variables. Therefore, to see the type of relationship, it is advisable to prepare a scatter plot before fitting the model.
- The linear model is:

$$ Y = \alpha + \beta X + \varepsilon $$

Where:
Y = dependent variable
X = independent variable
α = regression constant (intercept)
β = regression slope
ε = random disturbance term

$$ Y \sim N(\alpha + \beta X, \sigma^2), \qquad \varepsilon \sim N(0, \sigma^2) $$


- To estimate the parameters (α and β) we have several methods:
  - The free-hand method
  - The semi-average method
  - The least squares method
  - The maximum likelihood method
  - The method of moments
  - Bayesian estimation techniques

- The above model is estimated by: $ \hat{Y} = a + bX $

Where a is a constant giving the value of Y when X = 0 (the Y-intercept), and b is a constant indicating the slope of the regression line; it gives a measure of the change in Y for a unit change in X. b is also called the regression coefficient of Y on X.

- a and b are found by minimizing $ SSE = \sum \varepsilon^2 = \sum (Y_i - \hat{Y}_i)^2 $

Where:
$ Y_i $ = observed value
$ \hat{Y}_i $ = estimated value = $ a + bX_i $

This method is known as OLS (ordinary least squares).
- Minimizing $ SSE = \sum \varepsilon^2 $ gives

$$ b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} $$

$$ a = \bar{Y} - b\bar{X} $$
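The least squares estimates translate directly into code; a minimal sketch of the formulas above (the `least_squares` name is my own):

```python
def least_squares(x, y):
    """b = Sum(Xi - Xbar)(Yi - Ybar) / Sum(Xi - Xbar)^2,  a = Ybar - b*Xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return a, b

# Sanity check on points lying exactly on the line Y = 1 + 2X
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

When the points lie exactly on a line, the fitted intercept and slope recover that line.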

Example 1: The following data show the scores of 12 students in Accounting and Statistics examinations.

a) Calculate the simple correlation coefficient.
b) Fit the regression equation of Statistics on Accounting using least squares estimates.
c) Predict the Statistics score if the Accounting score is 85.


      Accounting (X)   Statistics (Y)   X²         Y²         XY
1     74.00            81.00            5476.00    6561.00    5994.00
2     93.00            86.00            8649.00    7396.00    7998.00
3     55.00            67.00            3025.00    4489.00    3685.00
4     41.00            35.00            1681.00    1225.00    1435.00
5     23.00            30.00            529.00     900.00     690.00
6     92.00            100.00           8464.00    10000.00   9200.00
7     64.00            55.00            4096.00    3025.00    3520.00
8     40.00            52.00            1600.00    2704.00    2080.00
9     71.00            76.00            5041.00    5776.00    5396.00
10    33.00            24.00            1089.00    576.00     792.00
11    30.00            48.00            900.00     2304.00    1440.00
12    71.00            87.00            5041.00    7569.00    6177.00
Total 687.00           741.00           45591.00   52525.00   48407.00
Mean  57.25            61.75
a)

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} = \frac{12(48407) - (687)(741)}{\sqrt{[12(45591) - 687^2][12(52525) - 741^2]}} = \frac{71817}{78111.6} = 0.9194 $$

The coefficient of correlation r has a value of 0.92. This indicates that the two variables are positively correlated (Y increases as X increases).

b)

$$ b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{48407 - 12(57.25)(61.75)}{45591 - 12(57.25)^2} = \frac{5984.75}{6260.25} = 0.9560 $$

$$ a = \bar{Y} - b\bar{X} = 61.75 - 0.9560(57.25) = 7.0194 $$

$ \hat{Y} = 7.0194 + 0.9560X $ is the estimated regression line.


c) Insert X = 85 into the estimated regression line:

$$ \hat{Y} = 7.0194 + 0.9560(85) = 88.28 $$
Exercise: A car rental agency is interested in studying the relationship between the distance driven in kilometers (Y) and the maintenance cost for their cars (X in birr). The following summarized information is given based on samples of size 5.

$$ \sum_{i=1}^{5} X_i^2 = 147{,}000{,}000, \qquad \sum_{i=1}^{5} Y_i^2 = 314 $$
$$ \sum_{i=1}^{5} X_i = 23{,}000, \qquad \sum_{i=1}^{5} Y_i = 36, \qquad \sum_{i=1}^{5} X_i Y_i = 212{,}000 $$

a) Find the least squares regression equation of Y on X.
b) Compute the correlation coefficient and interpret it.
c) Estimate the maintenance cost of a car which has been driven for 6 km.

- To know how far the regression equation has been able to explain the variation in Y, we use a measure called the coefficient of determination (r²):

$$ r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} $$

Where r = the simple correlation coefficient.
- r² gives the proportion of the variation in Y explained by the regression of Y on X.
- 1 - r² gives the unexplained proportion and is called the coefficient of indetermination.

Example: For the above problem (Example 1): r = 0.9194, so r² = 0.8453. That is, 84.53% of the variation in Y is explained by the regression, and only 15.47% remains unexplained, accounted for by the random term.
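As a sketch, this r² figure can be verified on the Example 1 data by computing the explained-to-total variation ratio directly (variable names are my own):

```python
x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # Accounting scores
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Statistics scores

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = (sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar) \
    / (sum(xi * xi for xi in x) - n * xbar ** 2)
a = ybar - b * xbar
y_hat = [a + b * xi for xi in x]  # fitted values on the regression line

# r^2 = Sum(Yhat - Ybar)^2 / Sum(Y - Ybar)^2  (explained over total variation)
r2 = sum((yh - ybar) ** 2 for yh in y_hat) / sum((yi - ybar) ** 2 for yi in y)
print(round(r2, 4))  # 0.8453
```

For an OLS fit, this ratio equals the square of the simple correlation coefficient.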

- The covariance of X and Y measures the co-variability of X and Y together. It is denoted by $ S_{XY} $ and given by

$$ S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n - 1} $$


- Next we will see the relationship between the coefficients.

i. $ r = \dfrac{S_{XY}}{S_X S_Y} \;\Rightarrow\; r^2 = \dfrac{S_{XY}^2}{S_X^2 S_Y^2} $

ii. $ r = \dfrac{b S_X}{S_Y} \;\Leftrightarrow\; b = \dfrac{r S_Y}{S_X} $

- When we fit the regression of X on Y, we interchange X and Y in all formulas, i.e. we fit

$$ \hat{X} = a_1 + b_1 Y, \qquad b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}, \qquad a_1 = \bar{X} - b_1\bar{Y}, \qquad r = \frac{b_1 S_Y}{S_X} $$
Here X is dependent and Y is independent.
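These identities between r, the covariance, and the standard deviations are easy to verify numerically; a sketch using the exam-score data from the earlier correlation example (helper names are mine):

```python
from math import sqrt

x = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]  # mid-semester scores
y = [31, 29, 34, 35, 25, 35, 33, 42, 31, 34]  # final scores

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)  # covariance S_XY
s_x = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))                  # std. deviation S_X
s_y = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))                  # std. deviation S_Y

r = s_xy / (s_x * s_y)   # identity i:  r = S_XY / (S_X * S_Y)
b_yx = r * s_y / s_x     # identity ii: slope of Y on X, b = r * S_Y / S_X
print(round(r, 3))                          # 0.363, matching the direct formula
print(abs(b_yx - s_xy / s_x ** 2) < 1e-9)   # True: b also equals S_XY / S_X^2
```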

Choice of Dependent and Independent variable

- In correlation analysis there is no need to identify the dependent and independent variable, because r is symmetric. But in regression analysis, if $ b_{YX} $ is the regression coefficient of Y on X and $ b_{XY} $ is the regression coefficient of X on Y, then

$$ r = \frac{b_{YX} S_X}{S_Y} = \frac{b_{XY} S_Y}{S_X} \;\Rightarrow\; r^2 = b_{YX} \cdot b_{XY} $$

- Moreover, $ b_{YX} $ and $ b_{XY} $ are completely different, numerically as well as conceptually.

- Let us consider three cases concerning these coefficients.

1. If the correlation is perfect, i.e. r = ±1, then the b values are reciprocals of each other.
2. If $ S_X = S_Y $, then irrespective of the value of r the b values are equal, i.e. $ r = b_{YX} = b_{XY} $ (but this is an unlikely case).
3. The most important case is when $ S_X \neq S_Y $ and $ r \neq \pm 1 $; here the b values are neither equal nor reciprocals of each other, and the two regression lines differ, intersecting at the common point $ (\bar{X}, \bar{Y}) $.


- Thus, to determine whether a regression equation is X on Y or Y on X, we use the formula $ r^2 = b_{YX} \cdot b_{XY} $:
  - If $ r \in [-1, 1] $ (equivalently $ r^2 \in [0, 1] $), then our assumption is correct.
  - If $ r \notin [-1, 1] $ (equivalently $ r^2 > 1 $), then our assumption is wrong.
Example: The regression lines between height (X) in inches and weight (Y) in lbs of male students are:

$$ 4Y - 15X + 530 = 0 \qquad\text{and}\qquad 20X - 3Y - 975 = 0 $$

Determine which is the regression of Y on X and which is the regression of X on Y.

Solution:
We will assume one of the equations is the regression of X on Y and the other is the regression of Y on X, and calculate r².

Assume 4Y - 15X + 530 = 0 is the regression of X on Y and
       20X - 3Y - 975 = 0 is the regression of Y on X.
Then write these in standard form:

$$ 4Y - 15X + 530 = 0 \;\Rightarrow\; X = \frac{530}{15} + \frac{4}{15}Y \;\Rightarrow\; b_{XY} = \frac{4}{15} $$
$$ 20X - 3Y - 975 = 0 \;\Rightarrow\; Y = \frac{-975}{3} + \frac{20}{3}X \;\Rightarrow\; b_{YX} = \frac{20}{3} $$
$$ \Rightarrow\; r^2 = b_{XY} \cdot b_{YX} = \left(\frac{4}{15}\right)\left(\frac{20}{3}\right) = 1.78 > 1 $$

This is impossible (a contradiction), hence our assumption is not correct. Thus
4Y - 15X + 530 = 0 is the regression of Y on X and
20X - 3Y - 975 = 0 is the regression of X on Y.
To verify:

$$ 4Y - 15X + 530 = 0 \;\Rightarrow\; Y = \frac{-530}{4} + \frac{15}{4}X \;\Rightarrow\; b_{YX} = \frac{15}{4} $$
$$ 20X - 3Y - 975 = 0 \;\Rightarrow\; X = \frac{975}{20} + \frac{3}{20}Y \;\Rightarrow\; b_{XY} = \frac{3}{20} $$
$$ \Rightarrow\; r^2 = b_{YX} \cdot b_{XY} = \left(\frac{15}{4}\right)\left(\frac{3}{20}\right) = \frac{9}{16} \in [0, 1] $$
