Lecture 3-Simple linear regression and Correlation-MTH-106- UPDATED draft
INTRODUCTORY STATISTICS
05/06/2025 1
Correlation and Simple Linear Regression
Topic-3
Correlation
Introduction
Researchers are often interested in knowing what relationship, if any, exists between two or more variables.
Correlation is a measure of the linear relationship between variables.
Correlation measures the strength (degree) and the direction of the linear relationship between two variables.
Note: correlation does not imply a causal relationship.
Correlation
Introduction (Continued)
Examples of Correlation Cases
Quantity of food consumed vs. weight gained by an animal.
Correlation
Positive or direct relationship
Variables X and Y are said to have a positive (direct) linear relationship when Y increases as X increases, or Y decreases as X decreases.
Correlation
Negative or Indirect or Inverse relationship
Variables X and Y are said to have a negative (inverse) linear relationship when Y decreases as X increases, or Y increases as X decreases.
Correlation
Zero or no relationship
Variables X and Y are said to have zero (no) relationship when changes in X (either increases or decreases) do not correspond to systematic changes in Y.
Correlation
Non-linear relationship
Variables X and Y are said to have a non-linear relationship when a change in X (either an increase or a decrease) does not correspond to a constant change in Y, so the points follow a curve rather than a straight line.
Measurement of correlation
Assumptions of Karl Pearson’s coefficient of correlation (r)
The two variables X and Y should be measured on a continuous scale.
Each variable, X and Y, should be normally distributed.
There should be no outliers in either of the variables.
Each observation (data point collected) should be independent of the other observations.
Measurement of correlation
Alternatively, the correlation coefficient is given by:
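For reference, these are the usual definitional and computational forms of Pearson's r for n paired observations (X_i, Y_i), written in standard notation. The computational form is the one that works directly with the column totals ΣX, ΣY, ΣX², ΣY² and ΣXY tabulated in the example that follows.

$$r=\frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

$$r=\frac{n\sum X_iY_i-\sum X_i\sum Y_i}{\sqrt{\left[n\sum X_i^2-\left(\sum X_i\right)^2\right]\left[n\sum Y_i^2-\left(\sum Y_i\right)^2\right]}}$$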
Measurement of correlation
Properties of r
r is unitless.
r > 0: positive linear relationship (correlation)
r < 0: negative linear relationship (correlation)
r = 0: no linear relationship (correlation)
Measurement of correlation
Properties of r (Continued)
[Figure: scatter plots illustrating r > 0, r < 0 and r = 0]
Measurement of correlation
Properties of r (Continued)
The magnitude of r, i.e. |r|, expresses the strength (degree) of the linear association.
No. Range of |r| Interpretation
1 No linear relationship
2 Very weak linear relationship
3 Weak linear relationship
4 Average linear relationship
5 Strong linear relationship
6 .0 Very strong linear relationship
7 .0 Perfect linear relationship
The sign (+/-) of r expresses the direction of the association.
Measurement of correlation
Properties of r (Continued)
The magnitude of r expresses the strength (degree) of the linear association.
Measurement of correlation
Example
No Wing length, X (cm) Tail length, Y (cm)
1 10.4 7.4
2 10.8 7.6
3 11.1 7.9
4 10.2 7.2
5 10.3 7.4
6 10.2 7.1
7 10.7 7.4
8 10.5 7.2
9 10.8 7.8
10 11.2 7.7
11 10.6 7.8
12 11.4 8.3
Measurement of correlation
Solution
No X Y X² Y² XY
1 10.4 7.4 108.16 54.76 76.96
2 10.8 7.6 116.64 57.76 82.08
3 11.1 7.9 123.21 62.41 87.69
4 10.2 7.2 104.04 51.84 73.44
5 10.3 7.4 106.09 54.76 76.22
6 10.2 7.1 104.04 50.41 72.42
7 10.7 7.4 114.49 54.76 79.18
8 10.5 7.2 110.25 51.84 75.6
9 10.8 7.8 116.64 60.84 84.24
10 11.2 7.7 125.44 59.29 86.24
11 10.6 7.8 112.36 60.84 82.68
12 11.4 8.3 129.96 68.89 94.62
Totals (n = 12): ΣX = 128.2, ΣY = 90.8, ΣX² = 1371.32, ΣY² = 688.4, ΣXY = 971.37
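The column totals in this table can be reproduced, and r computed from them with the computational formula, using a short Python sketch. It uses only the standard library; the variable names are illustrative and not part of the lecture.

```python
import math

# Data from the example table: wing length X (cm) and tail length Y (cm) of 12 birds.
wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]

n = len(wing)
sum_x = sum(wing)
sum_y = sum(tail)
sum_x2 = sum(x ** 2 for x in wing)
sum_y2 = sum(y ** 2 for y in tail)
sum_xy = sum(x * y for x, y in zip(wing, tail))

print(f"n = {n}")
print(f"Sum(X) = {sum_x:.2f}, Sum(Y) = {sum_y:.2f}")
print(f"Sum(X^2) = {sum_x2:.2f}, Sum(Y^2) = {sum_y2:.2f}, Sum(XY) = {sum_xy:.2f}")

# Computational form of Pearson's r, built only from the column totals.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(f"r = {r:.2f}")
```

The printed totals should match the final row of the table above.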
Measurement of correlation
Solution
Therefore, the coefficient of correlation is r ≈ 0.87, indicating a positive linear relationship between wing length and tail length.
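Substituting the totals from the table (n = 12) into the computational formula for r gives the following working, rounded only at the last steps:

$$r=\frac{12(971.37)-(128.2)(90.8)}{\sqrt{\left[12(1371.32)-(128.2)^2\right]\left[12(688.4)-(90.8)^2\right]}}=\frac{11656.44-11640.56}{\sqrt{(16455.84-16435.24)(8260.8-8244.64)}}$$

$$r=\frac{15.88}{\sqrt{(20.60)(16.16)}}=\frac{15.88}{\sqrt{332.896}}\approx\frac{15.88}{18.2454}\approx0.87$$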
Measurement of correlation
Solution
EXCEL DEMONSTRATION
Simple Linear Regression
Introduction
Regression analysis is a statistical technique for modeling and investigating the relationship between two or more variables.
Simple Linear Regression
Introduction
The dependent variable is the one that is determined by the other variable.
It is also known as the:
• Response variable *
• Endogenous variable
• Criterion variable *
• Regressand variable
• Outcome variable *
(* most common terms)
Simple Linear Regression
Introduction (Continued)
The independent variable is the one that determines the other variable.
Simple regression: the model contains only one independent variable.
Linear regression: the model is linear in its parameters.
Simple Linear Regression Model/Equation
The concept of a straight line
Example
It is believed that the age of a sparrow is one of the factors that determine its wing length.
Suppose you have been provided with age and wing length data from a sample of 13 birds, as shown in the following table.
i. Present the data with the aid of a scatter plot.
ii. Draw a straight line to fit the points.
Simple Linear Regression Model/Equation
No Age (days) Wing length (cm)
1 3.0 1.4
2 4.0 1.5
3 5.0 2.2
4 6.0 2.4
5 8.0 3.1
6 9.0 3.2
7 10.0 3.2
8 11.0 3.9
9 12.0 4.1
10 14.0 4.7
11 15.0 4.5
12 16.0 5.2
13 17.0 5.0
Simple Linear Regression Model/Equation
The concept of a straight line
Scatter plot of wing length against age
Simple Linear Regression Model/Equation
The concept of a straight line
The straight line drawn to fit the points
Simple Linear Regression Model/Equation
The concept of a straight line
Mathematically, the equation of the line that describes wing length (Y) as a function of age (X) is given by
$$Y = \beta_0 + \beta_1 X$$
Simple Linear Regression Model/Equation
The challenge of the straight line
Whatever straight line of this form we draw to fit the points in the scatter diagram, the data will still vary around that line.
Simple Linear Regression Model/Equation
Introduction of the regression model
Since not all data points fall on the straight line, we now seek what is commonly termed the “best fit” line through the data.
The model behind the ‘best fit’ line takes into account both the mathematical functional form and an error (residual) term.
Such a functional form (line, model) is the linear regression model.
Simple Linear Regression Model/Equation
Introduction of the regression model
The general form of a simple linear regression model is as follows:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Y – Dependent variable
X – Independent variable
β0 – Intercept coefficient (intercept parameter)
β1 – Slope coefficient (slope parameter)
ε – Residual, error, stochastic or disturbance term
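To make the roles of the systematic part β0 + β1X and the error term ε concrete, the sketch below generates observations from the model. The parameter values and the error standard deviation are hypothetical, chosen only for illustration, and the errors are drawn from a normal distribution in line with the normality assumption for the errors.

```python
import random

# Hypothetical values for illustration only (not estimates from the lecture data).
BETA0 = 1.0   # intercept, beta_0
BETA1 = 0.5   # slope, beta_1
SIGMA = 0.3   # standard deviation of the error term epsilon (assumed)

random.seed(42)  # fixed seed so the run is reproducible

xs = [2, 4, 6, 8, 10]
rows = []
for x in xs:
    systematic = BETA0 + BETA1 * x      # the straight-line part of the model
    epsilon = random.gauss(0.0, SIGMA)  # the random error term
    y = systematic + epsilon            # observed Y = line + error
    rows.append((x, systematic, epsilon, y))

for x, systematic, epsilon, y in rows:
    print(f"X = {x:2d}  beta0 + beta1*X = {systematic:.2f}  error = {epsilon:+.3f}  Y = {y:.3f}")
```

Each observed Y sits off the line by exactly its error term, which is why the points in a scatter plot never all fall on one straight line.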
Simple Linear Regression Model/Equation
Terms in the regression model
β0 – intercept coefficient
- The value of Y on the line when X = 0.
- The average (expected) value of the Y (dependent) variable without any influence of the X (explanatory) variable, i.e. when X = 0.
β1 – slope coefficient
- Expresses the change in Y per unit change in X.
- The average (expected) change in the Y (dependent) variable brought about by a one-unit change in the X (independent) variable.
Simple Linear Regression Model/Equation
Terms in the regression model
Error term
- Also called the residual, error, stochastic or disturbance term.
- Captures the influence of other variables not included in the model (apart from the given independent variable).
Simple Linear Regression Model/Equation
Interpretation of regression parameters
Example
Suppose you were provided with the following
estimated simple linear regression models and
you are required to interpret them.
Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
First equation:
Intercept term/parameter
Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
First equation:
Slope coefficient/parameter
Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
Second equation:
Intercept term/parameter
Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
Second equation:
Slope coefficient/parameter
The expected value of the dependent variable (Y) decreases by 0.675 units for each one-unit increase in the independent variable (X), and increases by 0.675 units for each one-unit decrease in X (negative linear relationship).
Simple Linear Regression Model/Equation
Assumptions of the Simple linear regression
Linearity: The relationship between X and Y must be linear.
Independence of errors: The residuals must be independent of one another; the error for one observation is unrelated to the error for any other observation.
Normality of errors: The residuals must be approximately normally distributed.
Equal variances (homoscedasticity): The variance of the residuals is similar for all values of X.
Fitting the simple regression line
To fit the simple regression line means to estimate the parameters of the simple linear regression model, that is, β0 and β1.
There are several approaches to estimating these regression parameters.
The most common approach is the Ordinary Least Squares (OLS) method because, under the Gauss-Markov assumptions, its estimators are the Best Linear Unbiased Estimators (BLUE) (Gauss-Markov theorem).
Fitting the simple regression line
Ordinary Least Square Method
The ordinary least squares method is an approach used to estimate the parameters of a linear regression model by minimizing the sum of squared errors (residuals) of the model.
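Written out in standard notation, the least squares criterion and the first-order conditions it leads to (the normal equations) are as follows, where Y_i − β0 − β1X_i is the error for observation i:

$$S(\beta_0,\beta_1)=\sum_{i=1}^{n}\left(Y_i-\beta_0-\beta_1X_i\right)^2$$

$$\frac{\partial S}{\partial\beta_0}=-2\sum_{i=1}^{n}\left(Y_i-\beta_0-\beta_1X_i\right)=0,\qquad\frac{\partial S}{\partial\beta_1}=-2\sum_{i=1}^{n}X_i\left(Y_i-\beta_0-\beta_1X_i\right)=0$$

$$\sum Y_i=n\hat{\beta}_0+\hat{\beta}_1\sum X_i,\qquad\sum X_iY_i=\hat{\beta}_0\sum X_i+\hat{\beta}_1\sum X_i^2$$

The first normal equation is also why the residuals of a least squares fit that includes an intercept sum to zero.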
Fitting the simple regression line
Estimation of model parameters
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Example
It is believed that the age of a sparrow is one of the factors that determine its wing length. Suppose you have been provided with age and wing length data from a sample of 13 birds, as shown in the following table.
Fitting the simple regression line
No Age (days) Wing length (cm)
1 3.0 1.4
2 4.0 1.5
3 5.0 2.2
4 6.0 2.4
5 8.0 3.1
6 9.0 3.2
7 10.0 3.2
8 11.0 3.9
9 12.0 4.1
10 14.0 4.7
11 15.0 4.5
12 16.0 5.2
13 17.0 5.0
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
The estimated simple linear regression model is given by:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
The parameters in the model are calculated as:
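The standard closed-form OLS estimates, obtained by solving the normal equations, are shown below. The first form of the slope uses the column totals ΣX, ΣY, ΣX² and ΣXY that the following table accumulates; the second is the equivalent deviation form.

$$\hat{\beta}_1=\frac{n\sum X_iY_i-\sum X_i\sum Y_i}{n\sum X_i^2-\left(\sum X_i\right)^2}=\frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{\sum(X_i-\bar{X})^2},\qquad\hat{\beta}_0=\bar{Y}-\hat{\beta}_1\bar{X}$$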
Fitting the simple regression line
No X Y X² XY
1 3.0 1.4 9 4.2
2 4.0 1.5 16 6
3 5.0 2.2 25 11
4 6.0 2.4 36 14.4
5 8.0 3.1 64 24.8
6 9.0 3.2 81 28.8
7 10.0 3.2 100 32
8 11.0 3.9 121 42.9
9 12.0 4.1 144 49.2
10 14.0 4.7 196 65.8
11 15.0 4.5 225 67.5
12 16.0 5.2 256 83.2
13 17.0 5.0 289 85
Totals (n = 13): ΣX = 130, ΣY = 44.4, ΣX² = 1562, ΣXY = 514.8
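A short Python sketch that recomputes these totals from the raw age and wing-length data and then applies the OLS formulas. It uses only the standard library; the variable names are illustrative.

```python
# Sparrow data: age X (days) and wing length Y (cm) for 13 birds.
age = [3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 17.0]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

n = len(age)
sum_x = sum(age)
sum_y = sum(wing)
sum_x2 = sum(x ** 2 for x in age)
sum_xy = sum(x * y for x, y in zip(age, wing))

# OLS estimates computed from the column totals.
beta1_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
beta0_hat = sum_y / n - beta1_hat * (sum_x / n)

print(f"Sum(X) = {sum_x:.1f}, Sum(Y) = {sum_y:.1f}, Sum(X^2) = {sum_x2:.1f}, Sum(XY) = {sum_xy:.1f}")
print(f"beta1_hat = {beta1_hat:.4f}, beta0_hat = {beta0_hat:.4f}")
```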
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
Therefore, the estimated simple linear regression model is given by:
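Substituting the column totals from the table into the OLS formulas gives the following working (rounded). The slope agrees with the value β1 = 0.2702 quoted in the R² example.

$$\hat{\beta}_1=\frac{13(514.8)-(130)(44.4)}{13(1562)-(130)^2}=\frac{6692.4-5772}{20306-16900}=\frac{920.4}{3406}\approx0.2702$$

$$\bar{X}=\frac{130}{13}=10,\qquad\bar{Y}=\frac{44.4}{13}\approx3.4154,\qquad\hat{\beta}_0=\bar{Y}-\hat{\beta}_1\bar{X}\approx3.4154-0.2702(10)\approx0.713$$

$$\hat{Y}\approx0.713+0.270X$$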
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
In other words
Data, Estimated (Fitted) Value, Residual
Example
Refer to the sparrow example. Use the estimated regression model to prepare a table consisting of the following columns:
Data, Estimated (Fitted) Value, Residual
Note: One of the properties of least squares estimated models (with an intercept) is that the sum of the errors, or residuals, across all observations in a sample is equal to zero. Mathematically, this can be expressed as:
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right) = 0$$
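The table requested in the example (data, fitted values and residuals) can be generated with the sketch below, which also checks numerically that the residuals sum to zero up to floating-point rounding. As an assumption of this sketch, it refits the model from the raw data rather than using rounded coefficients.

```python
# Sparrow data: age X (days) and wing length Y (cm).
age = [3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 17.0]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(wing) / n

# OLS estimates from deviations about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, wing))
s_xx = sum((x - x_bar) ** 2 for x in age)
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

fitted = [beta0_hat + beta1_hat * x for x in age]          # Y-hat
residuals = [y - y_hat for y, y_hat in zip(wing, fitted)]  # e = Y - Y-hat

print(f"{'X':>5} {'Y':>5} {'Y-hat':>7} {'e':>7}")
for x, y, y_hat, e in zip(age, wing, fitted, residuals):
    print(f"{x:5.1f} {y:5.1f} {y_hat:7.4f} {e:7.4f}")
print(f"Sum of residuals = {sum(residuals):.2e}")
```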
Total variability of an outcome variable
Total variability of an outcome variable refers to the sum of squared deviations of the dependent variable (Y) from its central value (the mean).
Total variability of the dependent variable (Y) can be partitioned (broken down) into:
i. Explained variability (variability due to the estimated regression model)
ii. Unexplained variability (variability due to errors or residuals)
Total variability of an outcome variable
Total variability of the dependent variable Y is also known as the Total Sum of Squares (TSS) or Sum of Squares Total (SSTotal).
Variability due to the regression model is also known as the Explained Sum of Squares (ESS) or Sum of Squares Regression (SSRegression).
Variability due to the error or residual is also known as the Residual Sum of Squares (RSS) or Sum of Squares Residual (SSResidual).
Total variability of an outcome variable
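Mathematically, the partition of the total variability of the dependent variable Y can be presented as follows:
$$\sum_{i=1}^{n}(Y_i-\bar{Y})^2 = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2 + \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$
that is, TSS = ESS + RSS.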
Total variability of an outcome variable
Explained (regression) sum of squares (ESS)
$$ESS = \sum_{i=1}^{n}(\hat{Y}_i-\bar{Y})^2$$
Total variability of an outcome variable
Residual sum of squares (RSS)
$$RSS = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$
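The three sums of squares can be computed for the sparrow data with the sketch below, which also checks the partition TSS = ESS + RSS numerically. It refits the model from the raw data (standard library only; names are illustrative).

```python
# Sparrow data: age X (days) and wing length Y (cm).
age = [3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 15.0, 16.0, 17.0]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(wing) / n

# OLS fit from deviations about the means.
beta1_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(age, wing))
             / sum((x - x_bar) ** 2 for x in age))
beta0_hat = y_bar - beta1_hat * x_bar
fitted = [beta0_hat + beta1_hat * x for x in age]

tss = sum((y - y_bar) ** 2 for y in wing)                      # total variability
ess = sum((y_hat - y_bar) ** 2 for y_hat in fitted)            # explained by the model
rss = sum((y - y_hat) ** 2 for y, y_hat in zip(wing, fitted))  # left in the residuals

print(f"TSS = {tss:.4f}")
print(f"ESS = {ess:.4f}")
print(f"RSS = {rss:.4f}")
print(f"ESS + RSS = {ess + rss:.4f}")
```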
Coefficient of determination
The coefficient of determination refers to the
proportion or percentage of the total
variability in Y (dependent variable) that is
explained or accounted for by a fitted
regression model.
Coefficient of determination
Mathematically, the coefficient of determination is computed as:
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$
Coefficient of determination
Note:
Coefficient of determination
Interpretation
R² is usually expressed as a percentage, that is:
R² × 100
Example:
If R² = 0.48,
we say that about 48% (i.e. 0.48 × 100) of the variability in Y is explained by the variability in X.
This also implies that about 52% of the variability in Y is explained by other variables not included in the model, i.e. the residual.
Computation of R²
Example
Refer to the sparrow example. Use the sample data and the estimated regression model to compute the following and interpret your results:
i. Coefficient of determination
Computation of R²
Solution
The estimated linear regression model was given by
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$$
where $\hat{\beta}_1 = 0.2702$.
Now, consider the next table.
Computation of R²
No X Y X² Y² XY
1 3.0 1.4 9 1.96 4.2
2 4.0 1.5 16 2.25 6
3 5.0 2.2 25 4.84 11
4 6.0 2.4 36 5.76 14.4
5 8.0 3.1 64 9.61 24.8
6 9.0 3.2 81 10.24 28.8
7 10.0 3.2 100 10.24 32
8 11.0 3.9 121 15.21 42.9
9 12.0 4.1 144 16.81 49.2
10 14.0 4.7 196 22.09 65.8
11 15.0 4.5 225 20.25 67.5
12 16.0 5.2 256 27.04 83.2
13 17.0 5.0 289 25 85
Totals (n = 13): ΣX = 130, ΣY = 44.4, ΣX² = 1562, ΣY² = 171.3, ΣXY = 514.8
Computation of R²
Solution
The coefficient of determination can be obtained as:
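One way to carry out this computation with the column totals from the table (n = 13, estimated slope 0.2702) uses the shortcut forms TSS = ΣY² − (ΣY)²/n and, for simple linear regression, ESS = β̂1(ΣXY − ΣXΣY/n):

$$TSS=\sum Y_i^2-\frac{\left(\sum Y_i\right)^2}{n}=171.3-\frac{(44.4)^2}{13}=171.3-151.6431\approx19.657$$

$$ESS=\hat{\beta}_1\left(\sum X_iY_i-\frac{\sum X_i\sum Y_i}{n}\right)=0.2702\left(514.8-\frac{(130)(44.4)}{13}\right)=0.2702(70.8)\approx19.130$$

$$R^2=\frac{ESS}{TSS}\approx\frac{19.130}{19.657}\approx0.973$$

That is, about 97.3% of the variability in wing length is explained by age, and about 2.7% is left to the residual.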
Computation of R²
Solution
General Example
Geology: Earthquakes
The End