
Lecture 3-Simple linear regression and Correlation-MTH-106- UPDATED draft

The document covers the concepts of correlation and simple linear regression, focusing on the relationships between variables. It explains different types of correlations, the measurement of correlation using Karl Pearson’s coefficient, and the principles of simple linear regression analysis. Additionally, it includes examples and interpretations of regression parameters, emphasizing the importance of understanding these statistical techniques in research.

Uploaded by

kaizambaruku2024

MTH 106

INTRODUCTORY
STATISTICS

05/06/2025 1
Correlation and Simple Linear Regression

Topic-3

Correlation
Introduction
➢ Researchers are often interested in knowing what relationship exists, if any, between two or more variables.
➢ Correlation is a measure of the linear relationship between variables.
➢ Correlation measures the strength (degree) and direction of the linear relationship between two variables.
➢ Note: Correlation does not imply a causal relationship.

Correlation
Introduction (Continued)
Examples of Correlation Cases
➢ Quantity of food consumed vs. weight gained by an animal.
➢ Quantity of nitrates in the soil vs. growth rate of plants.
➢ Species abundance variable (such as density, breeding pairs, territorial area, etc.) vs. environmental sustainability variable (such as concentration of greenhouse gas emissions).

Correlation
Forms of linear relations between two variables
➢ Suppose we have two variables X and Y. The possible forms of relationship between them are as follows:

i. Positive or direct linear relationship
ii. Negative or indirect or inverse linear relationship
iii. Zero or no linear relationship
iv. Non-linear relationship

Correlation
Positive or direct relationship
➢ X and Y are said to have a positive or direct linear relationship when Y increases as X increases and Y decreases as X decreases.

Correlation
Negative or Indirect or Inverse relationship
➢ X and Y are said to have a negative (indirect or inverse) linear relationship when Y decreases as X increases and Y increases as X decreases.

Correlation
Zero or no relationship
➢ X and Y are said to have zero or no relationship when changes in X (either increase or decrease) do not determine changes in Y.

Correlation
Non-linear relationship
➢ X and Y are said to have a non-linear relationship when changes in X (either increase or decrease) do not correspond to a constant change in Y.

Measurement of correlation
Assumptions of Karl Pearson’s coefficient of correlation (r)
➢ The two variables X and Y should be measured on a continuous scale.
➢ Each variable, that is X and Y, should be normally distributed.
➢ There should be no outliers in either variable.
➢ Each observation (data collected) should be independent of the other observations.

Measurement of correlation

➢ Karl Pearson’s coefficient of correlation (r) is given by:
$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}} $$

Measurement of correlation
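➢ Alternatively, the correlation coefficient is given by:
$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$
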
Measurement of correlation
Properties of r

➢ −1 ≤ r ≤ 1, where r is the coefficient of correlation
➢ r is unitless
➢ r > 0: positive linear relationship (correlation)
➢ r < 0: negative linear relationship (correlation)
➢ r = 0: no linear relationship (correlation)

Measurement of correlation
Properties of r (Continued)

[Figure: three scatter plots illustrating r > 0, r < 0 and r = 0]

Measurement of correlation
Properties of r (Continued)
➢ The magnitude of r, i.e. |r|, expresses the strength (degree) of linear association.
No. Range of |r| Interpretation
1 |r| = 0 No linear relationship
2 Very weak linear relationship
3 Weak linear relationship
4 Average linear relationship
5 Strong linear relationship
6 .0 Very strong linear relationship
7 |r| = 1.0 Perfect linear relationship
➢ The sign (+/−) of r expresses the direction of association.

Measurement of correlation
Properties of r (Continued)
➢ The magnitude of r, that is |r|, expresses the strength (degree) of linear association.

Measurement of correlation
Example

➢ The following data are measurements of wing length (X) and tail length (Y) for a sample of 12 birds.
➢ Compute the coefficient of correlation between the two variables and interpret your results.
➢ Demonstrate the relationship between the two variables with the aid of a scatter plot.

Measurement of correlation
Example
No Wing length (X cm) Tail length (Y cm)
1 10.4 7.4
2 10.8 7.6
3 11.1 7.9
4 10.2 7.2
5 10.3 7.4
6 10.2 7.1
7 10.7 7.4
8 10.5 7.2
9 10.8 7.8
10 11.2 7.7
11 10.6 7.8
12 11.4 8.3

Measurement of correlation
Solution
No X Y X² Y² XY
1 10.4 7.4 108.16 54.76 76.96
2 10.8 7.6 116.64 57.76 82.08
3 11.1 7.9 123.21 62.41 87.69
4 10.2 7.2 104.04 51.84 73.44
5 10.3 7.4 106.09 54.76 76.22
6 10.2 7.1 104.04 50.41 72.42
7 10.7 7.4 114.49 54.76 79.18
8 10.5 7.2 110.25 51.84 75.6
9 10.8 7.8 116.64 60.84 84.24
10 11.2 7.7 125.44 59.29 86.24
11 10.6 7.8 112.36 60.84 82.68
12 11.4 8.3 129.96 68.89 94.62
n=12 Sum (X)=128.2 Sum (Y)=90.8 Sum (X²)=1371.32 Sum (Y²)=688.4 Sum (XY)=971.37

Measurement of correlation
Solution
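As a worked substitution, and assuming the raw-sums (computational) form of Pearson's formula is the one intended on this slide, the column totals from the table (n = 12) give approximately:

```latex
\begin{aligned}
r &= \frac{n\sum XY - \sum X \sum Y}
          {\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]
                 \left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \\
  &= \frac{12(971.37) - (128.2)(90.8)}
          {\sqrt{\left[12(1371.32) - (128.2)^2\right]
                 \left[12(688.4) - (90.8)^2\right]}} \\
  &= \frac{15.88}{\sqrt{(20.60)(16.16)}}
   \approx \frac{15.88}{18.25} \approx 0.87
\end{aligned}
```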

Measurement of correlation
Solution
➢ Therefore, the coefficient of correlation is r = 0.87.
➢ Since r = 0.87, there is a very strong direct (positive) linear relationship between the wing length and tail length of such birds.
➢ That is, the longer the wing, the longer the tail, and vice versa.

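The calculation for the bird data can be checked with a short Python sketch. The function names `pearson_r_sums` and `pearson_r_deviations` are illustrative choices, not part of the lecture. The first uses the raw column sums and the second uses deviations about the means, so the two should agree.

```python
from math import sqrt

# Wing length (X, cm) and tail length (Y, cm) for the 12 birds in the example.
wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.5, 10.8, 11.2, 10.6, 11.4]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8, 8.3]


def pearson_r_sums(x, y):
    """Pearson's r from the raw sums: n, sum X, sum Y, sum X^2, sum Y^2, sum XY."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_r_deviations(x, y):
    """Pearson's r from deviations about the means (algebraically equivalent)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


r = pearson_r_sums(wing, tail)
print(round(r, 2))  # 0.87, matching the value on the slide
```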
Measurement of correlation
Solution
EXCEL DEMONSTRATION

Simple Linear Regression
Introduction
➢ Regression analysis is the statistical technique for modeling and investigating the relationship between two or more variables.
➢ In such a relationship there is a dependent variable and an independent variable.

Simple Linear Regression
Introduction
➢ The dependent variable is the one that is determined by the other variable.
➢ It is also known as:
• Response variable *
• Endogenous variable
• Criterion variable *
• Regressand variable
• Outcome variable *
* Most common terms

Simple Linear Regression
Introduction (Continued)
➢ The independent variable is the one that determines the other variable.
➢ It is also known as:
• Predictor variable *
• Exogenous variable *
• Regressor variable
• Explanatory variable *
• Input variable
* Most common terms

Simple Linear Regression
Introduction (Continued)
➢ For example, consider the relationship between age and blood pressure in humans.
• Dependent: blood pressure.
• Independent: age.

Note: In some problems the choice of dependent and independent variable is not obvious. Choose based on the intent of the study.

E.g., depression and financial stability.

Simple Linear Regression
Introduction (Continued)
➢ Generally, the dependence of the outcome variable on the predictor variable is what is called regression.
➢ Simple regression involves only two variables:
• One dependent variable
• One independent variable

Simple Linear Regression
Introduction (Continued)
➢ The adjective "linear" may be taken to mean that the relationship between the two variables follows a straight line, but to the statistician it describes the additive way in which the two parameters enter the regression model.
➢ That is, the model is linear in parameters.

Simple Linear Regression Model/Equation
The concept of a straight line

Example
➢ It is believed that the age of a sparrow is one of the factors that determine its wing length. Suppose you have been provided with age and wing length data from a sample of 13 birds, as indicated in the following table.
i. Present the data with the aid of a scatter plot.
ii. Draw a straight line to fit the points.

Simple Linear Regression Model/Equation
No Age (days) Wing length (cm)
1 3.0 1.4
2 4.0 1.5
3 5.0 2.2
4 6.0 2.4
5 8.0 3.1
6 9.0 3.2
7 10.0 3.2
8 11.0 3.9
9 12.0 4.1
10 14.0 4.7
11 15.0 4.5
12 16.0 5.2
13 17.0 5.0

Simple Linear Regression Model/Equation
The concept of a straight line
➢ Scatter plot of the age and wing length data
[Figure: scatter plot of wing length (cm) against age (days)]

Simple Linear Regression Model/Equation
The concept of a straight line
➢ The straight line drawn to fit the points
[Figure: scatter plot of wing length (cm) against age (days) with a fitted straight line]

Simple Linear Regression Model/Equation
The concept of a straight line
➢ Mathematically, the equation of the line that describes wing length as a function of age is given by
$$ Y = \beta_0 + \beta_1 X $$
where Y is wing length, X is age, and β0 and β1 are the regression parameters.

Simple Linear Regression Model/Equation
The challenge of the straight line
➢ Whatever straight line we draw from the given mathematical equation to fit the points in the scatter diagram, there will be considerable variability of the data around that line.
➢ The deviation of a point from the straight line is called an error or a residual.

Simple Linear Regression Model/Equation
Introduction of the regression model
➢ Since not all data points fall on the straight line, we now seek to define what is commonly termed the "best fit" line through the data.
➢ The "best fit" line is the one that takes into account both the mathematical functional form and an error or residual.
➢ Such a functional form (line, model) is the linear regression model.

Simple Linear Regression Model/Equation
Introduction of the regression model
➢ The general form of a simple linear regression model is as follows:
$$ Y = \beta_0 + \beta_1 X + \varepsilon $$
Y – Dependent variable
X – Independent variable
β0 – Intercept coefficient (intercept parameter)
β1 – Slope coefficient (slope parameter)
ε – Residual, error, stochastic or disturbance term

Simple Linear Regression Model/Equation
Introduction of the regression model

➢ For an individual observation, the linear regression model can be defined as:
$$ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i $$
i – stands for an individual observation

Simple Linear Regression Model/Equation
Terms in the regression model
β0 – Intercept coefficient
- It is the value taken when X = 0.
- The average (expected) value of the Y (dependent) variable without the influence of the X (explanatory) variable.
β1 – Slope coefficient
- Expresses the amount of change in Y for a unit change in X.
- The average (expected) change in the Y (explained) variable brought about by a unit change in the X (independent) variable.

Simple Linear Regression Model/Equation
Terms in the regression model

Error term
➢ ε – residual, error, stochastic, or disturbance term.
- Captures the influence of other variables not included in the model (apart from the given independent variable).

Simple Linear Regression Model/Equation
Interpretation of regression parameters

Example
➢ Suppose you were provided with the following estimated simple linear regression models and you are required to interpret them.
$$ \hat{Y} = 4.35 + 1.56X $$
$$ \hat{Y} = -2.75 - 0.675X $$

Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
First equation:
Intercept term/parameter
➢ The estimated expected value of the dependent variable (Y) is 4.35 units when X = 0, that is, in the absence of any influence of the independent variable (X).

Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
First equation:
Slope coefficient/parameter
➢ The expected value of the dependent variable (Y) increases by 1.56 units when the independent variable (X) increases by one unit, and vice versa (positive linear relationship).

Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
Second equation:
Intercept term/parameter
➢ The estimated expected value of the dependent variable (Y) is -2.75 units when X = 0, that is, in the absence of any influence of the independent variable (X).

Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
Second equation:
Slope coefficient/parameter
➢ The expected value of the dependent variable (Y) decreases by 0.675 units when the independent variable (X) increases by one unit, and vice versa (negative linear relationship).

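The two interpretations can be illustrated numerically. This is a minimal sketch that assumes the two estimated models are Ŷ = 4.35 + 1.56X and Ŷ = −2.75 − 0.675X, as the interpreted values imply. The function names are hypothetical.

```python
def first_model(x):
    """Assumed first estimated model: Y-hat = 4.35 + 1.56 X."""
    return 4.35 + 1.56 * x


def second_model(x):
    """Assumed second estimated model: Y-hat = -2.75 - 0.675 X."""
    return -2.75 - 0.675 * x


for name, model in [("first", first_model), ("second", second_model)]:
    intercept = model(0)          # expected Y when X = 0
    slope = model(1) - model(0)   # change in expected Y per unit increase in X
    direction = "positive" if slope > 0 else "negative"
    print(f"{name}: intercept = {intercept:.3f}, slope = {slope:.3f} ({direction})")
```

Because the models are straight lines, the difference between the predictions at any two X values one unit apart equals the slope, which is why the slope is read as the change in Y per unit change in X.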
Simple Linear Regression Model/Equation
Assumptions of the Simple linear regression
➢ Linearity: The relationship between X and Y must be linear.
➢ Independence of errors: The residuals are independent of one another; the error for one observation tells us nothing about the error for another.
➢ Normality of errors: The residuals must be approximately normally distributed.
➢ Equal variances: The variance of the residuals is similar for all values of X.

Fitting the simple regression line
➢ To fit the simple regression line means to estimate the parameters of the simple linear regression model, that is, β0 and β1.
➢ There are several approaches to estimating these regression parameters.
➢ The most common approach is the Ordinary Least Squares (OLS) method, because its estimates are Best Linear Unbiased Estimators (BLUE), based on the Gauss-Markov theorem.

Fitting the simple regression line
Ordinary Least Square Method
➢ The ordinary least squares method estimates the parameters of a linear regression model by minimizing the sum of squared errors of the model.

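The minimization can be written out as a short derivation. This is the standard textbook argument, given here as a sketch and not taken from the slides. S denotes the sum of squared errors, and setting its partial derivatives to zero yields the two normal equations.

```latex
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2 \\
\frac{\partial S}{\partial \beta_0} &= -2 \sum \left(Y_i - \beta_0 - \beta_1 X_i\right) = 0
  \;\Rightarrow\; \sum Y = n\beta_0 + \beta_1 \sum X \\
\frac{\partial S}{\partial \beta_1} &= -2 \sum X_i \left(Y_i - \beta_0 - \beta_1 X_i\right) = 0
  \;\Rightarrow\; \sum XY = \beta_0 \sum X + \beta_1 \sum X^2 \\
\hat{\beta}_1 &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
\end{aligned}
```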
Fitting the simple regression line
Estimation of model parameters

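➢ From the OLS method, the parameter estimates are:
$$ \hat{\beta}_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} $$
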
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Example
➢ It is believed that the age of a sparrow is one of the factors that determine its wing length. Suppose you have been provided with age and wing length data from a sample of 13 birds, as indicated in the following table.
i. Estimate the linear regression model by using the OLS method and interpret the relationship between age and wing length of the bird.
ii. Predict the wing length of a bird aged 19 days.

Fitting the simple regression line
No Age (days) Wing length (cm)
1 3.0 1.4
2 4.0 1.5
3 5.0 2.2
4 6.0 2.4
5 8.0 3.1
6 9.0 3.2
7 10.0 3.2
8 11.0 3.9
9 12.0 4.1
10 14.0 4.7
11 15.0 4.5
12 16.0 5.2
13 17.0 5.0

Fitting the simple regression line
Estimation of β0 and β1 by using OLS

Solution
➢ The estimated simple linear regression model is given by:
$$ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X $$

Fitting the simple regression line
Estimation of β0 and β1 by using OLS

Solution
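➢ The parameters in the model are calculated as:
$$ \hat{\beta}_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} $$
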
Fitting the simple regression line
No X Y X² XY
1 3.0 1.4 9 4.2
2 4.0 1.5 16 6
3 5.0 2.2 25 11
4 6.0 2.4 36 14.4
5 8.0 3.1 64 24.8
6 9.0 3.2 81 28.8
7 10.0 3.2 100 32
8 11.0 3.9 121 42.9
9 12.0 4.1 144 49.2
10 14.0 4.7 196 65.8
11 15.0 4.5 225 67.5
12 16.0 5.2 256 83.2
13 17.0 5.0 289 85
n=13 Sum (X)=130 Sum (Y)=44.4 Sum (X²)=1562 Sum (XY)=514.8

Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
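As a worked substitution, assuming the raw-sums form of the OLS estimators is the one intended here, the column totals from the table (n = 13) give approximately the following. The intercept uses the slope rounded to four decimal places, which reproduces the value quoted in the slides.

```latex
\begin{aligned}
\hat{\beta}_1 &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}
  = \frac{13(514.8) - (130)(44.4)}{13(1562) - (130)^2}
  = \frac{920.4}{3406} \approx 0.2702 \\
\hat{\beta}_0 &= \bar{Y} - \hat{\beta}_1 \bar{X}
  = \frac{44.4}{13} - 0.2702\left(\frac{130}{13}\right)
  \approx 3.4154 - 2.7020 = 0.7134
\end{aligned}
```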

Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
➢ Therefore, the estimated simple linear regression model is given by:
$$ \hat{Y} = 0.7134 + 0.2702X $$

Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
➢ In other words:
$$ \widehat{\text{Wing length}} = 0.7134 + 0.2702 \times \text{Age} $$
➢ This means that, holding other factors constant, the average wing length of a sparrow increases by 0.2702 cm per day.
➢ Normally we do not interpret the intercept (the value when X = 0). If necessary in this case, we can comment that a sparrow has a wing length of 0.7134 cm just after hatching, holding other factors constant.

Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
➢ The predicted wing length of a bird aged 19 days is given by:
$$ \hat{Y} = 0.7134 + 0.2702(19) = 5.8472 $$
➢ Therefore, the predicted wing length of a bird aged 19 days is 5.8472 cm, holding other factors constant.

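The fit and the prediction can be reproduced with a short Python sketch. The helper name `ols_fit` is an illustrative choice. It applies the raw-sums OLS formulas to the sparrow data.

```python
# Age (X, days) and wing length (Y, cm) for the 13 sparrows in the example.
age = [3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 16, 17]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]


def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope, computed from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1


b0, b1 = ols_fit(age, wing)
predicted_19 = b0 + b1 * 19  # predicted wing length (cm) at age 19 days

print(f"slope b1     = {b1:.4f}")  # 0.2702
print(f"intercept b0 = {b0:.4f}")
print(f"wing length at 19 days = {predicted_19:.4f} cm")
```

Without intermediate rounding the intercept comes out near 0.7131 and the prediction near 5.8474 cm. The slide values 0.7134 and 5.8472 differ slightly because the slope was rounded to 0.2702 before the intercept was computed.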
Data, Estimated(fitted) value, Residual

Example
➢ Refer to the sparrow example. Use the estimated regression model to prepare a table consisting of the following columns:
i. Dependent variable (data)
ii. Estimated or fitted value
iii. Residuals or errors

Data, Estimated(fitted) value, Residual
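A table of this kind can be generated with a short Python sketch. It assumes the rounded estimates 0.7134 and 0.2702 from the fitted model, and the column layout is an illustrative choice, not the original slide's table.

```python
# Age (X, days) and wing length (Y, cm) for the 13 sparrows in the example.
age = [3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 16, 17]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

# Rounded estimates of the intercept and slope taken from the fitted model.
B0, B1 = 0.7134, 0.2702

print(f"{'No':>2} {'X':>5} {'Y':>5} {'Fitted':>8} {'Residual':>9}")
residuals = []
for i, (x, y) in enumerate(zip(age, wing), start=1):
    fitted = B0 + B1 * x   # estimated (fitted) value of Y
    resid = y - fitted     # residual = observed value minus fitted value
    residuals.append(resid)
    print(f"{i:>2} {x:>5.1f} {y:>5.1f} {fitted:>8.4f} {resid:>9.4f}")

# With rounded coefficients the residuals sum to nearly, not exactly, zero.
print(f"Sum of residuals = {sum(residuals):.4f}")
```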

Data, Estimated(fitted) value, Residual
➢ Note: One of the properties of least squares estimated models is that the sum of the errors (residuals) across all observations in a sample is equal to zero. Mathematically, it can be expressed as:
$$ \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (Y_i - \hat{Y}_i) = 0 $$

Total variability of an outcome variable
➢ Total variability of an outcome variable refers to the sum of squared deviations of the dependent variable (Y) from its central value (mean).
➢ Total variability of the dependent variable (Y) can be partitioned into:
i. Explained variability (variability due to the estimated regression model)
ii. Unexplained variability (variability due to errors or residuals)

Total variability of an outcome variable
➢ Total variability of the dependent variable Y is also known as the Total Sum of Squares (TSS) or Sum of Squares Total (SSTotal).
➢ Variability due to the regression model is also known as the Explained Sum of Squares (ESS) or Sum of Squares Regression (SSRegression).
➢ Variability due to error or residual is also known as the Residual Sum of Squares (RSS) or Sum of Squares Residual (SSResidual).

Total variability of an outcome variable
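➢ Mathematically, the partition of the total variability of the dependent variable Y can be presented as follows:
$$ \sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2, \quad \text{i.e. } TSS = ESS + RSS $$
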
Total variability of an outcome variable
➢ Explained (regression) sum of squares (ESS):
$$ ESS = \sum (\hat{Y}_i - \bar{Y})^2 $$

Total variability of an outcome variable
➢ Residual sum of squares (RSS):
$$ RSS = \sum (Y_i - \hat{Y}_i)^2 $$

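The partition of variability can be checked numerically on the sparrow data. This is a minimal sketch, with variable names chosen for illustration. It fits the line by OLS using deviations about the means, then computes the three sums of squares from their definitions.

```python
# Age (X, days) and wing length (Y, cm) for the 13 sparrows in the example.
age = [3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 16, 17]
wing = [1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(wing) / n

# OLS estimates in deviation form (equivalent to the raw-sums formulas).
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(age, wing))
      / sum((x - mean_x) ** 2 for x in age))
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * x for x in age]

tss = sum((y - mean_y) ** 2 for y in wing)             # total variability
ess = sum((f - mean_y) ** 2 for f in fitted)           # explained by the model
rss = sum((y - f) ** 2 for y, f in zip(wing, fitted))  # left in the residuals

print(f"TSS = {tss:.4f}")
print(f"ESS = {ess:.4f}")
print(f"RSS = {rss:.4f}")
print(f"ESS + RSS = {ess + rss:.4f}")  # should agree with TSS
```

The identity TSS = ESS + RSS holds exactly only for unrounded OLS estimates, which is why the sketch does not reuse the rounded coefficients.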
Coefficient of determination
➢ The coefficient of determination refers to the proportion or percentage of the total variability in Y (the dependent variable) that is explained or accounted for by a fitted regression model.
➢ The coefficient of determination is a measure of goodness of fit of the regression model.
➢ The coefficient of determination is denoted by R².

Coefficient of determination
➢ Mathematically, the coefficient of determination is computed as:
$$ R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS} $$

Coefficient of determination
Note:

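1. If all the data points fall exactly on a line having non-zero slope, then R² = 1.
2. If the estimated slope is zero, then R² = 0.
3. For a simple linear regression, R² = r² and r = ±√R² [the direction is given by the sign of the estimated slope].
4. The estimate of the slope and the coefficient of correlation are related as:
$$ \hat{\beta}_1 = r \, \frac{s_Y}{s_X} $$
where s_X and s_Y are the sample standard deviations of X and Y.
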
Coefficient of determination
Interpretation
➢ R² is usually expressed as a percentage, that is:
R² × 100
Example:
➢ If R² = 0.48
➢ We say that about 48% (i.e. 0.48 × 100) of the variability in Y is explained by variability in X.
➢ This also implies that about 52% of the variability in Y is due to other variables not included in the model, i.e. the residual.

Computation of R2
Example
➢ Refer to the sparrow example. Use the sample data and the estimated regression model to compute the following and interpret your results:
i. Coefficient of determination

Computation of R2
Solution
➢ The estimated linear regression model was given by
$$ \hat{Y} = 0.7134 + 0.2702X $$
➢ where the estimated slope is β1 = 0.2702.
➢ Now, consider the next table.

Computation of R2
No X Y X² Y² XY
1 3.0 1.4 9 1.96 4.2
2 4.0 1.5 16 2.25 6
3 5.0 2.2 25 4.84 11
4 6.0 2.4 36 5.76 14.4
5 8.0 3.1 64 9.61 24.8
6 9.0 3.2 81 10.24 28.8
7 10.0 3.2 100 10.24 32
8 11.0 3.9 121 15.21 42.9
9 12.0 4.1 144 16.81 49.2
10 14.0 4.7 196 22.09 65.8
11 15.0 4.5 225 20.25 67.5
12 16.0 5.2 256 27.04 83.2
13 17.0 5.0 289 25 85
n=13 Sum (X)=130 Sum (Y)=44.4 Sum (X²)=1562 Sum (Y²)=171.3 Sum (XY)=514.8

Computation of R2
Solution
➢ The coefficient of determination can be obtained as:
$$ R^2 = \frac{ESS}{TSS} \approx 0.973 $$

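One way to reach this value directly from the column totals is sketched below. It relies on the identity R² = r² for simple linear regression, which may not be the exact formula used on the original slide.

```latex
\begin{aligned}
R^2 &= \frac{\left[n\sum XY - \sum X \sum Y\right]^2}
            {\left[n\sum X^2 - \left(\sum X\right)^2\right]
             \left[n\sum Y^2 - \left(\sum Y\right)^2\right]} \\
    &= \frac{\left[13(514.8) - (130)(44.4)\right]^2}
            {\left[13(1562) - (130)^2\right]\left[13(171.3) - (44.4)^2\right]} \\
    &= \frac{(920.4)^2}{(3406)(255.54)} \approx 0.973
\end{aligned}
```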
Computation of R2
Solution

➢ This indicates that the estimated linear regression model explains 97.3% of the variation in the wing length of the bird.

General Example
Geology: Earthquakes
Is the magnitude of an earthquake related to the depth below the surface at which the quake occurs?

Let x be the magnitude of an earthquake (on the Richter scale), and let y be the depth (in kilometers) of the quake below the surface at the epicenter. The following is based on information taken from the National Earthquake Information Service of the U.S. Geological Survey.

Additional data may be found by visiting the web site for

General Example
Geology: Earthquakes

Compute and interpret the following measures:
1. Coefficient of variation for x and y.
2. Karl Pearson’s coefficient of correlation between x and y.
3. Estimate the parameters of the linear regression equation of y on x using OLS.
4. Coefficient of determination.
5. Estimate the depth of the quake below the surface at the epicenter for an earthquake of magnitude 5.0 on the Richter scale.

The End
