Linear Regression
What is a Regression?
Regression fits a line or curve through the data points on a target-predictor graph in
such a way that the vertical distance between the data points and the regression line is at a minimum. It is
used principally for prediction, forecasting, time-series modeling, and determining cause-and-effect
relationships between variables.
Linear Regression
Linear regression is a simple statistical regression method used for predictive analysis; it models
the relationship between continuous variables. Linear regression shows the linear relationship between
the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear
regression. If there is a single input variable (x), it is called simple linear regression.
And if there is more than one input variable, it is called multiple linear
regression. The linear regression model gives a sloped straight line describing the relationship between the
variables.
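In equation form (standard notation, using the same a0, a1 coefficient names that appear later in this
section), the two cases look like this:

Simple linear regression:   y = a0 + a1*x
Multiple linear regression: y = a0 + a1*x1 + a2*x2 + ... + an*xn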
Plotted on a graph, such data shows the linear relationship between the dependent and independent variables:
when the value of x (the independent variable) increases, the value of y (the dependent variable) increases
as well. The fitted line through the points is referred to as the best-fit straight line. Based on the given
data points, we try to plot the line that models the points best.
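As a minimal sketch of this idea (the data values here are made up for illustration), a best-fit line
can be computed with NumPy's least-squares polynomial fit:

import numpy as np

# Hypothetical sample data: y grows roughly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns the coefficients highest power first: [slope, intercept].
a1, a0 = np.polyfit(x, y, deg=1)

print(f"best-fit line: y = {a0:.3f} + {a1:.3f}*x")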
To calculate the best-fit line, linear regression uses the traditional slope-intercept form:

y = a0 + a1*x

y = dependent variable
x = independent variable
a0 = intercept of the line
a1 = slope of the line (linear regression coefficient)
Using the mean squared error (MSE) as the cost function,

MSE = J(a0, a1) = (1/n) * sum over i of (yi - (a0 + a1*xi))^2

we change the values of a0 and a1 so that the MSE settles at its minimum. The model parameters a0
(intercept) and a1 (slope) can be manipulated to minimize the cost function, and they can be determined
using the gradient descent method so that the cost function value is at a minimum.
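As a small sketch of this cost function (reusing the hypothetical data from the earlier snippet):

import numpy as np

def mse(a0, a1, x, y):
    # Mean squared error of the line y_hat = a0 + a1*x against data (x, y).
    y_hat = a0 + a1 * x
    return np.mean((y - y_hat) ** 2)

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

# A better-fitting line yields a lower cost than a worse one.
print(mse(0.0, 1.00, x, y))  # rough guess: MSE = 0.028
print(mse(0.2, 0.95, x, y))  # near the least-squares fit: MSE = 0.0215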
Gradient descent
Gradient descent is a method of updating a0 and a1 to minimize the cost function (MSE). A regression
model uses gradient descent to update the coefficients of the line (a0 and a1): it starts from a random
selection of coefficient values and then iteratively updates them to reach the minimum of the cost
function.
Imagine a pit in the shape of a U. You are standing at the topmost point of the pit, and your objective is to
reach the bottom, where a treasure lies, and you can only take a discrete number of steps to reach
it. If you decide to take one footstep at a time, you will eventually get to the bottom of the pit,
but this will take a long time. If you choose to take longer steps each time, you may get there sooner, but
there is a chance that you overshoot the bottom of the pit and end up not near the bottom at all. In the
gradient descent algorithm, the size of the steps you take is the learning rate, and this decides how fast
the algorithm converges to the minimum.
To update a0 and a1, we take gradients from the cost function. To find these gradients, we take the partial
derivatives with respect to a0 and a1:

∂J/∂a0 = (2/n) * sum over i of (y_hat_i - yi)
∂J/∂a1 = (2/n) * sum over i of (y_hat_i - yi) * xi

where y_hat_i = a0 + a1*xi is the prediction for the i-th point. The partial derivatives are the gradients,
and they are used to update the values of a0 and a1:

a0 = a0 - alpha * ∂J/∂a0
a1 = a1 - alpha * ∂J/∂a1

Alpha is the learning rate.
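A minimal sketch of this update loop, assuming the same hypothetical data as above and an illustrative
learning rate:

import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])
n = len(x)

a0, a1 = 0.0, 0.0  # start from arbitrary coefficient values
alpha = 0.05       # learning rate

for _ in range(1000):
    y_hat = a0 + a1 * x
    # Partial derivatives of the MSE with respect to a0 and a1.
    grad_a0 = (2.0 / n) * np.sum(y_hat - y)
    grad_a1 = (2.0 / n) * np.sum((y_hat - y) * x)
    # Step both coefficients a small amount against the gradient.
    a0 -= alpha * grad_a0
    a1 -= alpha * grad_a1

print(f"fitted line: y = {a0:.3f} + {a1:.3f}*x")  # approaches y = 0.190 + 0.950*x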
Impact of different values for learning rate
The blue line represents an optimal value of the learning rate: the cost function is minimized in
a few iterations. The green line represents a learning rate lower than the optimal value: the
number of iterations required to minimize the cost function is high. If the learning rate selected is very
high, the cost function can keep increasing with iterations and saturate at a value higher than the minimum,
as represented by the red and black lines.
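A quick way to see this behavior is to rerun the gradient descent sketch above with different learning
rates and compare the final cost (the specific alpha values are illustrative, chosen for this toy data set):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])
n = len(x)

def final_mse(alpha, steps=200):
    # Run gradient descent with the given learning rate; return the final MSE.
    a0, a1 = 0.0, 0.0
    for _ in range(steps):
        y_hat = a0 + a1 * x
        a0 -= alpha * (2.0 / n) * np.sum(y_hat - y)
        a1 -= alpha * (2.0 / n) * np.sum((y_hat - y) * x)
    return np.mean((y - (a0 + a1 * x)) ** 2)

for alpha in (0.0001, 0.05, 0.1):
    print(f"alpha={alpha}: final MSE = {final_mse(alpha):.4g}")

# alpha=0.0001: too low, still far from the minimum after 200 steps
# alpha=0.05:   converges to a cost near the minimum (about 0.021)
# alpha=0.1:    too high, the updates overshoot and the cost explodes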
Advantages and Disadvantages of Linear Regression
Advantages: When the independent and dependent variables have a linear relationship, this algorithm is the
best one to use because of its lower complexity compared to other algorithms.

Disadvantages: Conversely, linear regression assumes a linear relationship between the dependent and
independent variables, that is, a straight-line relationship between them. It also assumes independence
between attributes.