Linear Regression
Linear regression is a statistical method that aims to model the relationship between a dependent
variable and one or more independent variables by fitting a linear equation to the observed data.
This widely-used technique serves as a fundamental building block in both statistics and machine
learning, providing valuable insights into the relationships between variables.
Basic Concept:
At its core, linear regression assumes a linear relationship between the dependent variable (for example, BP) and the independent variable(s) (for example, Age and Years). The relationship is expressed through a linear equation:
Y = β0 + β1X1 + β2X2 + ... + βnXn + ε
where:
β0 is the y-intercept
β1, β2, ..., βn are the coefficients of the independent variables
ε is the error term capturing the variability not explained by the linear relationship
Objective
The primary goal of linear regression is to determine the best-fitting line, minimizing the sum of
squared differences between the observed and predicted values. In other words, the model aims to
capture the underlying linear relationship between variables while accounting for variability.
Estimation of Coefficients:
The process of finding the optimal coefficients involves using statistical methods such as the least squares method. The coefficients β0, β1, ..., βn are estimated to create a model that accurately represents the data. This fitting process provides a mathematical representation of how changes in the independent variables correlate with changes in the dependent variable.
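As a concrete illustration, the sketch below fits such a model with scikit-learn's LinearRegression, which estimates the coefficients by ordinary least squares. The small BP/Age/Years dataset is invented purely for illustration and is not taken from any real study.
```python
# A minimal sketch of fitting a linear regression with scikit-learn.
# The blood-pressure data below is made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical dataset: two independent variables (Age, Years) and one dependent variable (BP)
X = np.array([[47, 9], [49, 4], [52, 10], [48, 7], [57, 11], [60, 15]])  # Age, Years
y = np.array([120, 118, 129, 123, 135, 142])                             # BP

model = LinearRegression()        # ordinary least squares under the hood
model.fit(X, y)

print("Intercept (beta_0):", model.intercept_)
print("Coefficients (beta_1, beta_2):", model.coef_)
print("Predicted BP for Age=50, Years=8:", model.predict([[50, 8]])[0])
```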
Advantages:
Linear regression is computationally efficient and can handle large datasets effectively. It can be trained quickly, making it suitable for real-time applications.
Linear regression often serves as a good baseline model for comparison with more complex machine learning algorithms.
Linear regression is a well-established algorithm with a rich history and is widely available in various machine learning libraries and software packages.
Limitations:
Linear regression assumes a linear relationship between the dependent and independent variables. If the relationship is not linear, the model may not perform well.
Linear regression assumes that the features are already in a suitable form for the model. Feature engineering may be required to transform features into a format that the model can use effectively (see the sketch after this list).
Linear regression is susceptible to both overfitting and underfitting. Overfitting occurs when the model learns the training data too well and fails to generalize to unseen data. Underfitting occurs when the model is too simple to capture the underlying relationships in the data.
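As a minimal sketch of the feature-engineering point above, the example below expands a single feature into polynomial terms so that a linear model can capture a curved relationship. The data and the choice of a quadratic transform are assumptions made only for illustration.
```python
# A minimal sketch of simple feature engineering for linear regression,
# assuming the raw feature x has a roughly quadratic relationship with y.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])      # hypothetical raw feature
y = np.array([1.1, 4.2, 8.8, 16.3, 24.9])              # roughly proportional to x**2

# Expand x into [x, x^2] so a *linear* model can capture the curved relationship
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

model = LinearRegression().fit(X_poly, y)
print("Coefficients for [x, x^2]:", model.coef_, "intercept:", model.intercept_)
```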
Linear Regression and Gradient Descent
The goal of the linear regression algorithm is to find the best-fit line equation that can predict the values of the dependent variable based on the independent variables.
In regression, a set of records with X and Y values is available, and these values are used to learn a function so that Y can be predicted for a new, unseen X. In other words, regression requires a function that predicts a continuous Y given X as the independent feature(s).
Our primary objective when using linear regression is to locate the best-fit line, which means the error between the predicted and actual values should be kept to a minimum; the best-fit line is the one with the least error. It is determined by finding the coefficients (β0 and β1) that minimize the sum of squared differences between the observed values of the dependent variable and the values predicted by the regression equation. This is typically achieved through the method of least squares.
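For the simple (single-feature) case, the least-squares coefficients have a well-known closed form: β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and β0 = ȳ − β1 x̄. The sketch below computes them from scratch with NumPy on made-up data.
```python
# A from-scratch sketch of the least-squares coefficients for simple linear regression,
# using the closed-form formulas for the slope (beta_1) and intercept (beta_0).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

x_mean, y_mean = x.mean(), y.mean()
beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta_0 = y_mean - beta_1 * x_mean

print(f"Best-fit line: y = {beta_0:.3f} + {beta_1:.3f} * x")
```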
The best-fit line equation provides a straight line that represents the relationship between the
dependent and independent variables. The slope of the line indicates how much the dependent
variable changes for a unit change in the independent variable(s).
Understanding Relationships: The slope (β1) of the best fit line indicates the strength and direction
of the relationship between the variables. A positive slope suggests a positive correlation, while a
negative slope indicates a negative correlation.
Visual Representation: The best fit line is often plotted on a scatterplot of the data points. It visually
represents the linear trend in the data and helps assess how well the model fits the observed data.
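A quick way to see this is to plot the data points and the fitted line together. The sketch below uses NumPy's polyfit and matplotlib on hypothetical data; the specific values are assumptions for illustration only.
```python
# A quick visual check of the best-fit line on a scatterplot (matplotlib assumed available).
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# Fit a degree-1 polynomial (a straight line) with NumPy's least-squares helper
beta_1, beta_0 = np.polyfit(x, y, deg=1)

plt.scatter(x, y, label="observed data")
plt.plot(x, beta_0 + beta_1 * x, color="red", label="best-fit line")
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.legend()
plt.show()
```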
In regression, the difference between the observed value of the dependent variable (yi) and the predicted value (ŷi) is called the residual:
εi = yi − ŷi
where ŷi = θ1 + θ2 xi, with θ1 the intercept and θ2 the slope of the fitted line.
The cost function helps to work out the optimal values for θ1 and θ2 that provide the best-fit line for the data points.
In linear regression, the Mean Squared Error (MSE) cost function is generally used; it is the average of the squared errors between the predicted values ŷi and the observed values yi:
J(θ1, θ2) = (1/n) Σ (ŷi − yi)²
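The sketch below implements this cost function directly; the data and the trial coefficient values are hypothetical.
```python
# A minimal sketch of the MSE cost function J(theta_1, theta_2) described above.
# theta_1 is the intercept and theta_2 the slope, matching the notation in the text.
import numpy as np

def mse_cost(theta_1, theta_2, x, y):
    y_pred = theta_1 + theta_2 * x          # predicted values
    residuals = y - y_pred                  # observed minus predicted
    return np.mean(residuals ** 2)          # average squared error

x = np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical data
y = np.array([2.0, 4.1, 6.2, 7.9])
print(mse_cost(0.0, 2.0, x, y))             # cost for the trial line y = 0 + 2x
```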
Gradient Descent
A regression model uses the gradient descent algorithm to optimize the coefficients of the line: starting from randomly selected coefficient values, it iteratively updates them to reduce the cost function until a minimum is reached.
In other words, to update the θ1 and θ2 values so as to reduce the cost function (minimize the MSE) and achieve the best-fit line, the model uses gradient descent.
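The sketch below shows one way to run batch gradient descent on the MSE cost for the simple one-feature case. The learning rate, iteration count, and data are arbitrary choices for illustration, not tuned values.
```python
# A sketch of batch gradient descent for simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # hypothetical data
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

theta_1, theta_2 = 0.0, 0.0                   # initial coefficient values
lr, n = 0.01, len(x)                          # learning rate and sample count

for _ in range(5000):
    y_pred = theta_1 + theta_2 * x
    error = y_pred - y
    # Partial derivatives of the MSE cost with respect to each coefficient
    grad_theta_1 = (2.0 / n) * np.sum(error)
    grad_theta_2 = (2.0 / n) * np.sum(error * x)
    theta_1 -= lr * grad_theta_1
    theta_2 -= lr * grad_theta_2

print(f"Learned line: y = {theta_1:.3f} + {theta_2:.3f} * x")
```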
MSE vs MAE
Mean Squared Error (MSE) is the average squared error between the actual and predicted values. It is calculated as:
MSE = (1/n) Σ (yi − ŷi)²
MSE should be interpreted as an error metric where the closer the value is to 0, the more accurate the model. However, because MSE is the average of the squared errors, the resulting value is not expressed in the units of the model's target.
There is no general rule for how to interpret a given MSE value. Its magnitude is specific to each dataset, so it can mainly be used to say whether the model has become more or less accurate than a previous run.
Mean Absolute Error (MAE) is the average of the absolute differences between the predicted and actual values. It is calculated as:
MAE = (1/n) Σ |yi − ŷi|
MAE serves as an indicator of the accuracy of a predictive model: a lower MAE suggests a more accurate model. However, the interpretation of MAE is specific to the scale of the target variable being predicted. Unlike MSE, MAE is returned in the same units as the target variable, so while its magnitude is still dataset-dependent, it is easier to relate to the quantity being predicted.
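Both metrics are available in scikit-learn; the short sketch below computes them on hypothetical predictions to show that MSE is in squared target units while MAE stays in the target's units.
```python
# Computing both metrics with scikit-learn on hypothetical predictions.
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.8, 5.4, 7.0, 11.0]

print("MSE:", mean_squared_error(y_true, y_pred))   # squared units of the target
print("MAE:", mean_absolute_error(y_true, y_pred))  # same units as the target
```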
Choosing Between MSE and MAE in Specific Scenarios
The key difference between squared error and absolute error is that squared error punishes large errors to a greater extent than absolute error, because the errors are squared rather than taken as absolute differences.
Let's explore situations where Mean Squared Error (MSE) is more suitable than Mean Absolute Error (MAE), and vice versa.
MSE is preferable when:
1. MSE penalizes larger errors more heavily due to the squaring of differences. If your project is
particularly concerned about minimizing the impact of large errors and is more tolerant of small
errors, MSE may be more appropriate.
2. When using optimization algorithms to train machine learning models, MSE can offer better
numerical stability in certain cases. The squared term often leads to smoother and more well-
behaved optimization landscapes.
3. MSE amplifies the differences between small and large errors. This can be beneficial when you
want a metric that reflects and magnifies the variations in performance, making it easier to
distinguish between models with subtle differences.
MAE is preferable when:
1. If your dataset contains outliers and you want the metric to be less influenced by extreme values, MAE is the better choice in that situation (see the sketch after this list).
2. MAE provides error values in the same units as the target variable, making it more interpretable. If
clear communication of the error in a way that stakeholders can easily understand is crucial, MAE is
often preferred.
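To make the outlier point concrete, the sketch below compares MSE and MAE on hypothetical predictions where a single prediction is badly off; the numbers are invented for illustration.
```python
# A small illustration of outlier sensitivity: one large error inflates MSE far more than MAE.
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = [10, 12, 11, 13, 12]
y_pred_clean   = [11, 11, 12, 12, 13]        # all errors are small
y_pred_outlier = [11, 11, 12, 12, 30]        # one prediction is badly off

for name, y_pred in [("clean", y_pred_clean), ("with outlier", y_pred_outlier)]:
    print(name,
          "MSE:", mean_squared_error(y_true, y_pred),
          "MAE:", mean_absolute_error(y_true, y_pred))
```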