Linear Regression

Linear regression is a supervised machine learning algorithm that models the relationship between a dependent (target) variable and one or more independent (predictor) variables by fitting a linear equation to observed data. It is one of the most fundamental and widely used statistical techniques for predictive analysis and understanding data relationships.

Conceptual Overview

The core idea behind linear regression is straightforward: it assumes that the relationship
between the independent variable(s) and the dependent variable is linear in nature. In the
simplest scenario, known as Simple Linear Regression, the relationship is modeled
between a single predictor and the target variable. When multiple predictors are involved, it
becomes a Multiple Linear Regression problem.

Linear regression attempts to discover the best-fitting straight line (or hyperplane, in multiple
regression) that summarizes the observed data points. This best-fitting line is chosen such
that the overall prediction errors (typically measured as squared differences between
observed and predicted values) are minimized. The most commonly used method for
minimizing this error is called the Ordinary Least Squares (OLS) method.

Mathematical Formulation

For simple linear regression, the mathematical formulation is:

y = \beta_0 + \beta_1 x + \varepsilon

Where:

● y is the dependent (response) variable.
● x is the independent (predictor) variable.
● β₀ is the intercept (the predicted value of y when x = 0).
● β₁ is the slope of the line (the expected change in y per unit increase in x).
● ε is the error term representing noise or variability not captured by the model.

In multiple linear regression, the equation generalizes to:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon

Here, multiple independent variables (x₁, x₂, …, xₙ) influence the dependent variable y, each with its corresponding coefficient (β₁, β₂, …, βₙ).
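
To make the formulation concrete, here is a small Python sketch (the coefficient and predictor values are invented for illustration, not taken from the text): a prediction is simply the intercept plus a weighted sum of the predictors.

```python
import numpy as np

# Hypothetical fitted coefficients: intercept beta_0 and slopes beta_1..beta_n
# (values invented purely for illustration)
beta_0 = 2.0
betas = np.array([0.5, -1.2, 3.0])   # one coefficient per predictor

# A single observation with n = 3 predictor values
x = np.array([1.0, 0.0, 2.5])

# y_hat = beta_0 + beta_1*x1 + ... + beta_n*xn
# (the error term epsilon is unobserved at prediction time)
y_hat = beta_0 + betas @ x
print(y_hat)   # 2.0 + 0.5*1.0 - 1.2*0.0 + 3.0*2.5 = 10.0
```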

The Ordinary Least Squares Method (OLS)

The goal of linear regression is to estimate the parameters β₀, β₁, …, βₙ that minimize the residual sum of squares (RSS):

RSS = \sum_{i=1}^{m} (y_i - \hat{y}_i)^2

where:

● yᵢ is the actual value for the i-th observation.
● ŷᵢ is the predicted value for the i-th observation based on the regression model.
● m is the total number of observations.

Using calculus (taking derivatives of RSS with respect to each parameter and setting them to
zero), analytical solutions for the coefficients are obtained. Specifically, for simple linear
regression:

\beta_1 = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{m} (x_i - \bar{x})^2}, \quad \beta_0 = \bar{y} - \beta_1 \bar{x}

Where x̄ and ȳ represent the means of x and y, respectively.
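
As a minimal sketch, these closed-form estimates can be computed directly with NumPy; the small data set below is invented purely for illustration.

```python
import numpy as np

# Toy data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

x_bar, y_bar = x.mean(), y.mean()

# beta_1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
beta_1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# beta_0 = y_bar - beta_1 * x_bar
beta_0 = y_bar - beta_1 * x_bar

print(beta_0, beta_1)   # roughly 0.30 and 1.94 for this toy data
```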

In multiple linear regression, coefficients are typically calculated using matrix algebra:

\boldsymbol{\beta} = (X^T X)^{-1} X^T y

Here, X is a matrix representing the input data, y is a vector representing the observed outputs, and β is a vector of regression coefficients.
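
A minimal sketch of the normal equation in NumPy follows, assuming X already includes a leading column of ones for the intercept (the toy values are invented); in practice, a numerically stabler routine such as np.linalg.lstsq is usually preferred over forming the inverse explicitly.

```python
import numpy as np

# Toy design matrix: the first column of ones corresponds to the intercept
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 0.0],
              [1.0, 3.0, 1.0],
              [1.0, 4.0, 3.0]])
y = np.array([6.0, 5.0, 8.0, 13.0])

# Normal equation: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y

# A more stable alternative that avoids computing the inverse directly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)
print(beta_lstsq)   # both approaches give the same coefficient vector
```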

Evaluation Metrics

To assess the quality and accuracy of the linear regression model, several evaluation
metrics are commonly used:

1.​ Mean Squared Error (MSE):

MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2

2.​ Root Mean Squared Error (RMSE):

RMSE = \sqrt{MSE}

3. Coefficient of Determination (R² score):

R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}

● R² represents the proportion of variance in the dependent variable explained by the independent variables. It ranges between 0 and 1 (though it can be negative if the model fits worse than a horizontal line), where a higher R² indicates a better fit.
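
As an illustrative sketch, all three metrics can be computed in a few lines of NumPy; the actual and predicted values below are placeholders, not results from the text.

```python
import numpy as np

# Placeholder actual and predicted values, purely for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])

m = len(y_true)

# Mean Squared Error
mse = np.sum((y_true - y_pred) ** 2) / m
# Root Mean Squared Error
rmse = np.sqrt(mse)
# Coefficient of determination: 1 - RSS / TSS
rss = np.sum((y_true - y_pred) ** 2)
tss = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - rss / tss

print(mse, rmse, r2)
```
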
Assumptions of Linear Regression

Linear regression relies on several assumptions for its results to be valid:

1.​ Linearity: There must be a linear relationship between the independent and
dependent variables.
2.​ Independence: Observations are assumed independent of each other.
3.​ Homoscedasticity: The variance of residuals should be constant across all levels of
the predictors.
4.​ Normality: Residuals should be approximately normally distributed.
5.​ No multicollinearity: Independent variables in multiple regression should not be
highly correlated with each other.

Violations of these assumptions might lead to incorrect conclusions or inaccurate predictions. Diagnostic plots (residual plots, Q-Q plots) and tests are often used to check these assumptions.
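
As a rough sketch of such checks, assuming SciPy is available and using placeholder residuals, one might test residual normality and compare the residual spread across the fitted range; a residual plot and Q-Q plot would normally accompany these numbers.

```python
import numpy as np
from scipy import stats

# Placeholder fitted values and residuals; in practice these come from the fitted model
y_pred = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])
residuals = np.array([0.1, -0.1, 0.2, -0.2, 0.1, -0.1])

# Normality check: Shapiro-Wilk test on the residuals
# (a very small p-value suggests the residuals are not normally distributed)
stat, p_value = stats.shapiro(residuals)
print("Shapiro-Wilk p-value:", p_value)

# Crude homoscedasticity check: compare residual variance in the lower and
# upper halves of the fitted values (a large ratio hints at non-constant variance)
order = np.argsort(y_pred)
half = len(order) // 2
low_var = residuals[order[:half]].var()
high_var = residuals[order[half:]].var()
print("Variance ratio:", high_var / low_var)
```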

Extensions and Variations

Linear regression can be extended and adapted to various scenarios and more complex
situations:

● Polynomial Regression: Adds polynomial terms to account for nonlinear relationships.
● Ridge and Lasso Regression: Introduce regularization to reduce overfitting and handle multicollinearity (a brief sketch follows this list).
●​ Elastic Net: Combines ridge and lasso regularization to balance between their
benefits.
●​ Generalized Linear Models (GLM): Extends regression to handle response
variables that have different distributions (e.g., binary outcomes with logistic
regression).
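
As a brief sketch of the regularized variants mentioned above, assuming scikit-learn is available (the data is synthetic and the alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data with two nearly collinear predictors, to mimic multicollinearity
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=100)

# alpha controls the regularization strength
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_, model.intercept_)
```

Larger alpha values shrink the coefficients more strongly, and Lasso can drive some coefficients exactly to zero, which also acts as a form of feature selection.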

Applications

Linear regression has broad applicability in numerous domains:

● Economics: Predicting consumer spending, estimating demand.
● Finance: Forecasting stock prices, investment returns, risk assessment.
●​ Healthcare: Predicting medical outcomes, dosage-effect relationships.
●​ Marketing: Sales forecasting, analyzing pricing strategies.
●​ Social sciences: Understanding relationships between variables like education level
and income.

Linear regression’s simplicity, interpretability, and flexibility make it a fundamental and valuable technique, despite its limitations when modeling more complex, nonlinear relationships or datasets with substantial noise.

In summary, linear regression provides an intuitive yet powerful statistical framework for
modeling and analyzing linear relationships between variables, forming a cornerstone
technique of machine learning and statistical modeling.
