
Linear Models

Characteristics of Linear Models


- Linear models are parametric, i.e. they have a fixed form with a small number of numeric parameters that are to be learnt from data.
- Linear models are stable: small variations in the training data have only a limited impact on the learned model.
- Linear models are less likely to overfit the training data because they have relatively few parameters.
- Linear models have high bias and low variance (they tend to underfit).
- Linear models are preferable when there is limited training data and overfitting is to be avoided.
Linear Regression
Linear regression is a method for finding the straight line or hyperplane that best fits a set of points.

It is a very simple supervised learning approach.

It is used to predict a quantitative response such as price, temperature, etc.

It is a widely used statistical learning method.

When there is only one feature, it is called Univariate (or Simple) Linear Regression.

E.g.: predicting the price of a house based on its area.

When there are multiple features, it is called Multiple Linear Regression.

E.g.: predicting the price of a house based on its area, number of floors, number of rooms, etc.
Advertising Data (Multivariate Regression)
Simple Linear Regression
It is a simple approach for predicting a quantitative response Y on the basis of a single
predictor variable X.
It assumes that there is approximately a linear relationship between X and Y.
Mathematically, it can be written as:
Y ≈ β0 + β1X

β0 and β1 are two unknown constants that represent the intercept and slope terms in the linear model.
Together, β0 and β1 are known as the model coefficients or parameters (to be learnt).
This is also referred to as regressing Y on X.
Simple Linear Regression
For this example, it can be written as

sales ≈ β0 + β1 × TV

Once the values of β0 and β1 have been estimated using the training data, we can
predict future sales on the basis of the expenditure on TV advertising.
Estimating the Coefficients
Let us say we have a dataset

(x1, y1), (x2, y2), …, (xn, yn)

Now, we want to find an intercept β0 and slope β1 such that the resulting straight
line is as close as possible to the n data points.

The most common approach involves minimising the least squares criterion.


Ways to Estimate the Coefficients
- Ordinary Least Square Method
- Gradient Descent Method
- Normal Equation Method
Ordinary Least Square Method
Least Squares Method
Let ŷi = β0 + β1xi be the prediction for Y based on the ith value of X.

Then, ei = yi - ŷi represents the ith residual - this is the difference between the ith
observed response value and the ith response value that is predicted by the linear
model.

Residual sum of squares is defined as:

RSS = e1² + e2² + … + en²


Understanding Residual Sum of Squares

X    Y    Y_predicted_1   Y_predicted_2   (Y−Y1)²   (Y−Y2)²
95   85   86              88                1          9
85   95   88              81               49        196
80   70   72              66                4         16
70   65   64              72                1         49
60   70   69              64                1         36
                                    SUM    56        306

RSS of Model 1 is lower than that of Model 2. So Model 1 is preferable over Model 2.
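As a quick check, here is a minimal sketch (using NumPy; not part of the original notes) that reproduces the two RSS values in the table above:

```python
import numpy as np

y       = np.array([85, 95, 70, 65, 70])   # observed values (Y column)
y_pred1 = np.array([86, 88, 72, 64, 69])   # Model 1 predictions
y_pred2 = np.array([88, 81, 66, 72, 64])   # Model 2 predictions

# Residual sum of squares for each model
rss1 = np.sum((y - y_pred1) ** 2)   # 56
rss2 = np.sum((y - y_pred2) ** 2)   # 306
print(rss1, rss2)
```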
But the question is: how do we find Model 1 (i.e., the best model)?
Derivation
Now solving for β1
Alternative Formulas
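For reference, the standard least-squares formulas that this derivation leads to are:

β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
β0 = ȳ − β1·x̄

and, equivalently (the form used in the working below):

β1 = (Σ xi·yi − n·x̄·ȳ) / (Σ xi² − n·x̄²)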
Sample Question
Working
β1 = ((30500) - (5*78*77)) / (31150 - 5*6084)

= (30500 - 30030) / (31150 - 30420)

= (470)/ (730) = 0.6438

β0 = 77 - (0.6438)(78)

= 77 - 50.2191

= 26.7808

Marks_in_lang_course = 26.7808 + (0.6438)(marks_in_prof_course)


Computing the sum of squared error (SSE)
Making Prediction
If a student scored 80 on the proficiency test, what marks would we expect her to
obtain in the language course?

Set x=80, β1 = 0.6438 and β0 = 26.7808

Predicted marks = 26.7808 + (0.6438)(80)

= 26.7808 + 51.504

= 78.2848
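A minimal sketch (plain Python; illustrative, not part of the slides) that reproduces the working above from the summary statistics shown in it:

```python
# Summary statistics taken from the working above
n      = 5        # number of students
sum_xy = 30500    # sum of x*y
x_bar  = 78       # mean proficiency-test mark
y_bar  = 77       # mean language-course mark
sum_x2 = 31150    # sum of x^2

# Least-squares coefficients (alternative formulas)
beta1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)   # ≈ 0.6438
beta0 = y_bar - beta1 * x_bar                                      # ≈ 26.78

# Predicted language-course marks for a proficiency score of 80
print(beta0 + beta1 * 80)                                          # ≈ 78.29
```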
Points to Ponder
The intercept β0 is such that the regression line goes through the point of means (x̄, ȳ).

The sum of the residuals of the least squares solution is zero.

Least squares is also susceptible to outliers, because squaring the residuals gives points far from the line a disproportionately large influence.

Outliers are points that lie too far away from the regression line (or from most of the data points), often because of measurement errors.
Multivariate Linear Regression
Given marks in English and marks in Mathematics, predict the GATE score.

Y (gate_score) [RESPONSE VARIABLE]

x1 (eng_score) and x2 (math_score) [INPUT VARIABLES]

Y = β0 + β1x1 + β2x2

Normal Equation Method
Matrix Notation
In order to deal with an arbitrary number of features it will be useful to employ
matrix notation.

Univariate linear regression can be written as:


Matrix Notation
For m examples with n features, this can be written more generally as ŷ = Xβ̂.

Here, X is an m×(n+1) matrix whose first column is all 1s and whose remaining n columns
are the feature columns, and β̂ has the intercept β̂0 as its first entry and the n
regression coefficients as its remaining entries.
Normal Equation
The ꞵ hat vector can be computed using normal equation as given below:
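In its standard form, the normal equation is:

β̂ = (XᵀX)⁻¹ Xᵀ y

where y is the vector of observed responses.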
Characteristics of Normal Equation Method
The normal equation method is used:

- If n (the number of features) is small.
- If m (the number of training examples) is small, i.e. up to around 20,000.

It is a one-step method for calculating the regression coefficients.

Computation increases significantly as the number of features grows, because the
matrices become larger and matrix multiplication (and inversion) is a computationally
intensive operation.
Practice Problem
Find the least squares regression line for the given dataset using the normal
equation method. Show the computation at each step. [2022]

x1   x2   y
1    9    14
2    1    7
3    2    12
4    3    16
5    4    20
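A minimal sketch (NumPy; not part of the original question) of how the normal equation would be applied to this dataset:

```python
import numpy as np

# Design matrix: a column of 1s (for the intercept) followed by x1 and x2
X = np.array([[1, 1, 9],
              [1, 2, 1],
              [1, 3, 2],
              [1, 4, 3],
              [1, 5, 4]], dtype=float)
y = np.array([14, 7, 12, 16, 20], dtype=float)

# Normal equation: beta_hat = (X^T X)^(-1) X^T y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(beta_hat)   # [beta0_hat, beta1_hat, beta2_hat]
```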
Gradient Descent
Gradient Descent
Gradient Descent is an optimization algorithm that can be used to find a global
or local minimum of a differentiable function.

It is an iterative algorithm.
Notations
Matrix Notation

We will set x0(i) = 1, for all values of i.


Loss/Cost Function
A loss/cost function is a function that signifies how much our predicted values
deviate from the actual values of the dependent variable.
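For linear regression over m training examples, the cost function used in these notes (following the convention of Andrew Ng's course, which the worked examples below also use) is:

J(θ0, θ1) = (1/(2m)) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²,   where hθ(x) = θ0 + θ1x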
Understanding Cost Function
Training Data:

x0   x1   y
1    1    1
1    2    2
1    3    3

Assuming 𝜽0 = 0, find out J(𝜽1) for:

a) 𝜽1 = 1
b) 𝜽1 = 0.5
c) 𝜽1 = 0
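Using the 1/(2m) cost defined above (with hθ(x) = θ1x since θ0 = 0), these work out to:

a) θ1 = 1:   J = (1/6)[(1−1)² + (2−2)² + (3−3)²] = 0
b) θ1 = 0.5: J = (1/6)[(0.5−1)² + (1−2)² + (1.5−3)²] = 3.5/6 ≈ 0.58
c) θ1 = 0:   J = (1/6)[1² + 2² + 3²] = 14/6 ≈ 2.33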
Understanding Cost Function

Source: Andrew Ng, Machine Learning Course, Coursera


Steps Involved in Linear Regression with Gradient Descent Implementation

1. Initialize the weight and bias (i.e. the regression coefficients) randomly or with 0 (both will work).
2. Make predictions with this initial weight and bias.
3. Compare these predicted values with the actual values and define the loss function using both the predicted and actual values.
4. With the help of differentiation, calculate how the loss function changes with respect to the weight and bias terms.
5. Update the weight and bias terms so as to minimize the loss function.

To update 𝜽s, we need to calculate the gradients for each 𝜽i


Source: Andrew Ng, Machine Learning Course, Coursera
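A minimal sketch (NumPy; function and variable names are illustrative) of the five steps above for univariate linear regression, using the 1/(2m) cost defined earlier:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iters=2):
    """Batch gradient descent for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0                   # step 1: initialise with 0
    for _ in range(n_iters):
        y_pred = theta0 + theta1 * x            # step 2: make predictions
        error = y_pred - y                      # step 3: compare with actual values
        cost = np.sum(error ** 2) / (2 * m)     #         J(theta0, theta1)
        grad0 = np.sum(error) / m               # step 4: dJ/dtheta0
        grad1 = np.sum(error * x) / m           #         dJ/dtheta1
        theta0 -= alpha * grad0                 # step 5: simultaneous update
        theta1 -= alpha * grad1
        print(cost, theta0, theta1)
    return theta0, theta1

# Example usage with the dataset from the exercise further below
x = np.array([1, 2, 3, 4], dtype=float)
y = np.array([0.85, 1.20, 1.55, 1.90])
gradient_descent(x, y, alpha=0.01, n_iters=2)
```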
Differentiating J(𝜽0,𝜽1) w.r.t. 𝜽0 and 𝜽1

Cancel out 2 from numerator and denominator


Now differentiating w.r.t. 𝜽1

Cancelling out 2 from numerator and denominator and taking x1 common


Calculating Gradients
Updating weights

More generally, it can be written as:
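In its standard form, for any parameter θj the simultaneous update is:

θj := θj − α · (1/m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xj⁽ⁱ⁾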


Visualizing Gradient Descent Algorithm

The gradient basically represents the slope of the line.

When y increases with x, the line has a +ve slope, thus w is decreased.

When y decreases with x, the line has a -ve slope, thus w is increased.

In both scenarios, w moves towards the minimum.
What is alpha?
Alpha represents the learning rate, i.e. how large a step the algorithm takes towards
the minimum point at each update.

The learning rate alpha has to be chosen manually; it is not known beforehand.
A value of 0.01 is commonly chosen as a starting point.

If alpha is too small, the algorithm moves very slowly and is said to converge too slowly.

If alpha is too large, the algorithm can overshoot the minimum point and thus never
reach it.
Source: Andrew Ng, Machine Learning Course, Coursera
Find the values of the regression coefficients for the next two iterations of
gradient descent. Take the initial values of the coefficients as 0. Also, find the
cost at each iteration. Take alpha = 0.01.
x    y
1    0.85
2    1.20
3    1.55
4    1.90
Iteration 1
Iteration 1: updating theta_1
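Using the 1/m-averaged gradients given above, the first update works out as:

∂J/∂θ0 = (1/4)(−0.85 − 1.20 − 1.55 − 1.90) = −1.375   →   θ0 = 0 − 0.01·(−1.375) = 0.01375
∂J/∂θ1 = (1/4)(−0.85·1 − 1.20·2 − 1.55·3 − 1.90·4) = (1/4)(−15.5) = −3.875   →   θ1 = 0 − 0.01·(−3.875) = 0.03875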
Calculating Cost
J(0.01375,0.03875) = ?

~0.86 (approx.)
Iteration 2
Calculating Cost
J(0.026393,0.4225) = ?

0.046
Vectorised Notation for Gradient Descent
Solving previous example using vectorized notation
Initially:

Updating theta by placing these values in the equation given below:


First Iteration
Second Iteration
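A minimal vectorised sketch (NumPy; an illustration of the standard update θ := θ − (α/m)·Xᵀ(Xθ − y), not the slides' own code):

```python
import numpy as np

X = np.array([[1, 1],
              [1, 2],
              [1, 3],
              [1, 4]], dtype=float)        # first column of 1s, second column is x
y = np.array([0.85, 1.20, 1.55, 1.90])
theta = np.zeros(2)                        # [theta0, theta1], initialised to 0
alpha, m = 0.01, len(y)

for i in range(2):
    theta = theta - (alpha / m) * X.T @ (X @ theta - y)   # vectorised update
    print(i + 1, theta)                    # first iteration gives [0.01375, 0.03875]
```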
Polynomial Regression
Polynomial Regression
Our hypothesis function need not be linear (a straight line) if that does not fit the
data well.
We can change the behavior or curve of our hypothesis function by making it a
quadratic, cubic or square root function (or any other form).
For example, if our hypothesis function is

hθ(x) = θ0 + θ1x1

then we can create additional features based on x1 to get the quadratic function

hθ(x) = θ0 + θ1x1 + θ2x1²

or the cubic function

hθ(x) = θ0 + θ1x1 + θ2x1² + θ3x1³

In the cubic version, we have created new features x2 and x3, where x2 = x1² and x3 = x1³.
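A minimal sketch (NumPy; the feature and target values are purely illustrative) of creating the new features x2 = x1² and x3 = x1³ and fitting an ordinary linear regression on them:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # original single feature (illustrative values)
y  = np.array([1.2, 4.8, 11.1, 19.9, 31.2])   # illustrative target values

# New features: x2 = x1^2 and x3 = x1^3; the model is still linear in the parameters
X = np.column_stack([np.ones_like(x1), x1, x1 ** 2, x1 ** 3])

# Fit by least squares (equivalent to the normal equation)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # [beta0, beta1, beta2, beta3]
```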
