
Machine Learning

EE514 – CS535

Linear Regression: Formulation, Solutions,


Polynomial Regression, Gradient Descent
and Regularization

Zubair Khalid

School of Science and Engineering


Lahore University of Management Sciences

https://www.zubairkhalid.org/ee514_2023.html
Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Regression
Regression: Quantitative Prediction on a continuous scale
- Given a data sample, predict a numerical value

Example: Linear relationship


[Block diagram: the input x feeds a Process or System producing f(x); additive noise n corrupts f(x) to give the observed output y.]

Here, PROCESS or SYSTEM refers to any underlying physical or logical phenomenon which maps our input data to our observed and noisy output data.
Regression
Overview:

[Block diagram: input x → Process or System → observed output y.]

One-variable regression: 𝒚 is a scalar
Multi-variable regression: 𝐲 is a vector

We will cover:
Single-feature regression: 𝐱 is a scalar
Multiple-feature regression: 𝐱 is a vector


Regression
Examples:

Single Feature:
- Predict the score in the course given the number of hours of effort per week.
- Establish the relationship between monthly e-commerce sales and advertising costs.
Multiple Feature:
- Study the operational efficiency of a machine given sensor (temperature, vibration) data.
- Predict the remaining useful life (RUL) of a battery from charging and discharging information.
- Estimate sales volume given population demographics, GDP indicators, climate data, etc.
- Predict crop yield using remote sensing (satellite images, gravity information).
- Dynamic or surge pricing by ride-sharing applications (Uber).
- Rate the condition (fatigue or distraction) of a driver given video.
- Rate the quality of driving given data from sensors installed on the car or driving patterns.
Regression
Model Formulation and Setup:
True Model:
We assume there is an inherent but unknown relationship between input and output.

[Block diagram: input x → Process or System, with noise n added to produce the observed output y.]

Goal:
Given noisy observations, we need to estimate the unknown functional relationship as accurately as possible.

[Figure: noisy observations scattered around the true unknown function of 𝐱.]
Regression
Model Formulation and Setup:
- Single Feature Regression, Example:

Training Data:
First data sample: $(x^{(1)}, y^{(1)})$
Second data sample: $(x^{(2)}, y^{(2)})$
⋮
n-th data sample: $(x^{(n)}, y^{(n)})$

[Figure: the n training samples plotted in the (𝐱, 𝒚) plane.]
Regression
Model Formulation and Setup:
We have:

[Block diagram: the input is fed both to the true Process or System (whose output, corrupted by noise n, is the observed output) and to the Model (which produces the model output); the error is the difference between the observed output and the model output.]
Linear Regression
Overview:
- Second learning algorithm of the course

- Scalar output is a linear function of the inputs

- Different from KNN: linear regression adopts a modular approach, which we will use most of the time in this course:
- Select a model.
- Define a loss function.
- Formulate an optimization problem to find the model parameters such that the loss function is minimized.
- Employ different techniques to solve the optimization problem, i.e., to minimize the loss function.
Linear Regression
Model:

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$$

What is Linear?
Interpretation: the prediction $\hat{y}$ (or $y$) is linear in the parameters $w_0, w_1, \dots, w_d$: $w_0$ is the bias (intercept), and each $w_j$ weights the contribution of feature $x_j$ to the output.
Linear Regression
Define Loss Function:
- Loss function should be a function of model parameters.

[Figure: observed values 𝒚 plotted against 𝐱 around the true unknown function; the residual error is the vertical distance between each observation and the model's prediction.]
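A standard squared-error (least-squares) loss consistent with this setup, summing the squared residuals over the $n$ training samples, is

$$L(w_0, \dots, w_d) = \sum_{i=1}^{n} \Big( y^{(i)} - w_0 - \sum_{j=1}^{d} w_j x_j^{(i)} \Big)^2.$$

Scaling by $1/n$ (mean squared error) or by $1/2$ is also common and does not change the minimizer.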
Linear Regression
Define Loss Function:

- Positive scalings of the loss (sum of squared errors, mean squared error, half of either) share one minimizer, so the choice among these loss functions is a matter of convenience.


Linear Regression
Define Loss Function:

How to solve?
Linear Regression
Define Loss Function:
Reformulation:

Stack the observations into a vector $\mathbf{y}$, the inputs into a matrix $\mathbf{X}$ (one row per data sample, with a leading 1 to absorb the bias), and the model parameters into a vector $\mathbf{w}$. The residual error is then $\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{w}$.

Consequently, the loss can be written compactly in terms of the observations and inputs as

$$L(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2.$$
Linear Regression
Solve Optimization Problem: (Analytical Solution employing Calculus)

- $L(\mathbf{w})$ is quadratic and convex in the parameters; a very beautiful, elegant function to minimize!


Linear Regression
Solve Optimization Problem: (Analytical Solution employing Calculus)
Gradient of a function: Overview

For a scalar-valued function $f(\mathbf{w})$ of a vector $\mathbf{w}$, the gradient $\nabla_{\mathbf{w}} f$ collects the partial derivatives $\partial f / \partial w_j$ into a vector pointing in the direction of steepest increase.

Examples:
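Two standard gradient identities of the kind typically used here; they suffice for the derivation below:

$$\nabla_{\mathbf{w}} \big( \mathbf{b}^T \mathbf{w} \big) = \mathbf{b},
\qquad
\nabla_{\mathbf{w}} \big( \mathbf{w}^T \mathbf{A}\, \mathbf{w} \big) = (\mathbf{A} + \mathbf{A}^T)\,\mathbf{w} = 2\mathbf{A}\mathbf{w} \;\text{ for symmetric } \mathbf{A}.$$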
Linear Regression
Solve Optimization Problem: (Analytical Solution employing Calculus)
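Using the matrix form of the loss and the identities above, a standard derivation of the analytical solution: expand the loss, set its gradient to zero, and solve.

$$L(\mathbf{w}) = \|\mathbf{y}-\mathbf{X}\mathbf{w}\|^2 = \mathbf{y}^T\mathbf{y} - 2\,\mathbf{w}^T\mathbf{X}^T\mathbf{y} + \mathbf{w}^T\mathbf{X}^T\mathbf{X}\,\mathbf{w}$$

$$\nabla_{\mathbf{w}} L = -2\,\mathbf{X}^T\mathbf{y} + 2\,\mathbf{X}^T\mathbf{X}\,\mathbf{w} = \mathbf{0}
\;\;\Rightarrow\;\;
\mathbf{w}^* = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y},$$

assuming $\mathbf{X}^T\mathbf{X}$ is invertible.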
Linear Regression
So far and moving forward:
- We assumed that we know the structure of the model, that is, there is a linear

relationship between inputs and output.

- Number of parameters = dimension of the feature space + 1 (bias parameter)

- Formulated loss function using residual error.

- Formulated the optimization problem and obtained an analytical solution.

- Linear regression is one of the models for which we can obtain an analytical solution.

- We will shortly learn an algorithm to solve the optimization problem numerically; a code sketch of the analytical solution follows below.
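As a concrete companion to the analytical solution, here is a minimal NumPy sketch of closed-form least squares on synthetic data; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Closed-form least squares: w* = (X^T X)^{-1} X^T y.

    X : (n, d) feature matrix, y : (n,) targets.
    A column of ones is prepended to model the bias w0.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # absorb the bias into X
    # lstsq is preferred over an explicit matrix inverse for numerical stability
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w  # w[0] is the bias, w[1:] are the feature weights

# Synthetic single-feature example: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(100, 1))
y = 2 + 3 * x[:, 0] + 0.1 * rng.standard_normal(100)
print(fit_linear_regression(x, y))  # approximately [2, 3]
```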


Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Polynomial Regression
Overview:

[Figure: scatter of 𝒚 against 𝐱 showing a clearly curved trend. Is it linear?]

- If the relationship between the inputs and output is not linear, we can use a polynomial to model the relationship.
- We will formulate the polynomial regression model for the single-feature regression problem.
- Polynomial regression is often termed non-linear regression or, more precisely, linear-in-parameter regression.
- We will also revisit the concept of 'over-fitting'.

Polynomial Regression
Single Feature Regression:
Formulation:

Model the output as a degree-$M$ polynomial of the single feature $x$:

$$\hat{y} = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

Although this is non-linear in $x$, it is linear in the parameters $w_0, \dots, w_M$: treating $(1, x, x^2, \dots, x^M)$ as the feature vector recovers exactly the linear regression problem. We have seen this before, and we are capable of solving it!
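A minimal NumPy sketch of this reduction; the helper name and the data-generating choices are illustrative.

```python
import numpy as np

def fit_polynomial(x, y, M):
    """Least-squares fit of a degree-M polynomial y ≈ w0 + w1*x + ... + wM*x^M."""
    X = np.column_stack([x ** j for j in range(M + 1)])  # design matrix [1, x, ..., x^M]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Noisy samples of sin(2*pi*x), in the spirit of the CB Section 1.1 example
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)
print(fit_polynomial(x, y, M=3))  # coefficients w0..w3 of the fitted cubic
```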
Polynomial Regression
Single Feature Regression:
Example (Ref: CB. Section 1.1):
Polynomial Regression
Single Feature Regression:
Example:

[Figure: degree-M polynomial fits for several values of M. Underfitting: the model is too simple (M too small). Overfitting: the model is too complex (M too large).]
Polynomial Regression
Single Feature Regression:
Example:
[Figure: training and test error versus the polynomial degree M. Overfitting shows up as low training error but high test error; a good choice of M balances the two.]

Solution 1: Restrict the model complexity, i.e., limit the number of parameters by limiting the polynomial degree M.
Polynomial Regression
Single Feature Regression:
How to Handle Overfitting?
- The polynomial degree M is a hyper-parameter of our model (like k in kNN) and controls the complexity of the model.
- If we stick with the M=3 model, this amounts to restricting the number of parameters (Solution 1).
- We encounter overfitting for M=9 because we do not have sufficient data.
Solution 2: Take more data points to avoid over-fitting.

Solution 3: Regularization
Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Regularization
Regularization overview:
- The concept is broad, but we will study it in the context of linear regression (and polynomial regression, which we formulated as linear regression).
- Encourages the model coefficients to be small by adding a penalty term to the error.
- We had a loss function of the following form (see the linear regression formulation), which we minimize to find the coefficients:

$$L(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2$$

- We add a 'penalty term', known as a regularizer, to the loss function:

$$\tilde{L}(\mathbf{w}) = L(\mathbf{w}) + \lambda\, R(\mathbf{w}),$$

where $\tilde{L}$ is the regularized loss function, $R(\mathbf{w})$ is the regularizer, and $\lambda \ge 0$ controls the strength of the penalty.
Regularization
L2 Least-squares Regularization – Ridge Regression:
- Since we want to discourage the model coefficients from reaching large values, we can use the following simple regularizer:

$$R(\mathbf{w}) = \|\mathbf{w}\|^2 = \mathbf{w}^T\mathbf{w}$$

- For this choice, the regularized loss function becomes

$$\tilde{L}(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2 + \lambda\, \|\mathbf{w}\|^2$$

- This regularization maintains a trade-off between the 'fit of the model to the data' and the 'squared norm of the coefficients'.
- If the model fits poorly, the first term is large.
- If the coefficients have large values, the second term (penalty term) is large.

Intuitive Interpretation: We want to minimize the error while keeping the norm of the coefficients bounded.
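This intuition can be made precise; an equivalent constrained formulation (standard, though not spelled out on these slides) is

$$\min_{\mathbf{w}} \; \|\mathbf{y} - \mathbf{X}\mathbf{w}\|^2 \quad \text{subject to} \quad \|\mathbf{w}\|^2 \le \eta,$$

where each penalty strength $\lambda$ corresponds to some bound $\eta$ on the squared norm of the coefficients.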
Regularization
L2 Least-squares Regularization – Ridge Regression:
- The regularized loss function is still quadratic in $\mathbf{w}$, and we can find a closed-form solution:

$$\mathbf{w}^* = (\mathbf{X}^T\mathbf{X} + \lambda\,\mathbf{I})^{-1}\,\mathbf{X}^T\mathbf{y}$$

Unlike the unregularized case, $\mathbf{X}^T\mathbf{X} + \lambda\,\mathbf{I}$ is always invertible for $\lambda > 0$.
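A minimal NumPy sketch of the ridge solution above. Whether the bias w0 is penalized varies by convention; here it is penalized for simplicity, and all names are illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: w* = (X^T X + lam*I)^{-1} X^T y.

    X : (n, d) design matrix (include a ones column for the bias),
    lam : regularization strength (lambda >= 0).
    """
    d = X.shape[1]
    # Solve the regularized normal equations; solve() beats an explicit inverse
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Ridge tames the wild coefficients of a high-degree polynomial fit
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
X = np.column_stack([x ** j for j in range(10)])  # M = 9 polynomial
print(np.abs(fit_ridge(X, y, lam=0.0)).max())     # huge coefficients (overfitting)
print(np.abs(fit_ridge(X, y, lam=1e-3)).max())    # much smaller coefficients
```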
Regularization
L2 Least-squares Regularization – Ridge Regression:
Example:

[Figure: M = 9 polynomial fits for increasing regularization strength. With no regularization the fit oscillates wildly (overfitting); with too much regularization the fit is overly smooth (underfitting).]
Regularization
L2 Least-squares Regularization – Ridge Regression:
Graphical Visualization:

[Figure: contours of the squared-error term together with the circular L2 constraint region; the ridge solution lies where an error contour first touches the circle.]
Regularization
L1 Least-squares Regularization – Lasso Regression:
Graphical Visualization:

[Figure: the same error contours with the diamond-shaped L1 constraint region; because the diamond has corners on the axes, the solution often has some coefficients exactly zero, so Lasso yields sparse models.]
Regularization
Elastic Net Regression, L1 vs L2:

Elastic Net combines the L1 and L2 penalties, inheriting sparsity-inducing behaviour from L1 and smooth shrinkage from L2.
Outline

- Regression Set-up
- Linear Regression
- Polynomial Regression
- Underfitting/Overfitting
- Regularization
- Gradient Descent Algorithm
Gradient Descent Algorithm
Optimization and Gradient Descent - Overview:
Gradient descent is a general-purpose iterative method for minimizing a differentiable loss function; it is the tool of choice when no closed-form solution exists or when computing one is too expensive.
Gradient Descent Algorithm
Formulation:

Starting from an initial guess $\mathbf{w}^{(0)}$, repeatedly step in the direction of the negative gradient of the loss:

$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \alpha\, \nabla_{\mathbf{w}} L\big(\mathbf{w}^{(t)}\big),$$

where $\alpha > 0$ is the step size (learning rate).
Gradient Descent Algorithm
Algorithm:
Overall: iterate the update above until the parameters (or the loss) stop changing appreciably, or until a maximum number of iterations is reached.

Pseudo-code: see the sketch below.

Note: Simultaneous update; all parameters are updated using gradients evaluated at the current parameter vector, never at a mixture of old and new values.

Convergence and Step size: if $\alpha$ is too small, convergence is slow; if $\alpha$ is too large, the iterates can overshoot, oscillate, or diverge.
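The pseudo-code from the original slide is not reproduced in this text; below is a minimal generic sketch of the algorithm above, with illustrative names.

```python
import numpy as np

def gradient_descent(grad, w0, alpha=0.1, tol=1e-8, max_iters=10_000):
    """Generic gradient descent: w <- w - alpha * grad(w).

    grad : function returning the gradient of the loss at w,
    w0   : initial parameter vector (all entries updated simultaneously).
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iters):
        step = alpha * grad(w)
        w = w - step                      # all coordinates updated at once
        if np.linalg.norm(step) < tol:    # stop when updates become tiny
            break
    return w

# Example: minimize f(w) = (w0 - 3)^2 + (w1 + 1)^2, minimum at (3, -1)
print(gradient_descent(lambda w: 2 * (w - np.array([3.0, -1.0])), np.zeros(2)))
```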


Gradient Descent Algorithm
Linear Regression Case:

Gradient Descent:

For the least-squares loss $L(\mathbf{w}) = \|\mathbf{y}-\mathbf{X}\mathbf{w}\|^2$, the gradient is $\nabla_{\mathbf{w}} L = -2\,\mathbf{X}^T(\mathbf{y}-\mathbf{X}\mathbf{w})$, so the update becomes

$$\mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} + 2\alpha\, \mathbf{X}^T\big(\mathbf{y}-\mathbf{X}\mathbf{w}^{(t)}\big)$$

Note: Simultaneous update.
Gradient Descent Algorithm
Linear Regression Case:
Visualization:

[Figures: surface plot and contour plot of the quadratic loss over the parameters; gradient descent traces a downhill path to the unique minimum.]
Gradient Descent Algorithm
Notes:

- Each gradient-descent update for linear regression touches all n training samples, since the gradient sums over the whole dataset; a single iteration is therefore expensive when n is large.

Why? This motivates a cheaper alternative:

Stochastic Gradient Descent: update the parameters using the gradient evaluated on a single (randomly chosen) training sample instead of the full dataset.
Gradient Descent Algorithm
Stochastic Gradient Descent (SGD) - Rationale:
The full gradient is a sum of per-sample gradients, so the gradient at a single randomly drawn sample is a noisy but unbiased estimate of it; following these cheap estimates still makes progress on average.
Gradient Descent Algorithm
Stochastic Gradient Descent (SGD):

Pros:
- Each update is cheap; its cost does not grow with the dataset size n.
- Well suited to large-scale and streaming data.
- The noise in the updates can help escape shallow local minima for non-convex losses.
Gradient Descent Algorithm
SGD for Linear Regression Case:

Iteration: one parameter update using a single training sample. Epoch: one complete pass through all n training samples (n iterations). A sketch follows below.
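A minimal SGD sketch for the least-squares loss; the per-sample loss is taken as the squared residual, and all names are illustrative.

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=50, seed=0):
    """SGD for least squares: one update per randomly ordered sample.

    The per-sample loss (y_i - x_i^T w)^2 has gradient -2 (y_i - x_i^T w) x_i.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):                    # one epoch = one pass over the data
        for i in rng.permutation(n):           # each i is one iteration
            residual = y[i] - X[i] @ w
            w += 2 * alpha * residual * X[i]   # noisy step toward lower loss
    return w

# Recover y = 2 + 3x from noisy data (the ones column models the bias)
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 200)
X = np.column_stack([np.ones(200), x])
y = 2 + 3 * x + 0.1 * rng.standard_normal(200)
print(sgd_linear_regression(X, y))  # approximately [2, 3]
```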
Gradient Descent Algorithm
Mini-batch Stochastic Gradient Descent (SGD):
Mini-batch SGD sits between the two extremes: Batch Gradient Descent uses all n samples per update, Stochastic Gradient Descent uses one, and mini-batch SGD averages the gradient over a small batch of m samples, trading gradient noise against per-update cost (see the sketch below).
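A sketch of the mini-batch variant of the SGD update above; the batch size m is a hyper-parameter (values like 32 to 256 are common), and the names are illustrative.

```python
import numpy as np

def minibatch_sgd(X, y, alpha=0.05, m=32, epochs=100, seed=0):
    """Mini-batch SGD for least squares: average the gradient over m samples."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle each epoch
        for start in range(0, n, m):
            batch = order[start:start + m]
            residual = y[batch] - X[batch] @ w
            w += 2 * alpha * X[batch].T @ residual / len(batch)  # averaged gradient step
    return w
```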
