01-Linear Regression

Machine Learning (CE 40477)


Fall 2024

Ali Sharifi-Zarchi

CE Department
Sharif University of Technology

September 21, 2024


1 Course Overview

2 Introduction

3 Supervised Learning

4 Optimization

5 Polynomial Regression

6 References


Introduction to Machine Learning

Course Overview
• Supervised Learning
• Unsupervised Learning
• Neural Networks
• Computer Vision
• Natural Language Processing
• Contrastive Language-Image Pretraining


Supervised Learning

Learn to predict outcomes using labeled data.


• Real estate agencies predict house prices using historical data.
• Features like size, location, and number of rooms are used.

Learn More: Predicting House Prices with Linear Regression



Unsupervised Learning

• Find patterns in unlabeled data

Example: Customer segmentation, news clustering.



Neural Networks

• Models loosely inspired by the human brain, used to solve complex tasks

Example: Facial recognition, voice assistants.



Computer Vision

• Enable machines to see and interpret images

Example: Factory quality control, medical image analysis.



Natural Language Processing (NLP)

• Understand and generate human language

Example: Chat-bots, language translation.


Contrastive Language-Image Pretraining (CLIP)

• Connecting text and images.

Figure adapted from [6]



Definition of Machine Learning (ML)

• Machine Learning: A field of study that enables computers to learn from data
without being explicitly programmed.
• Involves constructing algorithms that generalize patterns from data.
• Focuses on predicting outcomes, classification, or uncovering hidden structures.


Tom M. Mitchell’s Definition of Machine Learning

• A well-known definition of machine learning comes from Tom M. Mitchell:


"A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E."


Definition of Machine Learning (cont.)

• Goal: Develop models that make accurate predictions based on past data.
• Formal definition: Given a task T, a performance measure P, and experience E:

Learning Problem = (T, P, E)

• Example: Predicting house prices based on previous data.


Example Usage of ML

• Real-world examples:
  • Fake News Detection (classification problem)
  • Predicting House Prices (regression problem)
  • Self-driving cars (real-time decision making)
• Application domains: finance, healthcare, robotics, etc.


Example Usage of ML: Home Price

Figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.

Example Usage of ML: Applicant approval

• Input: the applicant's form.
• Output: approving or denying the request.

Figure adapted from slides of Dr. Soleymani, Machine Learning course, Sharif University of Technology.

Paradigms of ML

• Supervised learning (regression, classification)
  • predicting a target variable for which we get to see examples
• Unsupervised learning
  • revealing structure in the observed data
• Reinforcement learning
  • partial (indirect) feedback, no explicit guidance
  • given rewards for a sequence of moves, to learn a policy and utility functions
• Other paradigms: semi-supervised learning, active learning, online learning, etc.


Supervised Learning

• Definition: A form of machine learning where the model learns from labeled data $\{(x_i, y_i)\}$ to predict an output y given an input x.
• Goal: Estimate a function $f: \mathbb{R}^D \to \mathbb{R}$ such that $y = f(x) + \epsilon$, where $\epsilon$ is noise.


Components of (Supervised) Learning

• Unknown target function: $f: X \to Y$
  • Input space: $X$
  • Output space: $Y$
• Training data: $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$
• Pick a formula $g: X \to Y$ that approximates the target function f,
  • selected from a set of hypotheses $H$

Components of (Supervised) Learning (cont.)

Figure adapted from Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning from Data: A Short Course

Supervised Learning: Regression vs. Classification

• Regression: predict a continuous target variable
  • e.g., $y \in [0, 1]$
• Classification: predict a discrete target variable
  • e.g., $y \in \{1, 2, \ldots, C\}$


Solution Components - Learning Model

• The Learning Model consists of:
  • Hypothesis Set: Defines the possible functions $H = \{h(x, \theta) \mid \theta \in \Theta\}$, where $h(x, \theta)$ is a candidate function and $\theta$ denotes the learnable parameters of the problem.
  • Learning Algorithm: Finds $\theta^* \in \Theta$ such that $h(x, \theta^*) \approx f(x)$.
• The two work together to map inputs x to outputs y with minimal error.
• In other words, $\theta^*$ gives the best parameters for predicting outputs under the chosen hypothesis set.


Hypothesis Space Overview

• Hypothesis (h): A mapping from input space X to output space Y.
• Linear Regression Hypothesis:

$$h_w(x) = w_0 + w_1 x_1 + \cdots + w_D x_D = w^\top x$$

• Input vector: $x = [x_0 = 1, x_1, x_2, \ldots, x_D]$
• Parameter vector: $w = [w_0, w_1, w_2, \ldots, w_D]$


Linear Hypothesis Representation

• Linear Hypothesis Space:
  • Simplest form: a linear combination of the input features.

$$h_w(x) = w_0 + \sum_{i=1}^{D} w_i x_i$$

• Linear Hypothesis Examples:
  • Single variable: $h_w(x) = w_0 + w_1 x$
  • Multivariate: $h_w(x) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_D x_D$
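To make the hypothesis concrete, here is a minimal NumPy sketch (not from the original slides; the toy weights and inputs are ours) of evaluating $h_w(x)$ with a prepended bias term:

```python
import numpy as np

# Toy example with D = 2 features; x0 = 1 is the prepended bias term.
w = np.array([1.0, 0.5, -2.0])   # [w0, w1, w2]
x = np.array([1.0, 3.0, 0.2])    # [x0 = 1, x1, x2]

def h(w, x):
    """Linear hypothesis h_w(x) = w0 + w1*x1 + ... + wD*xD = w^T x."""
    return w @ x

print(h(w, x))  # 1.0 + 0.5*3.0 + (-2.0)*0.2 = 2.1
```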


Understanding Cost Functions

• In the hypothesis space, we select a function h(x; w) to approximate the true relationship between input x and output y.
• The objective is to minimize the difference between the predicted values h(x) and the actual values y.
• This difference is quantified by a cost function, which guides us in choosing the optimal hypothesis.


What is a Cost Function?

• A cost function measures how well the hypothesis h(x; w) fits the training data.
• In regression problems, the most common error function is the Squared Error (SE):

$$\text{SE}: \left(y^{(i)} - h(x^{(i)}; w)\right)^2$$

• The cost function should account for all predictions, so one choice is the Sum of Squared Errors (SSE):

$$J(w) = \sum_{i=1}^{N} \left(y^{(i)} - h(x^{(i)}; w)\right)^2$$

• Objective: Minimize the cost function to find the best parameters w.
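As an illustration (our own sketch, not from the slides), the SSE cost takes a few lines of NumPy, assuming a design matrix X whose first column is the bias term:

```python
import numpy as np

def sse(w, X, y):
    """J(w) = sum_i (y_i - w^T x_i)^2, with X holding one x_i per row."""
    residuals = y - X @ w
    return residuals @ residuals

# Tiny example: three points, one feature plus a bias column of ones.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 1.1, 1.9])
print(sse(np.array([0.0, 1.0]), X, y))  # residuals 0, 0.1, -0.1 -> ~0.02
```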


SSE: Sum of Squared Errors

• SSE is widely used due to its simplicity and differentiability.
• Intuitively, it represents the squared distance between predicted and true values.
• It penalizes larger errors more severely than smaller ones (due to the square).
• For linear regression, it can be written as:

$$\text{SSE} = \sum_{i=1}^{N} \left(y^{(i)} - w^\top x^{(i)}\right)^2$$


How to measure the error

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - h_w(x^{(i)})\right)^2 = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

Figure adapted from slides of Dr. Soleymani, Machine Learning course, Sharif University of Technology.

The learning algorithm

• Objective: Choose w so as to minimize J(w).
• The learning algorithm: optimization of the cost function.
  • Explicitly take the derivative of the cost function with respect to each $w_i$ and set it to zero.
• Parameters of the best hypothesis for the training set:

$$w^* = \arg\min_{w} J(w)$$


Cost function optimization: univariate

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

• Necessary conditions for the "optimal" parameter values:

$$\frac{\partial J(w)}{\partial w_0} = 0, \qquad \frac{\partial J(w)}{\partial w_1} = 0$$


Optimality conditions: univariate

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

$$\frac{\partial J(w)}{\partial w_1} = \sum_{i=1}^{n} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-x^{(i)}) = 0$$

$$\frac{\partial J(w)}{\partial w_0} = \sum_{i=1}^{n} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-1) = 0$$

• A system of 2 linear equations


Linear regression example: TV Advertising and Sales

This is a real-world example of how businesses can use linear regression to inform marketing budget decisions. The following table shows the TV advertising budget spent and the corresponding sales:

TV Advertising Spend ($1000)   Sales (Units)
230.1                          22.1
44.5                           10.4
17.2                           9.3
151.5                          18.5
180.8                          12.9

Table 1: TV Advertising Spend vs. Sales Dataset

Learn More: Advertising Sales Dataset


Linear regression example: TV Advertising and Sales (cont.)

• In this problem we want to predict sales from TV advertising, so the amount spent is taken as the input x and sales as the output y.
• Using linear regression, the cost function can be written as:

$$J(w) = \sum_{i=1}^{5} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

• Applying the necessary conditions for the optimal parameters:

$$\frac{\partial J(w)}{\partial w_1} = \sum_{i=1}^{5} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-x^{(i)}) = 0 \implies 110863\,w_1 + 624.1\,w_0 - 10843.04 = 0$$

$$\frac{\partial J(w)}{\partial w_0} = \sum_{i=1}^{5} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-1) = 0 \implies 624.1\,w_1 + 5\,w_0 - 73.2 = 0$$


Linear regression example: TV Advertising and Sales (cont.)

• Solving this system of two equations, we get:

$$w_1 \approx 0.052, \qquad w_0 \approx 8.18$$
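These values can be verified numerically. The following sketch (ours) plugs the five points from Table 1 into the two normal equations and solves the resulting 2×2 system:

```python
import numpy as np

x = np.array([230.1, 44.5, 17.2, 151.5, 180.8])  # TV spend ($1000)
y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])      # sales (units)

# The two optimality conditions reduce to a linear system in (w1, w0):
#   sum(x^2)*w1 + sum(x)*w0 = sum(x*y)
#   sum(x)  *w1 + n     *w0 = sum(y)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    len(x)]])
b = np.array([np.sum(x * y), np.sum(y)])
w1, w0 = np.linalg.solve(A, b)
print(w0, w1)  # ~8.18 and ~0.052, matching the values above
```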


Cost function optimization: multivariate

• Recall the SSE cost function of multivariate linear regression:

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - h_w(x^{(i)})\right)^2 = \sum_{i=1}^{n} \left(y^{(i)} - w^\top x^{(i)}\right)^2$$

• We usually write the matrix form of the problem as follows:

$$X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_d^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_d^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \cdots & x_d^{(n)} \end{bmatrix}, \qquad w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$

in which $x_m^{(i)}$ denotes the m-th feature of data point i.


Cost function optimization: multivariate

• Using the matrix form above, we can rewrite the cost function:

$$J(w) = \lVert y - Xw \rVert_2^2$$

• Taking the derivative of the cost function with respect to w and setting it to zero:

$$\nabla_w J(w) = -2X^\top (y - Xw)$$

$$\nabla_w J(w) = 0 \implies X^\top X w = X^\top y \implies w = \left(X^\top X\right)^{-1} X^\top y$$
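A brief NumPy sketch of this closed form (our illustration; in practice one solves the normal equations rather than forming the explicit inverse):

```python
import numpy as np

# Synthetic data: y = Xw + noise, with a bias column prepended to X.
rng = np.random.default_rng(0)
n, d = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Solve the normal equations (X^T X) w = X^T y; numerically preferable
# to computing the inverse explicitly. np.linalg.pinv(X) @ y is equivalent.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to true_w
```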


Analytical solution: multivariate

• $\left(X^\top X\right)^{-1} X^\top$ is called the pseudo-inverse of the matrix X.
• The matrix X is often not square, and hence not invertible.
• The pseudo-inverse can be computed for any matrix, regardless of its shape.


Computational limitations of analytical solution

• Scalability: analytical solutions do not scale well to very large datasets, making them impractical for big-data applications.
• Finding the inverse of a matrix:
  • Simplest approach: Gaussian elimination, with complexity O(n³).
  • Other methods: LU decomposition, which also has O(n³) complexity but is more numerically stable.
  • Numerical methods: iterative methods like Conjugate Gradient, which can be more efficient for large, sparse matrices; see the sketch below.
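As a sketch of the iterative alternative (ours; it relies on SciPy, and on the fact that $X^\top X$ is symmetric positive (semi-)definite, which is what conjugate gradient requires):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
y = rng.normal(size=1000)

# Solve the normal equations (X^T X) w = X^T y iteratively with
# conjugate gradient, without ever forming an explicit inverse.
A = X.T @ X
b = X.T @ y
w, info = cg(A, b)  # info == 0 signals convergence
print(info, w[:3])
```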


Practical limitations of analytical solution

• Online learning: data observations arrive in a continuous stream.
  • Predictions must be made before seeing all of the data.
  • Analytical solutions require all data to be available upfront, which is not feasible in online learning scenarios.
  • Analytical methods cannot adapt to new data without re-computing the entire solution.


Cost function optimization

• Another approach:
  • Start from an initial guess and iteratively change w to minimize J(w).
  • This is the gradient descent algorithm.
• Steps:
  • Start from $w^0$.
  • Repeat:
    • Update $w^t$ to $w^{t+1}$ in order to reduce J.
    • $t \leftarrow t + 1$
  • until we hopefully end up at a minimum.


Gradient Descent

• In each step, take a step proportional to the negative of the gradient of the function at the current point $w^t$:

$$w^{t+1} = w^t - \eta \nabla J(w^t)$$

• J(w) decreases fastest if one moves from $w^t$ in the direction of $-\nabla J(w^t)$.
• Assumption: J(w) is defined and differentiable in a neighborhood of the point $w^t$.
• (By contrast, gradient ascent takes steps proportional to the positive of the gradient to find a local maximum of the function.)
• Continue until we find:

$$w^* = \arg\min_{w} J(w)$$


Gradient Descent (cont.)

• Minimize J(w):

$$w^{t+1} = w^t - \eta \nabla_w J(w^t), \qquad \nabla_w J(w) = \begin{bmatrix} \frac{\partial J(w)}{\partial w_1} \\ \vdots \\ \frac{\partial J(w)}{\partial w_d} \end{bmatrix}$$

• If η is small enough, then $J(w^{t+1}) \le J(w^t)$.
• η can be allowed to change at every iteration as $\eta^t$.


Gradient descent disadvantages

• Local minima problem.
  • However, when J is convex, all local minima are also global minima ⇒ gradient descent can converge to the global solution.


Problem of gradient descent with non-convex cost functions

Figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.

Cost function optimization

• With $h_w(x) = w^\top x$, the weight update rule is as follows:

$$w^{t+1} = w^t + \eta \sum_{i=1}^{n} \left(y^{(i)} - (w^t)^\top x^{(i)}\right) x^{(i)}$$

• η too small ⇒ gradient descent can be slow.
• η too large ⇒ gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
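A minimal batch gradient descent loop implementing this update rule (our sketch; the learning rate and step count are arbitrary choices):

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.01, steps=1000):
    """Repeat w <- w + eta * sum_i (y_i - w^T x_i) x_i (full-batch update)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residuals = y - X @ w         # shape (n,)
        w += eta * (X.T @ residuals)  # gradient step on the SSE cost
    return w

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
print(batch_gradient_descent(X, y))  # approaches [1.0, 1.0], i.e. y = 1 + x
```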


Gradient descent overview

(A sequence of figure-only slides illustrating successive gradient descent iterations; figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.)

Variants of gradient descent

• Batch gradient descent: processes the entire training set in one iteration.
  • Can be computationally costly for large datasets and practically impossible for some applications (such as online learning).
• Mini-batch gradient descent: processes small, random subsets (mini-batches) of the training set in each iteration.
  • Balances the stable updates of batch gradient descent against the cheap, frequent updates of stochastic gradient descent.
• Stochastic gradient descent: processes one training example per iteration.
  • Updates the model parameters more frequently, which can lead to faster convergence.


Stochastic gradient descent

• Example: linear regression with the SSE cost function:

$$J(w) = \sum_{i=1}^{n} J^{(i)}(w), \qquad J^{(i)}(w) = \left(y^{(i)} - w^\top x^{(i)}\right)^2$$

$$w^{t+1} = w^t - \eta \nabla_w J^{(i)}(w) \implies w^{t+1} = w^t + \eta \left(y^{(i)} - w^\top x^{(i)}\right) x^{(i)}$$

in which $x^{(i)}$ denotes the i-th observation to arrive.
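A sketch of the corresponding per-example loop (ours; it assumes a fixed η and shuffles the data each epoch, which is common practice):

```python
import numpy as np

def sgd(X, y, eta=0.05, epochs=20, seed=0):
    """Per-example update: w <- w + eta * (y_i - w^T x_i) x_i."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):        # shuffle each epoch
            w += eta * (y[i] - w @ X[i]) * X[i]  # one observation per step
    return w

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
print(sgd(X, y))  # hovers near [1.0, 1.0]
```

Processing mini-batches instead of single examples amounts to summing this update over a small random subset per step.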


Stochastic gradient descent: online learning

• Often, stochastic gradient descent gets close to the minimum much faster than
batch gradient descent.
• Note however that it may never converge to the minimum, and the parameters will
keep oscillating around the minimum of the cost function;
• In practice, most of the values near the minimum will be reasonably good
approximations to the true minimum.


Limitations of linear regression

• It is possible that the best-fitted line is still far from the true pattern of the samples:

Figure 1: No line can be fitted to generalize well on these samples



Beyond Linear Regression

• How can we extend linear regression to model non-linear relationships?
• Transforming data using basis functions:
  • Basis functions transform the original features into a new feature space.
  • Common basis functions include polynomial and Gaussian functions.
• Learning a linear regression on the transformed features:
  • By applying linear regression to the transformed feature vectors, we can model complex, non-linear relationships.
  • This approach keeps the simplicity and interpretability of linear regression while extending its flexibility.


Beyond Linear Regression: Polynomial Regression

Figure 2: The best-fit polynomial of degree 4 generalizes well on these samples



Beyond Linear Regression: change of basis

Figure 3: Changing the basis from $[1, x, y]$ to $[1, x^2, y^2]$, we can use linear regression


Polynomial regression: Univariate

• Polynomial Regression Hypothesis (m-th order regression):

$$h(x; w) = w_0 + w_1 x + \cdots + w_{m-1} x^{m-1} + w_m x^m$$

• Objective: Fit a polynomial of degree m to the data points.


Polynomial Regression: Univariate (cont.)

• Similar to what we did for univariate linear regression, we can define:

$$X' = \begin{bmatrix} 1 & x^{(1)} & \cdots & \left(x^{(1)}\right)^m \\ 1 & x^{(2)} & \cdots & \left(x^{(2)}\right)^m \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x^{(n)} & \cdots & \left(x^{(n)}\right)^m \end{bmatrix}, \qquad \hat{w} = \begin{bmatrix} \hat{w}_0 \\ \hat{w}_1 \\ \vdots \\ \hat{w}_m \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$

in which $x^{(i)}$ denotes the i-th data point.


Polynomial regression analytical solution: Univariate

Rewriting the SSE cost function in matrix form, we have:

$$J(\hat{w}) = \lVert y - X'\hat{w} \rVert_2^2$$

Analytical solution: it has the closed form

$$\hat{w} = \left(X'^\top X'\right)^{-1} X'^\top y$$
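The same closed form can be reused after expanding the features; a short sketch (ours; `np.vander` builds the matrix X′ column by column):

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares fit of an m-th order polynomial via expanded features."""
    Xp = np.vander(x, m + 1, increasing=True)    # columns: 1, x, x^2, ..., x^m
    return np.linalg.solve(Xp.T @ Xp, Xp.T @ y)  # (X'^T X')^{-1} X'^T y

x = np.linspace(-1.0, 1.0, 50)
y = 1.0 - 2.0 * x**2 + 0.05 * np.random.default_rng(2).normal(size=50)
print(fit_polynomial(x, y, 2))  # roughly [1, 0, -2]
```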


Training and Validation

• To better distinguish linear regression from polynomial regression, we will show that a linear model cannot always generalize well.
• Assume the samples are split into two subsets: a training dataset, used to fit a regression model, and a validation dataset, used to select the best regression model for the application.
• If a model generalizes well on the validation set, it is a good candidate.


Polynomial regression: example

• Consider the following noisy samples, split into training and validation sets.


Quadratic regression vs Linear regression

Fitting both quadratic and linear regressions shows the power of polynomial regression in generalizing to more complex patterns.
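A sketch of such a comparison (ours; the synthetic quadratic data, the 70/30 split, and the use of `np.polyfit` are all our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 100)
y = 0.5 * x**2 + x + rng.normal(scale=0.5, size=100)  # quadratic ground truth

idx = rng.permutation(100)
train, val = idx[:70], idx[70:]  # 70/30 train/validation split

for degree in (1, 2):
    coeffs = np.polyfit(x[train], y[train], degree)
    val_mse = np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)
    print(f"degree {degree}: validation MSE = {val_mse:.3f}")
# The quadratic fit should show a clearly lower validation error.
```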


Contributions

• These slides have been prepared thanks to:
  • Arshia Gharooni
  • Mahan Bayhaghi


[1] C. M. Bishop, Pattern Recognition and Machine Learning. Information Science and Statistics, New York, NY: Springer, 1st ed., Aug. 2006.

[2] M. Soleymani Baghshah, "Machine Learning." Lecture slides, Sharif University of Technology.

[3] A. Ng and T. Ma, CS229 Lecture Notes. Stanford University.

[4] T. Mitchell, Machine Learning. McGraw-Hill Series in Computer Science, New York, NY: McGraw-Hill Professional, Mar. 1997.

[5] Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning from Data: A Short Course. New York, NY: AMLBook, 2012.

[6] S. Goel, H. Bansal, S. Bhatia, R. A. Rossi, V. Vinay, and A. Grover, "CyCLIP: Cyclic Contrastive Language-Image Pretraining," arXiv, vol. abs/2205.14459, May 2022.
