01-Linear Regression

Machine Learning (CE 40477)


Fall 2024

Ali Sharifi-Zarchi

CE Department
Sharif University of Technology

September 21, 2024


1 Course Overview

2 Introduction

3 Supervised Learning

4 Optimization

5 Polynomial Regression

6 References


Introduction to Machine Learning

Course Overview
• Supervised Learning
• Unsupervised Learning
• Neural Networks
• Computer Vision
• Natural Language Processing
• Contrastive Language-Image Pretraining


Supervised Learning

Learn to predict outcomes using labeled data.


• Real estate agencies predict house prices using historical data.
• Features like size, location, and number of rooms are used.

Learn More: Predicting House Prices with Linear Regression



Unsupervised Learning

• Find patterns in unlabeled data

Example: Customer segmentation, news clustering.



Neural Networks

• Models loosely inspired by the human brain, used to solve complex tasks

Example: Facial recognition, voice assistants.



Computer Vision

• Enable machines to see and interpret images

Example: Factory quality control, medical image analysis.



Natural Language Processing (NLP)

• Understand and generate human language

Example: Chat-bots, language translation.


Contrastive Language-Image Pretraining (CLIP)

• Connecting text and images.

Figure adapted from [6]



Definition of Machine Learning (ML)

• Machine Learning: A field of study that enables computers to learn from data
without being explicitly programmed.
• Involves constructing algorithms that generalize patterns from data.
• Focuses on predicting outcomes, classification, or uncovering hidden structures.


Tom M. Mitchell’s Definition of Machine Learning

• A well-known definition of machine learning comes from Tom M. Mitchell:


"A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E."


Definition of Machine Learning (cont.)

• Goal: Develop models that make accurate predictions based on past data.
• Formal definition: Given a task T, a performance measure P, and experience E:

Learning Problem = (T, P, E)

• Example: Predicting house prices based on previous data.


Example Usage of ML

• Real-world examples:
  • Fake News Detection (classification problem)
  • Predicting House Prices (regression problem)
  • Self-driving cars (real-time decision making)
• Application domains: finance, healthcare, robotics, etc.


Example Usage of ML: Home Price

Figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.

Example Usage of ML: Applicant approval

• Input: the applicant's form.
• Output: approving or denying the request.

Figure adapted from slides of Dr. Soleymani, Machine Learning course, Sharif University of Technology.

Paradigms of ML

• Supervised learning (regression, classification)
  • predicting a target variable for which we get to see examples
• Unsupervised learning
  • revealing structure in the observed data
• Reinforcement learning
  • partial (indirect) feedback, no explicit guidance
  • given rewards for a sequence of moves, to learn a policy and utility functions
• Other paradigms: semi-supervised learning, active learning, online learning, etc.


Supervised Learning

• Definition: A form of machine learning where the model learns from labeled data $\{(x_i, y_i)\}$ to predict an output y given an input x.
• Goal: Estimate a function $f: \mathbb{R}^D \to \mathbb{R}$ such that $y = f(x) + \epsilon$, where $\epsilon$ is noise.


Components of (Supervised) Learning

• Unknown target function: $f: X \to Y$
  • Input space: $X$
  • Output space: $Y$
• Training data: $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$
• Pick a formula $g: X \to Y$ that approximates the target function f,
  • selected from a set of hypotheses $H$

Components of (Supervised) Learning (cont.)

Figure adapted from Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning from Data: A Short Course

Supervised Learning: Regression vs. Classification

• Regression: predict a continuous target variable
  • e.g., $y \in [0, 1]$
• Classification: predict a discrete target variable
  • e.g., $y \in \{1, 2, \ldots, C\}$


Solution Components - Learning Model

• The Learning Model consists of:
  • Hypothesis Set: Defines the possible functions $H = \{h(x, \theta) \mid \theta \in \Theta\}$, where $h(x, \theta)$ is a candidate function and $\theta$ denotes the learnable parameters of the problem.
  • Learning Algorithm: Finds $\theta^* \in \Theta$ such that $h(x, \theta^*) \approx f(x)$.
• The two work together to map inputs x to outputs y with minimal error.
• In other words, $\theta^*$ gives the best parameters for predicting outputs under the chosen hypothesis set.


Hypothesis Space Overview

• Hypothesis (h): A mapping from input space X to output space Y.
• Linear Regression Hypothesis:

$$h_w(x) = w_0 + w_1 x_1 + \cdots + w_D x_D = w^\top x$$

• Input vector: $x = [x_0 = 1, x_1, x_2, \ldots, x_D]$
• Parameter vector: $w = [w_0, w_1, w_2, \ldots, w_D]$


Linear Hypothesis Representation

• Linear Hypothesis Space:
  • Simplest form: a linear combination of the input features.

$$h_w(x) = w_0 + \sum_{i=1}^{D} w_i x_i$$

• Linear Hypothesis Examples:
  • Single variable: $h_w(x) = w_0 + w_1 x$
  • Multivariate: $h_w(x) = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_D x_D$
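To make the hypothesis concrete, here is a minimal NumPy sketch (not from the original slides; the toy weights and inputs are ours) of evaluating $h_w(x)$ with a prepended bias term:

```python
import numpy as np

# Toy example with D = 2 features; x0 = 1 is the prepended bias term.
w = np.array([1.0, 0.5, -2.0])   # [w0, w1, w2]
x = np.array([1.0, 3.0, 0.2])    # [x0 = 1, x1, x2]

def h(w, x):
    """Linear hypothesis h_w(x) = w0 + w1*x1 + ... + wD*xD = w^T x."""
    return w @ x

print(h(w, x))  # 1.0 + 0.5*3.0 + (-2.0)*0.2 = 2.1
```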


Understanding Cost Functions

• In the hypothesis space, we select a function h(x; w) to approximate the true relationship between input x and output y.
• The objective is to minimize the difference between the predicted values h(x) and the actual values y.
• This difference is quantified by a cost function, which guides us in choosing the optimal hypothesis.


What is a Cost Function?

• A cost function measures how well the hypothesis h(x; w) fits the training data.
• In regression problems, the most common error function is the Squared Error (SE):

$$\text{SE}: \left(y^{(i)} - h(x^{(i)}; w)\right)^2$$

• The cost function should account for all predictions, so one choice is the Sum of Squared Errors (SSE):

$$J(w) = \sum_{i=1}^{N} \left(y^{(i)} - h(x^{(i)}; w)\right)^2$$

• Objective: Minimize the cost function to find the best parameters w.
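As an illustration (our own sketch, not from the slides), the SSE cost takes a few lines of NumPy, assuming a design matrix X whose first column is the bias term:

```python
import numpy as np

def sse(w, X, y):
    """J(w) = sum_i (y_i - w^T x_i)^2, with X holding one x_i per row."""
    residuals = y - X @ w
    return residuals @ residuals

# Tiny example: three points, one feature plus a bias column of ones.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 1.1, 1.9])
print(sse(np.array([0.0, 1.0]), X, y))  # residuals 0, 0.1, -0.1 -> ~0.02
```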


SSE: Sum of Squared Errors

• SSE is widely used due to its simplicity and differentiability.
• Intuitively, it represents the squared distance between predicted and true values.
• It penalizes larger errors more severely than smaller ones (due to the square).
• For linear regression, it can be written as:

$$\text{SSE} = \sum_{i=1}^{N} \left(y^{(i)} - w^\top x^{(i)}\right)^2$$


How to measure the error

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - h_w(x^{(i)})\right)^2 = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

Figure adapted from slides of Dr. Soleymani, Machine Learning course, Sharif University of Technology.

The learning algorithm

• Objective: Choose w so as to minimize J(w).
• The learning algorithm: optimization of the cost function.
  • Explicitly take the derivative of the cost function with respect to each $w_i$ and set it to zero.
• Parameters of the best hypothesis for the training set:

$$w^* = \arg\min_{w} J(w)$$


Cost function optimization: univariate

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

• Necessary conditions for the "optimal" parameter values:

$$\frac{\partial J(w)}{\partial w_0} = 0, \qquad \frac{\partial J(w)}{\partial w_1} = 0$$


Optimality conditions: univariate

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

$$\frac{\partial J(w)}{\partial w_1} = \sum_{i=1}^{n} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-x^{(i)}) = 0$$

$$\frac{\partial J(w)}{\partial w_0} = \sum_{i=1}^{n} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-1) = 0$$

• A system of 2 linear equations


Linear regression example: TV Advertising and Sales

This is a real-world example of how businesses can use linear regression to inform marketing budget decisions. The following table shows the TV advertising budget spent and the corresponding sales:

TV Advertising Spend ($1000)   Sales (Units)
230.1                          22.1
44.5                           10.4
17.2                           9.3
151.5                          18.5
180.8                          12.9

Table 1: TV Advertising Spend vs. Sales Dataset

Learn More: Advertising Sales Dataset


Linear regression example: TV Advertising and Sales (cont.)

• In this problem we want to predict sales from TV advertising, so the amount spent is taken as the input x and sales as the output y.
• Using linear regression, the cost function can be written as:

$$J(w) = \sum_{i=1}^{5} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$

• Applying the necessary conditions for the optimal parameters:

$$\frac{\partial J(w)}{\partial w_1} = \sum_{i=1}^{5} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-x^{(i)}) = 0 \implies 110863\,w_1 + 624.1\,w_0 - 10843.04 = 0$$

$$\frac{\partial J(w)}{\partial w_0} = \sum_{i=1}^{5} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-1) = 0 \implies 624.1\,w_1 + 5\,w_0 - 73.2 = 0$$


Linear regression example: TV Advertising and Sales (cont.)

• Solving this system of two equations, we get:

$$w_1 \approx 0.052, \qquad w_0 \approx 8.18$$
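These values can be verified numerically. The following sketch (ours) plugs the five points from Table 1 into the two normal equations and solves the resulting 2×2 system:

```python
import numpy as np

x = np.array([230.1, 44.5, 17.2, 151.5, 180.8])  # TV spend ($1000)
y = np.array([22.1, 10.4, 9.3, 18.5, 12.9])      # sales (units)

# The two optimality conditions reduce to a linear system in (w1, w0):
#   sum(x^2)*w1 + sum(x)*w0 = sum(x*y)
#   sum(x)  *w1 + n     *w0 = sum(y)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    len(x)]])
b = np.array([np.sum(x * y), np.sum(y)])
w1, w0 = np.linalg.solve(A, b)
print(w0, w1)  # ~8.18 and ~0.052, matching the values above
```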


Cost function optimization: multivariate

• Recall the SSE cost function of multivariate linear regression:

$$J(w) = \sum_{i=1}^{n} \left(y^{(i)} - h_w(x^{(i)})\right)^2 = \sum_{i=1}^{n} \left(y^{(i)} - w^\top x^{(i)}\right)^2$$

• We usually write the matrix form of the problem as follows:

$$X = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_d^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_d^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \cdots & x_d^{(n)} \end{bmatrix}, \qquad w = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$

in which $x_m^{(i)}$ denotes the m-th feature of data point i.


Cost function optimization: multivariate

• Using the matrix form above, we can rewrite the cost function:

$$J(w) = \lVert y - Xw \rVert_2^2$$

• Taking the derivative of the cost function with respect to w and setting it to zero:

$$\nabla_w J(w) = -2X^\top (y - Xw)$$

$$\nabla_w J(w) = 0 \implies X^\top X w = X^\top y \implies w = \left(X^\top X\right)^{-1} X^\top y$$
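A brief NumPy sketch of this closed form (our illustration; in practice one solves the normal equations rather than forming the explicit inverse):

```python
import numpy as np

# Synthetic data: y = Xw + noise, with a bias column prepended to X.
rng = np.random.default_rng(0)
n, d = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=n)

# Solve the normal equations (X^T X) w = X^T y; numerically preferable
# to computing the inverse explicitly. np.linalg.pinv(X) @ y is equivalent.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to true_w
```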


Analytical solution: multivariate

• $\left(X^\top X\right)^{-1} X^\top$ is called the pseudo-inverse of the matrix X.
• The matrix X is often not square, and hence not invertible.
• The pseudo-inverse can be computed for any matrix, regardless of its shape.


Computational limitations of analytical solution

• Scalability: analytical solutions do not scale well to very large datasets, making them impractical for big-data applications.
• Finding the inverse of a matrix:
  • Simplest approach: Gaussian elimination, with complexity O(n³).
  • Other methods: LU decomposition, which also has O(n³) complexity but is more numerically stable.
  • Numerical methods: iterative methods like Conjugate Gradient, which can be more efficient for large, sparse matrices; see the sketch below.
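As a sketch of the iterative alternative (ours; it relies on SciPy, and on the fact that $X^\top X$ is symmetric positive (semi-)definite, which is what conjugate gradient requires):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
y = rng.normal(size=1000)

# Solve the normal equations (X^T X) w = X^T y iteratively with
# conjugate gradient, without ever forming an explicit inverse.
A = X.T @ X
b = X.T @ y
w, info = cg(A, b)  # info == 0 signals convergence
print(info, w[:3])
```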


Practical limitations of analytical solution

• Online learning: data observations arrive in a continuous stream.
  • Predictions must be made before seeing all of the data.
  • Analytical solutions require all data to be available upfront, which is not feasible in online learning scenarios.
  • Analytical methods cannot adapt to new data without re-computing the entire solution.


Cost function optimization

• Another approach:
  • Start from an initial guess and iteratively change w to minimize J(w).
  • This is the gradient descent algorithm.
• Steps:
  • Start from $w^0$.
  • Repeat:
    • Update $w^t$ to $w^{t+1}$ in order to reduce J.
    • $t \leftarrow t + 1$
  • until we hopefully end up at a minimum.


Gradient Descent

• In each step, take a step proportional to the negative of the gradient of the function at the current point $w^t$:

$$w^{t+1} = w^t - \eta \nabla J(w^t)$$

• J(w) decreases fastest if one moves from $w^t$ in the direction of $-\nabla J(w^t)$.
• Assumption: J(w) is defined and differentiable in a neighborhood of the point $w^t$.
• (By contrast, gradient ascent takes steps proportional to the positive of the gradient to find a local maximum of the function.)
• Continue until we find:

$$w^* = \arg\min_{w} J(w)$$


Gradient Descent (cont.)

• Minimize J(w):

$$w^{t+1} = w^t - \eta \nabla_w J(w^t), \qquad \nabla_w J(w) = \begin{bmatrix} \frac{\partial J(w)}{\partial w_1} \\ \vdots \\ \frac{\partial J(w)}{\partial w_d} \end{bmatrix}$$

• If η is small enough, then $J(w^{t+1}) \le J(w^t)$.
• η can be allowed to change at every iteration as $\eta^t$.


Gradient descent disadvantages

• Local minima problem.
  • However, when J is convex, all local minima are also global minima ⇒ gradient descent can converge to the global solution.


Problem of gradient descent with non-convex cost functions

Figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.

Cost function optimization

• With $h_w(x) = w^\top x$, the weight update rule is as follows:

$$w^{t+1} = w^t + \eta \sum_{i=1}^{n} \left(y^{(i)} - (w^t)^\top x^{(i)}\right) x^{(i)}$$

• η too small ⇒ gradient descent can be slow.
• η too large ⇒ gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
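A minimal batch gradient descent loop implementing this update rule (our sketch; the learning rate and step count are arbitrary choices):

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.01, steps=1000):
    """Repeat w <- w + eta * sum_i (y_i - w^T x_i) x_i (full-batch update)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residuals = y - X @ w         # shape (n,)
        w += eta * (X.T @ residuals)  # gradient step on the SSE cost
    return w

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
print(batch_gradient_descent(X, y))  # approaches [1.0, 1.0], i.e. y = 1 + x
```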


Gradient descent overview

(A sequence of figure-only slides illustrating successive gradient descent iterations; figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.)

Variants of gradient descent

• Batch gradient descent: processes the entire training set in one iteration.
  • Can be computationally costly for large datasets and practically impossible for some applications (such as online learning).
• Mini-batch gradient descent: processes small, random subsets (mini-batches) of the training set in each iteration.
  • Balances the stable updates of batch gradient descent against the cheap, frequent updates of stochastic gradient descent.
• Stochastic gradient descent: processes one training example per iteration.
  • Updates the model parameters more frequently, which can lead to faster convergence.


Stochastic gradient descent

• Example: linear regression with the SSE cost function:

$$J(w) = \sum_{i=1}^{n} J^{(i)}(w), \qquad J^{(i)}(w) = \left(y^{(i)} - w^\top x^{(i)}\right)^2$$

$$w^{t+1} = w^t - \eta \nabla_w J^{(i)}(w) \implies w^{t+1} = w^t + \eta \left(y^{(i)} - w^\top x^{(i)}\right) x^{(i)}$$

in which $x^{(i)}$ denotes the i-th observation to arrive.
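A sketch of the corresponding per-example loop (ours; it assumes a fixed η and shuffles the data each epoch, which is common practice):

```python
import numpy as np

def sgd(X, y, eta=0.05, epochs=20, seed=0):
    """Per-example update: w <- w + eta * (y_i - w^T x_i) x_i."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):        # shuffle each epoch
            w += eta * (y[i] - w @ X[i]) * X[i]  # one observation per step
    return w

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
print(sgd(X, y))  # hovers near [1.0, 1.0]
```

Processing mini-batches instead of single examples amounts to summing this update over a small random subset per step.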


Stochastic gradient descent: online learning

• Often, stochastic gradient descent gets close to the minimum much faster than
batch gradient descent.
• Note however that it may never converge to the minimum, and the parameters will
keep oscillating around the minimum of the cost function;
• In practice, most of the values near the minimum will be reasonably good
approximations to the true minimum.


Limitations of linear regression

• It is possible that the best-fitted line is still far from the true pattern of the samples:

Figure 1: No line can be fitted to generalize well on these samples



Beyond Linear Regression

• How can we extend linear regression to model non-linear relationships?
• Transforming data using basis functions:
  • Basis functions transform the original features into a new feature space.
  • Common basis functions include polynomial and Gaussian functions.
• Learning a linear regression on the transformed features:
  • By applying linear regression to the transformed feature vectors, we can model complex, non-linear relationships.
  • This approach keeps the simplicity and interpretability of linear regression while extending its flexibility.


Beyond Linear Regression: Polynomial Regression

Figure 2: The best-fit polynomial of degree 4 generalizes well on these samples



Beyond Linear Regression: change of basis

Figure 3: Changing the basis from $[1, x, y]$ to $[1, x^2, y^2]$, we can use linear regression


Polynomial regression: Univariate

• Polynomial Regression Hypothesis (m-th order regression):

$$h(x; w) = w_0 + w_1 x + \cdots + w_{m-1} x^{m-1} + w_m x^m$$

• Objective: Fit a polynomial of degree m to the data points.


Polynomial Regression: Univariate (cont.)

• Similar to what we did for univariate linear regression, we can define:

$$X' = \begin{bmatrix} 1 & x^{(1)} & \cdots & \left(x^{(1)}\right)^m \\ 1 & x^{(2)} & \cdots & \left(x^{(2)}\right)^m \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x^{(n)} & \cdots & \left(x^{(n)}\right)^m \end{bmatrix}, \qquad \hat{w} = \begin{bmatrix} \hat{w}_0 \\ \hat{w}_1 \\ \vdots \\ \hat{w}_m \end{bmatrix}, \qquad y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$

in which $x^{(i)}$ denotes the i-th data point.


Polynomial regression analytical solution: Univariate

Rewriting the SSE cost function in matrix form, we have:

$$J(\hat{w}) = \lVert y - X'\hat{w} \rVert_2^2$$

Analytical solution: it has the closed form

$$\hat{w} = \left(X'^\top X'\right)^{-1} X'^\top y$$
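The same closed form can be reused after expanding the features; a short sketch (ours; `np.vander` builds the matrix X′ column by column):

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares fit of an m-th order polynomial via expanded features."""
    Xp = np.vander(x, m + 1, increasing=True)    # columns: 1, x, x^2, ..., x^m
    return np.linalg.solve(Xp.T @ Xp, Xp.T @ y)  # (X'^T X')^{-1} X'^T y

x = np.linspace(-1.0, 1.0, 50)
y = 1.0 - 2.0 * x**2 + 0.05 * np.random.default_rng(2).normal(size=50)
print(fit_polynomial(x, y, 2))  # roughly [1, 0, -2]
```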


Training and Validation

• To better distinguish linear regression from polynomial regression, we will show that a linear model cannot always generalize well.
• Assume the samples are split into two subsets: a training dataset, used to fit a regression model, and a validation dataset, used to select the best regression model for the application.
• If a model generalizes well on the validation set, it is a good candidate.


Polynomial regression: example

• Consider the following noisy samples, split into training and validation sets.


Quadratic regression vs Linear regression

Fitting both quadratic and linear regressions shows the power of polynomial regression in generalizing to more complex patterns.
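A sketch of such a comparison (ours; the synthetic quadratic data, the 70/30 split, and the use of `np.polyfit` are all our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, 100)
y = 0.5 * x**2 + x + rng.normal(scale=0.5, size=100)  # quadratic ground truth

idx = rng.permutation(100)
train, val = idx[:70], idx[70:]  # 70/30 train/validation split

for degree in (1, 2):
    coeffs = np.polyfit(x[train], y[train], degree)
    val_mse = np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)
    print(f"degree {degree}: validation MSE = {val_mse:.3f}")
# The quadratic fit should show a clearly lower validation error.
```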


Contributions

• These slides have been prepared thanks to:
  • Arshia Gharooni
  • Mahan Bayhaghi


[1] C. M. Bishop, Pattern Recognition and Machine Learning. Information Science and Statistics, New York, NY: Springer, 1st ed., Aug. 2006.

[2] M. Soleymani Baghshah, "Machine Learning." Lecture slides, Sharif University of Technology.

[3] A. Ng and T. Ma, CS229 Lecture Notes. Stanford University.

[4] T. Mitchell, Machine Learning. McGraw-Hill Series in Computer Science, New York, NY: McGraw-Hill Professional, Mar. 1997.

[5] Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning from Data: A Short Course. New York, NY: AMLBook, 2012.

[6] S. Goel, H. Bansal, S. Bhatia, R. A. Rossi, V. Vinay, and A. Grover, "CyCLIP: Cyclic Contrastive Language-Image Pretraining," arXiv, vol. abs/2205.14459, May 2022.
