01-Linear Regression
Ali Sharifi-Zarchi
CE Department
Sharif University of Technology
CE Department (Sharif University of Technology) Machine Learning (CE 40477) September 21, 2024 1 / 78
Course Overview Introduction Supervised Learning Optimization Polynomial Regression References
1 Course Overview
2 Introduction
3 Supervised Learning
4 Optimization
5 Polynomial Regression
6 References
Course Overview
• Supervised Learning
• Unsupervised Learning
• Neural Networks
• Computer Vision
• Natural Language Processing
• Contrastive Language-Image Pretraining
Supervised Learning
Unsupervised Learning
Neural Networks
Computer Vision
• Machine Learning: A field of study that enables computers to learn from data
without being explicitly programmed.
• Involves constructing algorithms that generalize patterns from data.
• Focuses on predicting outcomes, classification, or uncovering hidden structures.
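• Goal: Develop models that make accurate predictions based on past data.
• Formal definition: A program learns from experience $E$ with respect to a task $T$ and a performance measure $P$ if its performance on $T$, as measured by $P$, improves with $E$. A learning problem is therefore specified by the triple:
Learning Problem = $(T, P, E)$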
Example Usage of ML
• Real-world examples:
• Fake News Detection (classification problem)
• Predicting House Prices (regression problem)
• Self-driving cars (real-time decision making)
• Application domains: finance, healthcare, robotics, etc.
Figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.
Figure adapted from slides of Dr. Soleymani, Machine Learning course, Sharif University of Technology.
Paradigms of ML
Supervised Learning
• Definition: A form of machine learning in which the model learns from labeled data $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ to predict an output $y$ given an input $\mathbf{x}$.
• Goal (regression): Estimate a function $f: \mathbb{R}^D \to \mathbb{R}$ such that $y = f(\mathbf{x}) + \epsilon$, where $\epsilon$ is noise.
Figure adapted from Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning from Data: A Short Course
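$$h_{\mathbf{w}}(\mathbf{x}) = w_0 + w_1 x_1 + \cdots + w_D x_D = \mathbf{w}^\top \mathbf{x}$$
• Input vector $\mathbf{x}$ (the constant feature $x_0 = 1$ absorbs the bias $w_0$):
$$\mathbf{x} = [x_0 = 1, x_1, x_2, \ldots, x_D]^\top$$
• Parameter vector $\mathbf{w}$:
$$\mathbf{w} = [w_0, w_1, w_2, \ldots, w_D]^\top$$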
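Once the constant feature $x_0 = 1$ is prepended to every input, the hypothesis is a single dot product. A minimal NumPy sketch; the helper names `add_bias` and `predict` and the toy numbers are illustrative assumptions, not part of the course material:

```python
import numpy as np

def add_bias(X):
    """Prepend the constant feature x0 = 1 to every row of X (shape n x D)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.hstack([np.ones((X.shape[0], 1)), X])

def predict(w, X):
    """Linear hypothesis h_w(x) = w^T x, evaluated for every row of X."""
    return add_bias(X) @ np.asarray(w, dtype=float)

w = np.array([1.0, 2.0, -0.5])   # [w0, w1, w2]
X = np.array([[3.0, 4.0],        # x1 = 3, x2 = 4 -> 1 + 2*3 - 0.5*4 = 5
              [0.0, 0.0]])       # all features zero -> only the bias w0 = 1
print(predict(w, X))  # [5. 1.]
```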
$$h_{\mathbf{w}}(\mathbf{x}) = w_0 + \sum_{i=1}^{D} w_i x_i$$
• A cost function measures how well the hypothesis h(x; w) fits the training data.
• In regression problems, the most common error function is the Squared Error (SE):
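$$\text{SE}: \left(y^{(i)} - h(\mathbf{x}^{(i)}; \mathbf{w})\right)^2$$
• The cost function should account for the predictions on all training examples, so a natural choice is the Sum of Squared Errors (SSE):
$$J(\mathbf{w}) = \sum_{i=1}^{N} \left(y^{(i)} - h(\mathbf{x}^{(i)}; \mathbf{w})\right)^2$$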
• Objective: Minimize the cost function to find the best parameters w.
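A minimal NumPy sketch of the SSE cost for a linear hypothesis, assuming the design matrix already contains the constant column $x_0 = 1$. The function name `sse` and the three toy data points are illustrative assumptions:

```python
import numpy as np

def sse(w, X, y):
    """Sum of squared errors J(w) = sum_i (y_i - w^T x_i)^2.

    X is assumed to already include the bias column x0 = 1.
    """
    residuals = y - X @ w
    return float(residuals @ residuals)

X = np.array([[1.0, 1.0],   # bias column + one feature
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 5.0])
# Predictions 1.5, 3.0, 4.5 give residuals 0.5, 0.0, 0.5, so J = 0.25 + 0 + 0.25.
print(sse(np.array([0.0, 1.5]), X, y))  # 0.5
```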
$$\text{SSE} = \sum_{i=1}^{N} \left(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)}\right)^2$$
$$J(\mathbf{w}) = \sum_{i=1}^{n} \left(y^{(i)} - h_{\mathbf{w}}(x^{(i)})\right)^2 = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$
Figure adapted from slides of Dr. Soleymani, Machine Learning course, Sharif University of Technology.
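$$J(\mathbf{w}) = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$
$$\frac{\partial J(\mathbf{w})}{\partial w_0} = 0, \qquad \frac{\partial J(\mathbf{w})}{\partial w_1} = 0$$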
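$$J(\mathbf{w}) = \sum_{i=1}^{n} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$
$$\frac{\partial J(\mathbf{w})}{\partial w_1} = \sum_{i=1}^{n} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)\left(-x^{(i)}\right) = 0$$
$$\frac{\partial J(\mathbf{w})}{\partial w_0} = \sum_{i=1}^{n} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-1) = 0$$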
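Solving these two equations by hand gives the familiar closed form: the $w_0$ equation yields $w_0 = \bar{y} - w_1 \bar{x}$, and substituting it into the $w_1$ equation yields $w_1 = \sum_i (x^{(i)} - \bar{x})(y^{(i)} - \bar{y}) \,/\, \sum_i (x^{(i)} - \bar{x})^2$. A minimal NumPy sketch of that formula; the function name and the noise-free toy data ($y = 1 + 2x$) are illustrative assumptions:

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Closed-form least-squares fit of y ≈ w0 + w1 * x for a single feature.

    dJ/dw0 = 0  ->  w0 = mean(y) - w1 * mean(x)
    dJ/dw1 = 0  ->  w1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    w0 = y_bar - w1 * x_bar
    return w0, w1

w0, w1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(w0, w1)  # 1.0 2.0
```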
This is a real-world example of how businesses can use linear regression to make decisions about marketing budgets. The following table shows the TV advertising budget spent and the corresponding average sales of houses in Boston:
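• In this problem, we want to predict sales from TV advertising spend, so the amount spent is taken as the input $x$ and sales as the output $y$.
• Using linear regression on the 5 data points, the cost function can be written as:
$$J(\mathbf{w}) = \sum_{i=1}^{5} \left(y^{(i)} - w_0 - w_1 x^{(i)}\right)^2$$
$$\frac{\partial J(\mathbf{w})}{\partial w_1} = \sum_{i=1}^{5} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)\left(-x^{(i)}\right) = 0 \implies 110863\,w_1 + 624.1\,w_0 - 10843.04 = 0$$
$$\frac{\partial J(\mathbf{w})}{\partial w_0} = \sum_{i=1}^{5} 2\left(y^{(i)} - w_0 - w_1 x^{(i)}\right)(-1) = 0 \implies 624.1\,w_1 + 5\,w_0 - 73.2 = 0$$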
Solving these two equations gives:
$$w_1 \approx 0.052, \qquad w_0 \approx 8.18$$
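These values can be checked numerically. The sketch below only feeds the two linear equations above into NumPy's solver (equivalently, $\sum x = 624.1$, $\sum x^2 = 110863$, $\sum xy = 10843.04$, $\sum y = 73.2$, $n = 5$); the raw table itself is not reproduced here:

```python
import numpy as np

# Coefficients of the two equations obtained by setting the partial derivatives to zero:
#   110863 * w1 + 624.1 * w0 = 10843.04
#   624.1  * w1 + 5     * w0 = 73.2
A = np.array([[110863.0, 624.1],
              [624.1,      5.0]])
b = np.array([10843.04, 73.2])

w1, w0 = np.linalg.solve(A, b)
print(round(w1, 3), round(w0, 2))  # 0.052 8.18
```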
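$$J(\mathbf{w}) = \sum_{i=1}^{n} \left(y^{(i)} - h_{\mathbf{w}}(\mathbf{x}^{(i)})\right)^2 = \sum_{i=1}^{n} \left(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)}\right)^2$$
$$\mathbf{X} = \begin{bmatrix} 1 & x_1^{(1)} & \cdots & x_d^{(1)} \\ 1 & x_1^{(2)} & \cdots & x_d^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \cdots & x_d^{(n)} \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix}$$
where $x_m^{(i)}$ denotes the $m$-th feature of data point $i$.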
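$$J(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}\mathbf{w}\|_2^2$$
• Taking the gradient of the cost function with respect to $\mathbf{w}$ and setting it to zero:
$$\nabla_{\mathbf{w}} J(\mathbf{w}) = -2\mathbf{X}^\top (\mathbf{y} - \mathbf{X}\mathbf{w})$$
$$\nabla_{\mathbf{w}} J(\mathbf{w}) = \mathbf{0} \implies \mathbf{X}^\top \mathbf{X}\mathbf{w} = \mathbf{X}^\top \mathbf{y} \implies \mathbf{w} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{y}$$
The last step assumes $\mathbf{X}^\top \mathbf{X}$ is invertible, i.e., $\mathbf{X}$ has full column rank.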
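A minimal NumPy sketch of the normal equation. It calls `np.linalg.solve` on $\mathbf{X}^\top\mathbf{X}\mathbf{w} = \mathbf{X}^\top\mathbf{y}$ instead of forming the inverse explicitly, which yields the same $\mathbf{w}$ when $\mathbf{X}^\top\mathbf{X}$ is invertible. The helper names and the synthetic, noise-free data are assumptions for illustration:

```python
import numpy as np

def design_matrix(X_raw):
    """Build X with a leading column of ones (x0 = 1) from raw features (n x d)."""
    X_raw = np.asarray(X_raw, dtype=float)
    return np.column_stack([np.ones(X_raw.shape[0]), X_raw])

def normal_equation(X_raw, y):
    """Solve (X^T X) w = X^T y; assumes X has full column rank."""
    X = design_matrix(X_raw)
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
X_raw = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X_raw[:, 0] - 1.0 * X_raw[:, 1]   # noise-free synthetic data
print(normal_equation(X_raw, y))  # close to [3, 2, -1]
```

If $\mathbf{X}$ may be rank-deficient, `np.linalg.lstsq` is the safer choice, since it returns a least-squares solution without requiring $\mathbf{X}^\top\mathbf{X}$ to be invertible.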
• Scalability: Analytical solutions do not scale well to very large datasets. Forming $\mathbf{X}^\top \mathbf{X}$ costs $O(nd^2)$ and solving the resulting $(d+1) \times (d+1)$ system costs $O(d^3)$, which becomes impractical for big data applications.
• Finding the inverse of a $k \times k$ matrix (here $k = d + 1$):
• Simplest way: Gaussian elimination, with complexity $O(k^3)$.
• Other methods: LU decomposition, which also costs $O(k^3)$; solving the system with the factorization, rather than forming the inverse explicitly, is numerically more stable.
• Numerical methods: Iterative methods such as Conjugate Gradient, which can be more efficient for large, sparse matrices.
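Because $\mathbf{X}^\top\mathbf{X}$ is symmetric positive definite when $\mathbf{X}$ has full column rank, the normal equations can be solved iteratively with Conjugate Gradient. The sketch below is a textbook CG loop in NumPy; the function name, tolerance, and iteration cap are assumptions. It forms $\mathbf{X}^\top\mathbf{X}$ densely only to keep the example short, whereas the efficiency gains mentioned above come from sparse or matrix-free products:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A (conjugate gradient)."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - A @ x            # residual b - A x
    p = r.copy()             # first search direction = residual
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:        # residual small enough: stop
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)        # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p    # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x

# Illustrative use on the normal equations X^T X w = X^T y (synthetic data).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
y = X @ np.array([0.5, -1.0, 2.0, 3.0]) + 0.01 * rng.normal(size=100)
w_cg = conjugate_gradient(X.T @ X, X.T @ y)
w_direct = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w_cg, w_direct))  # True
```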
• Another approach:
• Start from an initial guess and iteratively change $\mathbf{w}$ to minimize $J(\mathbf{w})$.
• The gradient descent algorithm
• Steps:
• Start from $\mathbf{w}^0$
• Repeat:
• Update $\mathbf{w}^t$ to $\mathbf{w}^{t+1}$ in order to reduce $J$
• $t \leftarrow t + 1$
until we hopefully end up at a minimum.
Gradient Descent
• At each iteration, take a step proportional to the negative of the gradient of the function at the current point $\mathbf{w}^t$:
$$\mathbf{w}^{t+1} = \mathbf{w}^t - \eta \nabla J(\mathbf{w}^t)$$
where $\eta > 0$ is the learning rate (step size).
• $J(\mathbf{w})$ decreases fastest if one moves from $\mathbf{w}^t$ in the direction of $-\nabla J(\mathbf{w}^t)$.
• Assumption: $J(\mathbf{w})$ is defined and differentiable in a neighborhood of the point $\mathbf{w}^t$.
• Gradient ascent takes steps proportional to the (positive) gradient to find a local maximum of the function.
• Continue iterating to find:
$$\mathbf{w}^* = \arg\min_{\mathbf{w}} J(\mathbf{w})$$
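The update rule can be sketched as a short loop. In this NumPy sketch the gradient is passed in as a function; the stopping rule (stop once the step is shorter than `tol`), the default $\eta$, and the toy cost are assumptions for illustration:

```python
import numpy as np

def gradient_descent(grad, w_init, eta=0.1, max_iter=10_000, tol=1e-8):
    """Generic gradient descent: w^{t+1} = w^t - eta * grad(w^t).

    Stops when the update is shorter than `tol` or after `max_iter` steps.
    """
    w = np.asarray(w_init, dtype=float)
    for _ in range(max_iter):
        step = eta * grad(w)
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w

def grad_J(w):
    """Gradient of the toy cost J(w) = (w1 - 3)^2 + 2 (w2 + 1)^2, minimized at (3, -1)."""
    return np.array([2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)])

w_star = gradient_descent(grad_J, [0.0, 0.0])
print(w_star)  # close to [3, -1]
```

With $\eta = 0.1$ each coordinate's error shrinks by a constant factor per step (0.8 and 0.6 here); a much larger $\eta$ would make the iterates diverge on this cost.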
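• Minimize $J(\mathbf{w})$ using its gradient vector:
$$\nabla_{\mathbf{w}} J(\mathbf{w}) = \begin{bmatrix} \dfrac{\partial J(\mathbf{w})}{\partial w_0} \\ \vdots \\ \dfrac{\partial J(\mathbf{w})}{\partial w_d} \end{bmatrix}$$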
Figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.
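$$\mathbf{w}^{t+1} = \mathbf{w}^t + \eta \sum_{i=1}^{n} \left(y^{(i)} - (\mathbf{w}^{t})^\top \mathbf{x}^{(i)}\right) \mathbf{x}^{(i)}$$
For the SSE cost, $\nabla J(\mathbf{w}) = -2\sum_{i=1}^{n}\left(y^{(i)} - \mathbf{w}^\top\mathbf{x}^{(i)}\right)\mathbf{x}^{(i)}$; the constant factor 2 is absorbed into $\eta$.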
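A minimal NumPy sketch of this batch update for linear regression, compared against NumPy's least-squares solver. The function name, $\eta$, iteration count, and synthetic data are assumptions chosen for this toy problem:

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.005, n_iter=2000):
    """Batch gradient descent for linear regression with the SSE cost.

    Every iteration uses all n examples:
        w <- w + eta * sum_i (y_i - w^T x_i) x_i
    X must already include the bias column x0 = 1.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        residuals = y - X @ w              # shape (n,)
        w = w + eta * (X.T @ residuals)    # X^T r == sum_i r_i * x_i
    return w

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)   # synthetic data, true w = [1, 2]

w_gd = batch_gradient_descent(X, y)
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)     # exact least-squares reference
print(w_gd, w_ls)  # the two estimates should agree closely
```

Because the update sums over all $n$ examples instead of averaging, the largest stable $\eta$ shrinks as $n$ grows; $\eta = 0.005$ is simply a value that works for these 100 points.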
Figures adapted from slides of Andrew Ng, Machine Learning course, Stanford.
• Batch gradient descent: processes the entire training set in one iteration
• It can be computationally costly for large data sets and practically impossible for
some applications (such as online learning).
• Mini-batch gradient descent: processes small, random subsets (mini-batches) of
the training set in each iteration
• Balances the stable gradient estimates of batch gradient descent against the frequent, cheap updates of stochastic gradient descent.
• Stochastic gradient descent: processes one training example per iteration
• Updates the model parameters more frequently, which can lead to faster convergence.
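Using a single training example $i$ per update:
$$\mathbf{w}^{t+1} = \mathbf{w}^t + \eta \left(y^{(i)} - (\mathbf{w}^{t})^\top \mathbf{x}^{(i)}\right) \mathbf{x}^{(i)}$$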
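One hypothetical function can cover all three variants listed earlier: `batch_size=1` applies the single-example update above, `batch_size=n` gives batch gradient descent, and anything in between is mini-batch. Unlike the summed batch rule, this NumPy sketch averages the gradient over each batch (for a batch of one this is exactly the single-example update); averaging is an assumption that keeps a reasonable $\eta$ roughly independent of batch size. The data, seeds, and hyperparameters are illustrative:

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=16, eta=0.05, n_epochs=50, seed=0):
    """Mini-batch gradient descent for linear regression with squared error.

    batch_size=1 gives stochastic gradient descent; batch_size=len(y) gives
    batch gradient descent. X must already include the bias column x0 = 1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        order = rng.permutation(n)                 # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # indices of the current mini-batch
            residuals = y[idx] - X[idx] @ w        # y^(i) - w^T x^(i) for i in the batch
            # Average over the batch (assumption; the batch rule above sums instead).
            w = w + eta * (X[idx].T @ residuals) / len(idx)
    return w

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=200)   # synthetic data, true w = [1, 2]

w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)      # exact least-squares reference
w_sgd = minibatch_gradient_descent(X, y, batch_size=1, eta=0.05, n_epochs=50)
w_mb = minibatch_gradient_descent(X, y, batch_size=20, eta=0.1, n_epochs=200)
print(w_ls, w_sgd, w_mb)
```

With a constant $\eta$, `w_sgd` lands near, but generally not exactly on, the least-squares solution, consistent with the oscillation around the minimum that stochastic updates exhibit.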
• Often, stochastic gradient descent gets close to the minimum much faster than
batch gradient descent.
• Note, however, that with a fixed learning rate it may never converge to the minimum; the parameters keep oscillating around the minimum of the cost function.
• In practice, most of the values near the minimum will be reasonably good approximations to the true minimum.
• Even the best-fitting line can still be far from the real pattern of the samples:
Figure 3: By changing the basis from $[1, x, y]$ to $[1, x^2, y^2]$, we can still use linear regression.
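$$J(\mathbf{w}) = \|\mathbf{y} - \mathbf{X}'\mathbf{w}\|_2^2$$
where $\mathbf{X}'$ is the design matrix whose columns are the transformed (e.g., polynomial) features.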
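For a single input, one way to build $\mathbf{X}'$ is to stack the columns $1, x, x^2, \ldots$ with `np.vander`, then minimize $\|\mathbf{y} - \mathbf{X}'\mathbf{w}\|_2^2$ with `np.linalg.lstsq`. This sketch fits a linear and a quadratic model to the same data; the helper names and the noise-free quadratic toy data are illustrative assumptions:

```python
import numpy as np

def polynomial_features(x, degree):
    """Rows [1, x, x^2, ..., x^degree] for each scalar input: the matrix X'."""
    return np.vander(np.asarray(x, dtype=float), N=degree + 1, increasing=True)

def fit_polynomial(x, y, degree):
    """Minimize ||y - X' w||_2^2 with NumPy's least-squares solver."""
    w, *_ = np.linalg.lstsq(polynomial_features(x, degree), y, rcond=None)
    return w

def sse(w, x, y, degree):
    """Sum of squared errors of a fitted polynomial on (x, y)."""
    residuals = y - polynomial_features(x, degree) @ w
    return float(residuals @ residuals)

x = np.linspace(-2, 2, 30)
y = 1.0 - 0.5 * x + 0.8 * x**2          # noise-free quadratic toy data
w_lin = fit_polynomial(x, y, degree=1)
w_quad = fit_polynomial(x, y, degree=2)
print(sse(w_lin, x, y, 1), sse(w_quad, x, y, 2))  # the quadratic SSE is essentially zero
```

The model stays linear in $\mathbf{w}$; only the features change, which is why the same least-squares machinery applies unchanged.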
Fitting both a linear and a quadratic model to the same data shows the power of polynomial regression in capturing more complex patterns.
Contributions
• Mahan Bayhaghi