Chapter04_Training_Models

The document discusses linear regression, including data generation, model fitting, and optimization techniques such as the Normal Equation and Gradient Descent. It highlights the computational complexities associated with these methods and introduces variations like Stochastic and Mini-batch Gradient Descent. Additionally, it provides Python code examples for implementing these techniques using libraries like NumPy and Scikit-Learn.

Training Models
Dr. Anwar M. Mirza
Linear Regression Data

import numpy as np
X = 2 * np.random.rand(100, 1)              # 100 instances, one feature, values in [0, 2)
y = 4 + 3 * X + np.random.randn(100, 1)     # linear target y = 4 + 3x plus Gaussian noise

import matplotlib.pyplot as plt


plt.plot(X, y, "b.")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.axis([0, 2, 0, 15])
plt.savefig("generated_data_plot.png")      # the slides call a save_fig() helper; plain Matplotlib used here
plt.show()
Linear Regression
x1   x2   x3    y     ŷ
 3    1    2   3.5    ?
 6    7    9   2.8    ?
 2    5    1   1.5    ?
 8    4    0   6.9    ?
The goal is to find the best-fitting line (or hyperplane) for the training data, i.e. the one that minimizes the prediction error. In essence, find the best values for θ0, θ1, ..., θn in the model ŷ = θ0 + θ1x1 + θ2x2 + ... + θnxn.
Linear Regression Equation (Vectorized Form)
ŷ = hθ(x) = θᵀx
where θ is the model's parameter vector (θ0, θ1, ..., θn) and x is the instance's feature vector (x0, x1, ..., xn), with x0 = 1.
Linear Regression Equation (Vectorized Form)
The objective is to determine the value of θ that minimizes the MSE cost function.
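For reference, assuming the standard definition used for Linear Regression, the MSE over m training instances can be written in LaTeX as:

\mathrm{MSE}(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right)^{2}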
How to Solve This Optimization Problem?
Two main approaches:
• Normal Equation
• Gradient Descent
Solution Using Normal Equation
The MSE cost function is minimized when its gradient with respect to θ is zero:

∇θ MSE(θ̂) = (2/m) Xᵀ(Xθ̂ − y) = 0
⇒ XᵀX θ̂ = Xᵀy
⇒ θ̂ = (XᵀX)⁻¹ Xᵀy

This closed-form solution for θ̂ is called the Normal Equation.
Linear Regression Data

import numpy as np
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

plt.plot(X, y, "b.")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.axis([0, 2, 0, 15])
plt.savefig("generated_data_plot.png")      # the slides call a save_fig() helper; plain Matplotlib used here
plt.show()
Linear Regression Model Fitting
X_b = np.c_[np.ones((100, 1)), X] # add x0 = 1 to each instance
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

theta_best

array([[4.21509616], [2.77011339]])
X_new = np.array([[0], [2]])
# add x0 = 1 to each instance
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b.dot(theta_best)
y_predict

array([[4.21509616], [9.75532293]])

import matplotlib.pyplot as plt

plt.plot(X_new, y_predict, "r-")


plt.plot(X, y, "b.")
plt.axis([0, 2, 0, 15])
plt.show()
Computational Complexity of Normal Equation (1)
The Normal Equation computes the inverse of X⊺X, which is an (n + 1) × (n + 1) matrix (where n is the number of features). The computational complexity of inverting such a matrix is typically about O(n^2.4) to O(n^3), depending on the implementation. In other words, if you double the number of features, you multiply the computation time by roughly 2^2.4 ≈ 5.3 to 2^3 = 8.
The SVD (Singular Value Decomposition) approach used by Scikit-Learn's LinearRegression class is about O(n^2). If you double the number of features, you multiply the computation time by roughly 4.
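As a rough illustration (not code from the slides), the same least-squares solution can be obtained through the SVD in NumPy, reusing the X_b and y defined earlier; np.linalg.lstsq and np.linalg.pinv both rely on the SVD internally:

# Least-squares solution computed via the SVD
theta_best_svd, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)
theta_best_svd

# Equivalent: multiply y by the Moore–Penrose pseudoinverse of X_b
np.linalg.pinv(X_b).dot(y)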
Computational Complexity of Normal Equation (2)
Both the Normal Equation and the SVD approach get very slow
when the number of features grows large (e.g., 100,000).
On the positive side, both are linear with regard to the number of
instances in the training set (they are O(m)), so they handle large
training sets efficiently, provided they can fit in memory.
Gradient Descent
Gradient Descent is a generic optimization algorithm capable of
finding optimal solutions to a wide range of problems. The general
idea of Gradient Descent is to tweak parameters iteratively in order
to minimize a cost function.
Gradient Descent
Gradient Descent – Too Small Learning Rate
Gradient Descent – Too Large Learning Rate
Gradient Descent Pitfalls
Linear Regression – Gradient Descent
Fortunately, the MSE cost function for a Linear Regression model
happens to be a convex function, which means that if you pick any
two points on the curve, the line segment joining them never crosses
the curve.
This implies that there are no local minima, just one global
minimum. It is also a continuous function with a slope that never
changes abruptly.
These two facts have a great consequence: Gradient Descent is guaranteed to approach arbitrarily close to the global minimum (if you wait long enough and if the learning rate is not too high).
Linear Regression – Gradient Descent
When using Gradient Descent, you should ensure that all features have a similar
scale (e.g., using Scikit-Learn’s StandardScaler class), or else it will take much
longer to converge.
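A minimal sketch of such scaling with Scikit-Learn (the scaled array would then replace X when building X_b for Gradient Descent):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # each feature rescaled to zero mean and unit variance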
Batch Gradient Descent
To implement Gradient Descent, you need to compute the gradient of the cost function with
regard to each model parameter θj.
In other words, you need to calculate how much the cost function will change if you change θj
just a little bit. This is called a partial derivative.
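For reference, assuming the MSE definition given earlier, the partial derivative with respect to θ_j is:

\frac{\partial}{\partial \theta_j} \mathrm{MSE}(\theta) = \frac{2}{m} \sum_{i=1}^{m} \left( \theta^{T} x^{(i)} - y^{(i)} \right) x_j^{(i)}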
Batch Gradient Descent (2)
Instead of computing these partial derivatives individually, you can use Equation
4-6 to compute them all in one go.
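That equation is the gradient vector of the cost function, which packs all the partial derivatives into one matrix expression:

\nabla_{\theta} \mathrm{MSE}(\theta) = \frac{2}{m} X^{T} \left( X\theta - y \right)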
Gradient Descent Step
Once you have the gradient vector, which points uphill, just go in the opposite
direction to go downhill. This means subtracting ∇θMSE(θ) from θ. This is where
the learning rate η comes into play: multiply the gradient vector by η to
determine the size of the downhill step.
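Written out, the Gradient Descent update step is:

\theta^{(\text{next step})} = \theta - \eta \, \nabla_{\theta} \mathrm{MSE}(\theta)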
Gradient Descent in Python
eta = 0.1                                 # learning rate
n_iterations = 1000
m = 100                                   # number of training instances
theta = np.random.randn(2, 1)             # random initialization

for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)   # gradient over the full training set
    theta = theta - eta * gradients                   # take one Gradient Descent step
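With eta = 0.1 and 1000 iterations on this dataset, theta should come out essentially identical to the Normal Equation solution found earlier (roughly [[4.215], [2.770]]), since the MSE for Linear Regression is convex.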
Gradient Descent – Learning Rates
Gradient Descent – Iterations and Tolerance
You may wonder how to set the number of iterations. If it is too low,
you will still be far away from the optimal solution when the
algorithm stops; but if it is too high, you will waste time while the
model parameters do not change anymore.
A simple solution is to set a very large number of iterations but to
interrupt the algorithm when the gradient vector becomes tiny—that
is, when its norm becomes smaller than a tiny number ϵ (called the
tolerance)—because this happens when Gradient Descent has
(almost) reached the minimum.
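A minimal sketch of that stopping rule, assuming a tolerance epsilon of 1e-6 and reusing eta, m, X_b and y from before (the variable names are illustrative):

epsilon = 1e-6                                    # tolerance on the gradient norm
theta = np.random.randn(2, 1)                     # random initialization

for iteration in range(1_000_000):                # generous upper bound on iterations
    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)
    if np.linalg.norm(gradients) < epsilon:       # gradient is tiny: (almost) at the minimum
        break
    theta = theta - eta * gradients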
Stochastic Gradient Descent
The main problem with Batch Gradient Descent is the fact that it uses the whole
training set to compute the gradients at every step, which makes it very slow
when the training set is large.

Stochastic Gradient Descent picks a random instance in the training set at every
step and computes the gradients based only on that single instance.
Stochastic Gradient Descent (2)
Because each step uses only a single instance, SGD is much faster than Batch GD, but its path toward the minimum is far more erratic: the cost function bounces up and down and decreases only on average. The randomness helps the algorithm escape local minima, yet it also prevents it from ever settling at the minimum, so the final parameter values are good but not optimal.
Stochastic Gradient Descent (3)
One solution to this dilemma is to gradually reduce the learning rate.

The function that determines the learning rate at each iteration is called the learning schedule; a sketch of SGD with such a schedule follows below.
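A minimal sketch of Stochastic Gradient Descent with a simple learning schedule, reusing X_b, y and m from before (the epoch count and schedule constants t0, t1 are illustrative choices, not values from the slides):

n_epochs = 50
t0, t1 = 5, 50                                    # learning-schedule hyperparameters

def learning_schedule(t):
    return t0 / (t + t1)                          # learning rate decays as training proceeds

theta = np.random.randn(2, 1)                     # random initialization

for epoch in range(n_epochs):
    for i in range(m):
        random_index = np.random.randint(m)       # pick one training instance at random
        xi = X_b[random_index:random_index + 1]
        yi = y[random_index:random_index + 1]
        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)   # gradient estimate from that one instance
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients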
Linear Regression Using SGDRegressor
from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, eta0=0.1)   # stops early once the loss stops improving by more than tol
sgd_reg.fit(X, y.ravel())                                   # ravel(): SGDRegressor expects a 1-D target array
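After fitting, the learned parameters are available as sgd_reg.intercept_ and sgd_reg.coef_; they should again be close to the values found by the Normal Equation. Note that SGDRegressor applies ℓ2 regularization by default, so pass penalty=None to match plain Linear Regression exactly.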
Mini-batch Gradient Descent
Instead of computing the gradients based on the full training set (as in Batch
GD) or based on just one instance (as in Stochastic GD), Mini-batch GD
computes the gradients on small random sets of instances called mini-batches.
The main advantage of Mini-batch GD over Stochastic GD is that you can get a
performance boost from hardware optimization of matrix operations, especially
when using GPUs.
The algorithm’s progress in parameter space is less erratic than with Stochastic
GD, especially with fairly large mini-batches.
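A minimal sketch of Mini-batch Gradient Descent with a batch size of 20 and a fixed learning rate (both choices are illustrative), again reusing X_b, y and m:

n_epochs = 50
minibatch_size = 20
eta = 0.1                                         # fixed learning rate for simplicity
theta = np.random.randn(2, 1)                     # random initialization

for epoch in range(n_epochs):
    shuffled_indices = np.random.permutation(m)   # reshuffle the training set each epoch
    X_b_shuffled = X_b[shuffled_indices]
    y_shuffled = y[shuffled_indices]
    for start in range(0, m, minibatch_size):
        xi = X_b_shuffled[start:start + minibatch_size]
        yi = y_shuffled[start:start + minibatch_size]
        gradients = 2 / len(xi) * xi.T.dot(xi.dot(theta) - yi)   # gradient on one mini-batch
        theta = theta - eta * gradients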
Gradient Descent Comparison
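In brief: the Normal Equation and the SVD approach are fast when the training set is large (linear in m) but slow when the number of features is large, and they need no feature scaling; Batch GD copes well with many features but is slow on large training sets; Stochastic and Mini-batch GD handle both large training sets and many features, and can even train on data that does not fit in memory, but they require feature scaling and some tuning of the learning rate and schedule.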
Linear Regression Using Scikit-Learn
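A minimal sketch of fitting the same data with Scikit-Learn's LinearRegression class, reusing X, y and X_new from earlier:

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)
lin_reg.intercept_, lin_reg.coef_     # should be close to the true parameters 4 and 3
lin_reg.predict(X_new)                # predictions at x1 = 0 and x1 = 2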
