
Machine Learning

Gradient Descent

Mostafa S. Ibrahim
Teaching, Training and Coaching for more than a decade!

Artificial Intelligence & Computer Vision Researcher


PhD from Simon Fraser University - Canada
Bachelor / MSc from Cairo University - Egypt
Ex-(Software Engineer / ICPC World Finalist)

© 2023 All rights reserved.


Please do not reproduce or redistribute this work without permission from the author
Optimization
● An optimization problem is one that involves minimizing or maximizing a
function
● Last time, we found the minimum of f(x) = 3x² + 4x + 7 using an analytical solution (a quick check of this recipe is sketched after this list)
○ Find derivative f’(x)
○ Set to zero f’(x) = 0
○ Find x: the minimum for a function with a global minimum
● The previous approach won't work for the more complex functions coming up!
● Let’s find the minimum in an iterative way for the same f(x)
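As a quick check of the analytical recipe above, here is a small illustrative sketch (sympy is my choice here, not something the lecture necessarily uses):

import sympy as sp

x = sp.symbols('x')
f = 3 * x**2 + 4 * x + 7
f_dash = sp.diff(f, x)        # f'(x) = 6*x + 4
print(sp.solve(f_dash, x))    # [-2/3] -> the minimum of this convex parabola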
Gradient Descent
● Our goal is to find the x that has the
minimum y (analytically x = -⅔)
● We'd like to find the answer, but in an
iterative manner!
● Start from an initial location, e.g. x = 0
● Strategy: keep moving x with Δx = 0.01 in
the direction that makes it closer to the
optimal point!
● Should we go left or right? -0.01 vs 0.01

● f(x) = 3x² + 4x + 7
● f'(x) = 6x + 4
Positive slope case
● Intuitively, we would like to move towards
the left side (negative Δx)
● What is the slope sign at x = 0?
○ f’(0) = 4: positive slope
● This means we need to move in the
opposite direction of the slope!
● Then, keep moving towards the left until
reaching a point with zero slope
○ The minimum!

● f(x) = 3x² + 4x + 7
● f'(x) = 6x + 4
Negative slope case
● What if we started from x = -1.3?
● Intuitively, we would like to move towards
the right side (positive Δx)
● What is the slope sign at x = -1.3?
○ f’(-1.3) = -3.8: negative slope
● This means we need to move in the
opposite direction of the slope!
● Then, keep moving towards the right until
reaching a point with zero slope
○ The minimum!

● f(x) = 3x² + 4x + 7
● f'(x) = 6x + 4
Gradient Descent
● An iterative algorithm to find the local minimum
○ Start from an initial location
○ Keep moving in the opposite direction of the gradient
● So far we used a fixed Δx = 0.01
○ But a fixed constant can cause issues depending on the steepness of the curve
○ It can be very slow on flat regions, and it can make moves that are too big close to the minimum!
● How can we make it dynamic? Use the gradient value itself!
● However, this value itself has the potential to be very big
○ Let’s multiply by a small value
○ Let’s call it the learning rate (lr). It is a hyperparameter
■ A hyperparameter is a parameter whose value is used to control the learning process
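For example (a quick hand calculation, with an assumed lr = 0.1 and starting point x = 0): the update is x_new = x - lr · f'(x), so x goes from 0 to 0 - 0.1·4 = -0.4, then to -0.4 - 0.1·1.6 = -0.56, then to -0.624, and so on. The steps shrink on their own as the slope flattens near the minimum at x = -2/3.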
Parameter and Hyperparameter
● Our model for a simple line has 2 parameters: m and c
○ We would like to learn these 2 parameters
● We needed an extra parameter, the learning rate (and precision)
○ We call these hyperparameters
○ We typically don’t learn them
○ But we experimentally try different values to find suitable ones
Stopping Criteria
● One simple criterion is the number of iterations (e.g. 100 iterations)
○ But what if we need more?
○ However, it's still good to force an end!
● Another way is the precision
○ At each iteration we have the old x and the new x
○ Once the 2 values are almost the same, stop the program
● The best is to use both
Effect of large step size (learning rate)
● We may suffer from an oscillating behaviour around the minimum value
○ This means it keeps missing the minimum, instead jumping back and forth on either side of it
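A quick way to see this (assuming the same f(x) = 3x² + 4x + 7 and the update x_new = x - lr·f'(x)): the update simplifies to x_new = (1 - 6·lr)·x - 4·lr, so the distance from the minimum at x = -2/3 gets multiplied by (1 - 6·lr) every step. For 1/6 < lr < 1/3 the sign flips each step, so the iterates jump from one side of the minimum to the other; for lr > 1/3 the magnitude also grows, and gradient descent diverges.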
Code Tour
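Below is a minimal sketch of what such a code tour might contain, assuming f(x) = 3x² + 4x + 7 and illustrative names (not necessarily the lecture's exact code). It uses both stopping criteria from above: an iteration cap and a precision check.

def f_dash(x):
    return 6 * x + 4                        # derivative of f(x) = 3x^2 + 4x + 7

def gradient_descent(x=0.0, lr=0.01, precision=1e-8, max_iters=1000):
    for _ in range(max_iters):              # stopping criterion 1: iteration cap
        x_new = x - lr * f_dash(x)          # move opposite to the slope
        if abs(x_new - x) < precision:      # stopping criterion 2: precision
            break
        x = x_new
    return x

print(gradient_descent())                   # approaches the analytical answer x = -2/3

Re-running with, say, lr = 0.3 shows the oscillating behaviour from the previous slide (the iterates jump across x = -2/3 while still converging), while lr = 0.4 makes them grow without bound.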
Question!
● Do we need to decrease the step size (LR) over time for the algorithm to
converge?

No, gradient descent itself will take smaller steps when getting closer to the minimum, since the gradient itself shrinks there
Global vs Local
● When the function has a single u-shape, it will have a single global minimum
○ We call them convex functions
● However, if it has multiple u-shapes, it will have several minima
○ We call them local minima, except for the best one, which is the global minimum
● Consider the function
x⁴ - 6x² - x - 1
● There are 2 local minima
○ X = -1.68, Y = -8.3
○ X = 1.77, Y = -11.7 (global)
● Where do we end up if we start
from -2.4, -0.15, 0.1, or 2.39?
● Let’s run it
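A minimal sketch of such a run (illustrative code with an assumed lr = 0.001, not necessarily the lecture's exact script):

def f_dash(x):
    return 4 * x**3 - 12 * x - 1             # derivative of f(x) = x^4 - 6x^2 - x - 1

def minimize(x, lr=0.001, precision=1e-8, max_iters=10000):
    for _ in range(max_iters):
        x_new = x - lr * f_dash(x)
        if abs(x_new - x) < precision:
            break
        x = x_new
    return x

for start in (-2.4, -0.15, 0.1, 2.39):
    print(start, '->', round(minimize(start), 2))
# The first two starting points end in the left (local) minimum,
# the last two in the global minimum on the right.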
Graph of a function of multiple variables
● What about functions that have 2
variables? Or even more?!
● Consider for example:
f(x, y) = 3(x + 2)² + (y - 1)²
○ Now, both x and y should be updated
○ Analogy: to head south-west, people don't usually move first along east-west and then along north-south. Typically, the shortest path (i.e. moving along both coordinates simultaneously) is preferred!
○ This function has a global minimum
● We can’t capture movement in both x and y at the same time by looking at a single tangent line
Graph of a function of multiple variables
● A typical solution for multivariate problems that you can’t solve simultaneously
is to try solving per variable and combining the results
○ Intuition: can we reduce it to 1D movements per variable?
● Therefore, we apply the partial derivative for each variable
○ Which essentially fixes all the other variables (a 1D slice from this variable's perspective)
● The overall algorithm will just be extended from 1D to 2D
○ Compute a partial derivative for every variable
○ Make an update to all of them together
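A minimal 2D sketch for f(x, y) = 3(x + 2)² + (y - 1)², whose minimum is at (-2, 1); names and the value lr = 0.05 are illustrative assumptions, not the lecture's exact code:

def df_dx(x, y):
    return 6 * (x + 2)                       # partial derivative w.r.t. x

def df_dy(x, y):
    return 2 * (y - 1)                       # partial derivative w.r.t. y

x, y, lr = 0.0, 0.0, 0.05
for _ in range(500):
    gx, gy = df_dx(x, y), df_dy(x, y)        # both partials from the OLD (x, y)
    x, y = x - lr * gx, y - lr * gy          # then update both together
print(round(x, 4), round(y, 4))              # approaches the minimum at (-2, 1)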
Slicing a graph on output dimension (z)

● If you slice the graph with a plane that is parallel to the xy-plane,
then all z-values will be equal
○ For example, if z represents some cost function, then all
such (x, y) solutions will have the same cost
○ This is how we create contour plots

2x² - 4xy + y⁴ + 2

● This function has more than one local minimum


● A common coding mistake
○ Updating one variable, then computing the other partial derivatives from the already-changed state
○ Compute every partial derivative from the old state, then update all variables together (see the sketch below)
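A sketch of the buggy vs. the correct update, using f(x, y) = 2x² - 4xy + y⁴ + 2 from this slide (function and variable names are illustrative):

def df_dx(x, y):
    return 4 * x - 4 * y                     # partial derivative w.r.t. x

def df_dy(x, y):
    return -4 * x + 4 * y**3                 # partial derivative w.r.t. y

lr = 0.01

def buggy_step(x, y):
    x = x - lr * df_dx(x, y)                 # x has already changed here...
    y = y - lr * df_dy(x, y)                 # BUG: the partial for y sees the NEW x
    return x, y

def correct_step(x, y):
    gx, gy = df_dx(x, y), df_dy(x, y)        # evaluate all partials at the old state
    return x - lr * gx, y - lr * gy          # then update every variable together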
Terminology
● In programming, a parameter is a variable in a method definition.
○ The arguments are the values you pass when calling the function
○ Parameter = Variable. Argument = Value
● In ML/neural network, a parameter is a weight value
○ Parameter = Value (connection weight)
● A model is a set of parameters (weights) that we save for later inference
○ (m, c) for the line equation
● A hyperparameter is a parameter whose value is used to control the learning
process
○ The learning rate is one of the most critical hyperparameters to handle
■ Especially its initial value, and secondly how to decay it over time
○ We will come back to it again later
Gradient Descent
● Gradient descent is a first-order iterative optimization algorithm for finding
a local minimum of a differentiable function
● Iterative Optimization
○ Start from an initial solution
○ Keep iterating to improve the last solution (tail recursion)
● First-order
○ This method makes use of only first-order derivatives (the gradient)
■ In other words, we differentiate only once; no second derivatives (Hessian) are needed
● An alternative to Gradient Descent is Newton's method
○ Useful, but it comes with several practical challenges
Lectures flow so far
● We wanted to do House Price Prediction (Supervised / Regression)
● The data seems to come from a line
○ Let’s find the best line. This treatment is called linear regression
● Given data and line, we defined a cost (error) function
● We want to minimize the function (find the parameters that give the minimum error)
● Gradient descent: a general technique that, given a function F, iteratively finds
a (local) minimum of it
● Next: Let’s get back to linear regression + gradient descent
“Acquire knowledge and impart it to the people.”

“Seek knowledge from the Cradle to the Grave.”
