
06_Gradient_Descent_Intuition_12_min

This video provides an intuitive understanding of the gradient descent algorithm, focusing on the roles of the learning rate (alpha) and the derivative term in updating parameters. It explains how the algorithm adjusts the parameter based on the slope of the cost function and the implications of choosing a small or large learning rate. The video concludes by stating that gradient descent can effectively minimize any cost function, leading to the development of a linear regression algorithm in the next session.

In the previous video, we gave a mathematical definition of gradient descent. In this video, let's delve deeper and get better intuition about what the algorithm is doing, and why the steps of the gradient descent algorithm might make sense. Here's the gradient descent algorithm that we saw last time. Just to remind you, this parameter, this term alpha, is called the learning rate, and it controls how big a step we take when updating the parameter theta_j. The second term is the derivative term. What I want to do in this video is give you better intuition about what each of these two terms is doing, and why, when put together, this entire update makes sense.
In order to convey these intuitions, what I want to do is use a slightly simpler example, where we want to minimize a function of just one parameter. So, say we have a cost function J of just one parameter, theta_1, like we did a few videos back, where theta_1 is a real number. That way we can have 1D plots, which are a little bit simpler to look at, and let's try to understand what gradient descent would do on this function.

So, let's say here's my function J of theta_1, where theta_1 is a real number. Now let's say I've initialized gradient descent with theta_1 at this location, so imagine that we start off at that point on the function. What gradient descent will do is update theta_1 like this:

theta_1 := theta_1 - alpha * (d/d theta_1) J(theta_1)

Just as an aside, if you're wondering why the notation changed from the partial derivative symbol to d/d theta_1, don't worry about it. Technically, in mathematics, one is called a partial derivative and the other a derivative, depending on the number of parameters in the function J, but that's a mathematical technicality. For the purpose of this lecture, think of the partial symbol and d/d theta_1 as exactly the same thing, and don't worry about whether there are any differences. I'm going to try to use the mathematically precise notation, but for our purposes these notations are really the same thing.

So, let's see what this equation will do. We're going to compute this derivative. I'm not sure if you've seen derivatives in calculus before, but what the derivative at this point does is basically this: take the line tangent to the function at that point (the red line just touching the function) and look at the slope of that line. That's what the derivative is: the slope of the line that is tangent to the function at that point. And the slope of the line is, of course, just the vertical change divided by the horizontal change. Now, this line has a positive slope, so it has a positive derivative. And so my update is: theta_1 gets updated as theta_1 minus alpha times some positive number. Alpha, the learning rate, is always a positive number. So I'm taking theta_1 and updating it as theta_1 minus something, which means I end up moving theta_1 to the left; I decrease theta_1. And we can see that this is the right thing to do, because moving in this direction takes me closer to the minimum over there. So gradient descent, so far, seems to be doing the right thing.

Let's look at another example, with the same function J of theta_1, and now let's say I had instead initialized my parameter over there on the left, so theta_1 is here; I'll mark that point on the curve. Now my derivative term, (d/d theta_1) J(theta_1), when evaluated at this point, is the slope of the tangent line there. But this line is slanting down, so it has a negative slope; or, to say the same thing, the function has a negative derivative at that point. So the derivative term is less than zero, and when I update theta_1, it gets updated as theta_1 minus alpha times a negative number. Theta_1 minus a negative number means I'm actually going to increase theta_1, because subtracting a negative number is the same as adding something to theta_1. So we'll start here and increase theta_1, which again seems like the right thing to do to get closer to the minimum. This hopefully explains the intuition behind what the derivative term is doing.
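To make this concrete, here's a minimal Python sketch of that update, assuming a toy cost function J(theta_1) = (theta_1 - 3)^2 whose minimum is at theta_1 = 3; the names J and dJ and the particular numbers are illustrative, not from the lecture.

def J(theta1):
    return (theta1 - 3.0) ** 2

def dJ(theta1):
    # Derivative of J: the slope of the tangent line at theta1.
    return 2.0 * (theta1 - 3.0)

alpha = 0.1  # learning rate, always a positive number

# Start to the right of the minimum: the slope is positive, so
# theta1 - alpha * dJ(theta1) moves theta1 to the left, toward the minimum.
theta1 = 5.0
print(dJ(theta1) > 0, theta1 - alpha * dJ(theta1) < theta1)  # True True

# Start to the left of the minimum: the slope is negative, so subtracting
# a negative number moves theta1 to the right, again toward the minimum.
theta1 = 1.0
print(dJ(theta1) < 0, theta1 - alpha * dJ(theta1) > theta1)  # True True

Either way, the sign of the derivative is what points the update toward the minimum.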
Let's next take a look at the learning rate term, alpha, and try to figure out what that's doing. So, here's my gradient descent update rule, and let's look at what can happen if alpha is either too small or too large.

First, what happens if alpha is too small? Here's my function J of theta_1; let's just start here. If alpha is too small, then I'm multiplying the derivative by some small number, so I end up taking a baby step like that. Okay, that's one step. Then from this new point we take another step, but if alpha is too small, that's another little baby step. And so, if my learning rate is too small, I end up taking these tiny, tiny baby steps to try to get to the minimum, and I'm going to need a lot of steps to get there. So if alpha is too small, gradient descent can be slow, because it takes these tiny baby steps and needs a lot of them before it gets anywhere close to the global minimum.

Now, how about if alpha is too large? Here's my function J of theta_1. It turns out that if alpha is too large, gradient descent can overshoot the minimum, and may even fail to converge, or even diverge. Here's what I mean: let's say theta_1 starts off here, actually close to the minimum, so the derivative tells me to move to the right. But if alpha is too big, I take a huge step, maybe a huge step like that, and end up way over here. Now my cost function has gotten worse: it started off at this value, and now my value is worse. Now my derivative points to the left; it tells me to decrease theta_1. But because my learning rate is too big, I may take another huge step, going from here all the way out there. And if my learning rate is too big, I can take yet another huge step on the next iteration, and overshoot, and overshoot again, and so on, until, you notice, I'm actually getting further and further away from the minimum. So if alpha is too large, gradient descent can fail to converge, or even diverge.
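A small numerical experiment, again on the assumed quadratic J(theta_1) = (theta_1 - 3)^2, shows both failure modes; the helper run_gradient_descent and the specific alpha values here are made up for illustration.

def run_gradient_descent(alpha, theta1=5.0, steps=50):
    # Repeatedly apply theta1 := theta1 - alpha * dJ(theta1), where
    # dJ(theta1) = 2 * (theta1 - 3) is the derivative of the assumed J.
    for _ in range(steps):
        theta1 = theta1 - alpha * 2.0 * (theta1 - 3.0)
    return theta1

print(run_gradient_descent(alpha=0.001))  # ~4.8 after 50 baby steps: still far from 3
print(run_gradient_descent(alpha=0.5))    # 3.0: converges
print(run_gradient_descent(alpha=1.1))    # enormous value: every step overshoots; it diverges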
Now, I have another question for you. This is a tricky one, and when I was first learning this stuff, it actually took me a long time to figure it out. What if your parameter theta_1 is already at a local minimum? What do you think one step of gradient descent will do? So, suppose you initialize theta_1 at a local minimum: suppose this is your initial value of theta_1 over here, and it's already at a local optimum, a local minimum. It turns out that at a local optimum your derivative is equal to zero, since the tangent line at that point is flat; the slope of that line is zero, and thus the derivative term is equal to zero. And so, in your gradient descent update, theta_1 gets updated as theta_1 minus alpha times zero. What this means is that, if you're already at a local optimum, the update leaves theta_1 unchanged, since it just sets theta_1 equal to theta_1. So, if your parameter is already at a local minimum, one step of gradient descent does absolutely nothing. It doesn't change the parameter, which is what you want, because it keeps your solution at the local optimum.
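As a quick sanity check on the assumed quadratic, the derivative at the minimum is zero, so one step of gradient descent leaves theta_1 exactly where it is, no matter what alpha is.

def dJ(theta1):
    # Derivative of the assumed J(theta1) = (theta1 - 3)**2.
    return 2.0 * (theta1 - 3.0)

alpha = 0.5
theta1 = 3.0                               # already at the minimum
theta1_next = theta1 - alpha * dJ(theta1)  # alpha * 0 = 0
print(theta1_next == theta1)               # True: the parameter is unchanged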
This also explains why gradient descent can converge to a local minimum even with the learning rate alpha held fixed. Here's what I mean by that. Let's look at an example. Here's a cost function J of theta_1 that I want to minimize, and let's say I initialize my gradient descent algorithm out there at that magenta point. If I take one step of gradient descent, maybe it takes me to this point, because my derivative is pretty steep out there. Now I'm at the green point, and if I take another step of gradient descent, you notice that my derivative, meaning the slope, is less steep at the green point than it was at the magenta point, because as I approach the minimum, my derivative gets closer and closer to zero. So, after one step of gradient descent, my new derivative is a little bit smaller, and when I take another step, I naturally take a somewhat smaller step from the green point than I did from the magenta point. Now I'm at a new point, the red point, even closer to the minimum, so the derivative here is even smaller than it was at the green point. When I take another step of gradient descent, my derivative term is even smaller, so the magnitude of the update to theta_1 is even smaller, and I take a small step like so. As gradient descent runs, you automatically take smaller and smaller steps, until eventually you're taking very small steps and you converge to the local minimum.

So, just to recap: in gradient descent, as we approach a local minimum, gradient descent automatically takes smaller steps. That's because, by definition, a local minimum is where the derivative is equal to zero; so, as we approach the local minimum, the derivative term automatically gets smaller, and gradient descent automatically takes smaller steps. This is just what gradient descent does, and so there's actually no need to decrease alpha over time.
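The shrinking steps are easy to see numerically. In this sketch on the same assumed quadratic, alpha stays fixed, yet each printed step is smaller than the last, purely because the derivative shrinks as theta_1 approaches the minimum.

alpha = 0.3   # fixed learning rate, never decreased
theta1 = 5.0
for i in range(6):
    step = alpha * 2.0 * (theta1 - 3.0)  # alpha times the derivative at theta1
    theta1 = theta1 - step
    print(f"step {i}: moved by {step:+.4f}, theta_1 is now {theta1:.4f}")
# Prints steps of +1.2000, +0.4800, +0.1920, ..., each 40% of the previous one.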
So, that's the gradient descent algorithm, and you can use it to try to minimize any cost function J, not just the cost function J we defined for linear regression. In the next video, we're going to take the function J and set it back to be exactly linear regression's cost function, the squared error cost function that we came up with earlier. Putting gradient descent and the squared error cost function together will give us our first learning algorithm: the linear regression algorithm.
