05_Gradient_Descent_11_min

The document discusses the gradient descent algorithm, which is used to minimize a cost function J in machine learning, particularly in linear regression. It explains how gradient descent iteratively updates parameters (theta0, theta1) to find local minima by taking steps in the direction of the steepest descent. The document also emphasizes the importance of simultaneous updates for a correct implementation of the algorithm.

We've previously defined the cost function J. In this video I want to tell you about an algorithm called gradient descent for
minimizing the cost function J. It turns
out gradient descent is a more general
algorithm and is used not only in linear
regression. It's actually used all over
the place in machine learning. And later
in the class we'll use gradient descent to
minimize other functions as well, not just the cost function J for linear regression.
So in this video, I'm going to
talk about gradient descent for minimizing
some arbitrary function J. And then in
later videos, we'll take this algorithm and apply it specifically to the cost function J that we have defined for linear
regression. So here's the problem
setup. We're going to see that we have
some function J of (theta0, theta1).
Maybe it's a cost function from linear
regression. Maybe it's some other function
we want to minimize. And we want
to come up with an algorithm for minimizing J as a function of (theta0, theta1). Just as an aside,
it turns out that gradient descent
actually applies to more general
functions. So imagine that you have a function J of (theta0, theta1, theta2, up to some theta n), and you want to minimize over theta0 up to theta n of this J.
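Written in symbols (my notation; the transcript itself doesn't show the equation), that general problem is:

    \min_{\theta_0, \ldots, \theta_n} J(\theta_0, \theta_1, \ldots, \theta_n)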
It turns out gradient descent
is an algorithm for solving
this more general problem, but for the
sake of brevity, for the sake of succinctness of notation, I'm just going to present only two parameters throughout the rest of this video. Here's
the idea for gradient descent. What we're
going to do is we're going to start off
with some initial guesses for theta0 and
theta1. Doesn't really matter what they
are, but a common choice would be to set theta0 to 0 and set theta1 to 0. Just initialize
them to 0. What we're going to do in
gradient descent is we'll keep changing
theta0 and theta1 a little bit to
try to reduce J of (theta0, theta1)
until hopefully we wind up at a minimum or
maybe a local minimum. So, let's see some pictures of what gradient descent
does. Let's say I try to minimize this
function. So notice the axes: theta0 and theta1 are on the horizontal axes, and J is on the vertical axis. And so the
height of the surface shows J, and we
want to minimize this function. So, we're
going to start off with (theta0, theta1)
at some point. So imagine picking some
value for (theta0, theta1), and that
corresponds to starting at some point on
the surface of this function. Okay? So
whatever value of (theta0, theta1) you pick gives you some point here. I did initialize them to 0, but sometimes you initialize them to other values as well. Now, I want you to imagine
that this figure shows a hill. Imagine
this is like a landscape of some grassy
park with two hills like so.
And I want you to imagine that you are
physically standing at that point on the hill, on this little red hill in your park.
In gradient descent, what we're
going to do is spin 360 degrees around and
just look all around us and ask, "If I were
to take a little baby step in some
direction, and I want to go downhill as
quickly as possible, what direction do I
take that little baby step in if I want to
go down, if I sort of want to physically
walk down this hill as rapidly as
possible?" Turns out that if we're standing
at that point on the hill, you look all
around, you find that the best direction
to take a little step downhill
is roughly that direction. Okay. And now
you're at this new point on your hill.
You're going to, again, look all around, and then
say, "What direction should I step in order
to take a little baby step downhill?" And
if you do that and take another step, you
take a step in that direction, and then
you keep going. You know, from this new
point, you look around, decide what
direction will take you downhill most
quickly, take another step, another step,
and so on, until you converge to this local minimum down here. Gradient descent has an interesting property. This first
time we ran gradient descent, we were
starting at this point over here, right?
Started at that point over here. Now
imagine we initialized gradient
descent just a couple steps to the right.
Imagine we initialized gradient descent with
that point on the upper right. If you were
to repeat this process, starting at that point, you would look all around, take a little step in the direction of steepest descent, then look around, take another step, and so on. And if you start
it just a couple steps to the right, the
gradient descent will have taken you to
this second local optimum over on the
right. So if you had started at this first
point, you would have wound up at this
local optimum. But if you had started at just a slightly different location,
you would have wound up at a very
different local optimum. And this is a
property of gradient descent that we'll
say a little bit more about later. So,
that's the intuition in pictures. Let's
look at the math. This is the definition of the gradient descent algorithm: we're going to just repeatedly do this, until convergence. We're going to update the parameter theta j by taking theta j and subtracting from it alpha times a derivative term. So, let's see. There are a lot of details in this equation, so let me unpack some of it.
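Since the slide isn't reproduced in this transcript, here is the update rule the narration is describing, written out:

    repeat until convergence:  \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)   (for j = 0 and j = 1)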
First, this notation here, colon equals.
We're going to use := to denote
assignment, so it's the assignment
operator. So concretely, if I write A := B, what this means in a computer is: take the value in B and use it to overwrite whatever value A has. So this means we
will set A to be equal to the value of B.
Okay, it's assignment. I can
also do A := A+1. This means
take A and increase its value by one.
Whereas in contrast, if I use the equals sign and I write A = B, then this is a truth assertion. So if I write A = B, then I'm asserting that the value of A
equals the value of B. So A := B is a computer operation, where you set the value of A to be some value. A = B, in contrast, is an assertion: I'm just making a claim that the values of A and B are the same. And so, whereas I can write A := A+1, which means increment A by 1, hopefully I won't ever write A = A+1, because that's just wrong: A and A+1 can never be equal.
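As a small illustration in Python (where a plain = is the assignment operator and == is the equality test, playing the roles of the lecture's := and =):

    a = 5
    b = 3

    a = b          # assignment, the lecture's A := B: overwrite a with the value of b
    a = a + 1      # assignment, the lecture's A := A + 1: increment a by one

    print(a == b)  # equality test, the lecture's A = B as an assertion: prints False (a is 4, b is 3)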
So that's the first
part of the definition. This alpha
here is a number that is called the
learning rate. And what alpha does is, it
basically controls how big a step we take
downhill with gradient descent. So if
alpha is very large, then that corresponds
to a very aggressive gradient descent
procedure, where we're trying to take huge
steps downhill. And if alpha is very
small, then we're taking little, little
baby steps downhill. And I'll come back and say more about this later, about how to set alpha and so on.
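To make that concrete, here is a toy one-parameter example of my own (it is not from the lecture, and it borrows the derivative term that the lecture hasn't defined yet; for J(theta) = theta^2 the derivative is 2 * theta):

    # One gradient descent step from theta = 1.0 on J(theta) = theta**2.
    theta = 1.0
    grad = 2 * theta  # derivative of theta**2 at theta

    print(theta - 0.01 * grad)  # 0.98: a very small alpha takes a little baby step
    print(theta - 2.0 * grad)   # -3.0: a very large alpha takes a huge step, past the minimum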
And finally, this last term in the equation: that's the derivative term. I don't want to talk about it right now, but I will derive this derivative term and tell you exactly what it is based on. Some of you will be more familiar with calculus than others, but even if you aren't familiar with calculus, don't worry about it: I'll tell you what you need to know about this term. Now there's one more subtlety
about gradient descent which is, in
gradient descent, we're going to
update theta0 and theta1. So this update takes place for j=0 and for j=1: you're going to update theta0, and update theta1. And the subtlety of
how you implement gradient descent is,
for this expression, for this
update equation, you want to
simultaneously update theta0 and
theta1. What I mean by that is
that in this equation,
we're going to update
theta0 := theta0 - (something), and update theta1 := theta1 - (something).
And the way to implement this is,
you should compute the right hand
side. Compute that thing for both theta0
and theta1, and then simultaneously at
the same time update theta0 and
theta1. So let me say what I mean
by that. This is a correct implementation of gradient descent, meaning simultaneous updates: I'm going to set temp0 equal to the right-hand side of the theta0 update, and temp1 equal to the right-hand side of the theta1 update. So basically, compute the right-hand sides. And then, having computed the right-hand sides and stored them together in temp0 and temp1, I'm going to update theta0 and theta1 simultaneously, because that's the correct implementation.
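Here is a minimal sketch of that correct, simultaneous update in Python. The cost function J and the numerical partial derivative dJ are placeholder assumptions of my own, since the real derivative term is only derived in the next video; the point is the temp0/temp1 update pattern.

    def J(theta0, theta1):
        # Placeholder cost function, just for illustration.
        return (theta0 - 1.0) ** 2 + (theta1 + 2.0) ** 2

    def dJ(j, theta0, theta1, eps=1e-6):
        # Numerical partial derivative of J with respect to theta_j.
        if j == 0:
            return (J(theta0 + eps, theta1) - J(theta0 - eps, theta1)) / (2 * eps)
        return (J(theta0, theta1 + eps) - J(theta0, theta1 - eps)) / (2 * eps)

    alpha = 0.1  # learning rate
    theta0, theta1 = 0.0, 0.0

    for _ in range(100):
        # Compute both right-hand sides first...
        temp0 = theta0 - alpha * dJ(0, theta0, theta1)
        temp1 = theta1 - alpha * dJ(1, theta0, theta1)
        # ...then update theta0 and theta1 simultaneously.
        theta0, theta1 = temp0, temp1

    print(theta0, theta1)  # converges toward the minimum at (1, -2)

In contrast,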
here's an incorrect implementation that
does not do a simultaneous update. So in
this incorrect implementation, we compute
temp0, and then we update theta0
and then we compute temp1, then we update theta1. And the difference between the correct and the incorrect implementation is that, at that last step, by the time you compute temp1 you've already updated theta0, so you would be using the new value of theta0 to compute the derivative term. And that gives you a different value of temp1 than the simultaneous version, because you've now plugged the new value of theta0 into the equation. And so this second version is not a correct implementation of gradient descent.
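Continuing the sketch above (same placeholder J, dJ, and alpha), the non-simultaneous version would look like this:

    # Incorrect: theta0 is overwritten before temp1 is computed, so the
    # second derivative evaluation already sees the new value of theta0.
    temp0 = theta0 - alpha * dJ(0, theta0, theta1)
    theta0 = temp0
    temp1 = theta1 - alpha * dJ(1, theta0, theta1)  # uses the updated theta0
    theta1 = temp1

So I won't say right now exactly why you need to do the simultaneous updates, but it turns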
out that the way gradient descent
is usually implemented (we'll say more about it later), it's actually more natural to implement the
simultaneous update. And when people talk
about gradient descent, they always mean
simultaneous update. If you implement the non-simultaneous update, it turns out it will probably work anyway, but that algorithm is not what people refer to as gradient descent; it is some other algorithm with different properties, and for various reasons it can behave in slightly stranger ways. So what you should do is really implement the simultaneous update of gradient descent. So, that's the outline of the
gradient descent algorithm. In the next video,
we're going to go into the details of the
derivative term, which I wrote out but
didn't really define. And if you've taken
a calculus class before and if you're
familiar with partial derivatives and
derivatives, it turns out that's exactly
what that derivative term is. But in case
you aren't familiar with calculus, don't
worry about it. The next video will give
you all the intuitions and will tell you
everything you need to know to compute
that derivative term, even if you haven't
seen calculus, or even if you haven't seen
partial derivatives before. And with that, in the next video, hopefully we'll be able to give you all the intuitions you need to apply gradient descent.
