07: Gradient Descent for Linear Regression (10 min)

This video discusses the integration of gradient descent with the linear regression model to minimize the squared error cost function. It explains the calculation of partial derivatives for the cost function and how these derivatives are used to update parameters in the gradient descent algorithm. The video also introduces the concept of batch gradient descent and mentions alternative methods for solving the cost function, emphasizing the scalability of gradient descent for larger datasets.

In previous videos, we talked about the gradient descent algorithm, and we talked about the linear regression model and the squared error cost function. In this video, we're going to put gradient descent together with our cost function, and that will give us an algorithm for linear regression, for fitting a straight line to our data.
So, this is what we worked out in the previous videos: that's our gradient descent algorithm, which should be familiar, and you see the linear regression model with our linear hypothesis and our squared error cost function. What we're going to do is apply gradient descent to minimize our squared error cost function.
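For reference, the two pieces we're combining, as worked out in the previous videos, are the linear hypothesis and the squared error cost function:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$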
Now, in order to apply gradient descent, in order to write this piece of code, the key term we need is this derivative term over here. So, we need to figure out what this partial derivative term is. Plugging in the definition of the cost function J, it turns out to be

$$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) = \frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

and all I did here was just plug in the definition of the cost function. Simplifying a little bit more, this turns out to be equal to

$$\frac{\partial}{\partial \theta_j} \, \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$

and all I did there was take the definition of my hypothesis and plug that in. It turns out we need to figure out this partial derivative for two cases, for j = 0 and for j = 1, that is, for both the theta zero case and the theta one case. I'm just going to write out the answers. It turns out the first term simplifies to

$$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

and for the partial derivative with respect to theta one, it turns out I get

$$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

Okay.
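To make these two formulas concrete, here is a minimal NumPy sketch of the derivative computation. The names (gradients, x, y, theta0, theta1) are my own illustrative choices, not code from the course:

```python
import numpy as np

def gradients(theta0, theta1, x, y):
    """Partial derivatives of the squared error cost J(theta0, theta1).

    x and y are 1-D arrays holding the m training examples.
    """
    m = len(y)
    errors = theta0 + theta1 * x - y    # h_theta(x^(i)) - y^(i) for every i
    d_theta0 = errors.sum() / m         # (1/m) * sum of the errors
    d_theta1 = (errors * x).sum() / m   # (1/m) * sum of the errors times x^(i)
    return d_theta0, d_theta1
```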
Computing these partial derivatives, so going from the cost function to either of those equations down there, requires some multivariate calculus. If you know calculus, feel free to work through the derivation yourself and check that taking the derivatives really does give the answers I got. But if you're less familiar with calculus, don't worry about it: it's fine to just take these equations as worked out, and you won't need to know calculus or anything like that to do the homework or to implement gradient descent and get it to work.
So, with these definitions, after what we've worked out to be the derivatives, which are really just the slope of the cost function J, we can now plug them back into our gradient descent algorithm. Here's gradient descent for linear regression, which is going to repeat until convergence, with theta zero and theta one each updated as the old value minus alpha times the corresponding derivative term:

$$\theta_0 := \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$

$$\theta_1 := \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

So, here's our linear regression algorithm. The first term is, of course, just the partial derivative with respect to theta zero that we worked out on the previous slide, and the second term is just the partial derivative with respect to theta one that we worked out on the previous slide. And just as a quick reminder, when implementing gradient descent, there's actually a detail you should get right: implement it so that you update theta zero and theta one simultaneously.
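Here is a minimal sketch of the full loop, reusing the gradients helper from the sketch above, to show what the simultaneous update looks like in code: both derivatives are computed from the old parameter values before either parameter is overwritten.

```python
def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    theta0, theta1 = 0.0, 0.0  # a common choice: initialize both at zero
    for _ in range(num_iters):
        # Simultaneous update: evaluate both derivatives at the *current*
        # (theta0, theta1) before assigning either new value.
        d_theta0, d_theta1 = gradients(theta0, theta1, x, y)
        theta0 = theta0 - alpha * d_theta0
        theta1 = theta1 - alpha * d_theta1
    return theta0, theta1
```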
So, let's see how gradient descent works. One of the issues we saw with gradient descent is that it can be susceptible to local optima. When I first explained gradient descent, I showed you a picture of it going downhill on a surface, and we saw how, depending on where you initialize, you can end up at different local optima: you can end up here, or here. But it turns out that the cost function for linear regression is always going to be a bowl-shaped function like this. The technical term for this is that it is a convex function, and I'm not going to give the formal definition of a convex function, c-o-n-v-e-x, but informally a convex function just means a bowl-shaped function. And so this function doesn't have any local optima, except for the one global optimum. So when you run gradient descent on this type of cost function, which you get whenever you're using linear regression, it will always converge to the global optimum, because there are no other local optima.
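One quick way to see why there are no other local optima, a sketch of my own rather than the course's argument: J is a quadratic in the parameters, and its Hessian is a sum of outer products, which is always positive semidefinite, and that is exactly the condition for convexity:

$$\nabla^2 J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \begin{pmatrix} 1 \\ x^{(i)} \end{pmatrix} \begin{pmatrix} 1 \\ x^{(i)} \end{pmatrix}^{\top} \succeq 0$$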
So now, let's see this algorithm in action. As usual, here are plots of the hypothesis function and of my cost function J. Usually you would initialize your parameters at zero, theta zero equals zero and theta one equals zero, but for illustration in this particular presentation I have initialized theta zero at about 900 and theta one at about minus 0.1, okay? So this corresponds to h(x) = 900 - 0.1x, which is this line, and to this point out here on the cost function.
Now, if we take one step of gradient descent, we end up going from this point a little bit down and to the left, to that second point over there, and you notice that my line has changed a little bit. As I take another step of gradient descent, the line on the left changes again, and I have also moved to a new point on my cost function. As I take further steps of gradient descent, the cost keeps going down, so my parameters follow this trajectory, and if you look on the left, this corresponds to hypotheses that seem to be better and better fits to the data, until eventually I have wound up at the global minimum. And the global minimum corresponds to this hypothesis, which gives me a good fit to the data. So that's gradient descent: we've just run it and gotten a good fit to my data set of housing prices, and you can now use it to predict.
You know, if your friend has a house with a size of 1250 square feet, you can now read the value off the line and tell them that, I don't know, maybe they can get $350,000 for their house.
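As a usage example for the sketch above (1250 square feet is the figure quoted here; x and y stand for the housing training data, which I haven't defined):

```python
theta0, theta1 = gradient_descent(x, y)  # fit the line to the housing data
price = theta0 + theta1 * 1250           # h_theta(1250): predicted price for a
print(price)                             # 1250 square foot house
```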
Finally, just to give this another name: it turns out that the algorithm we just went over is sometimes called batch gradient descent. It turns out in machine learning, I feel like us machine learning people, we're not always that creative in naming algorithms. But the term batch gradient descent refers to the fact that, in every step of gradient descent, we're looking at all of the training examples.
So, in gradient descent, when computing the derivatives, we're computing these sums: in every step of gradient descent we end up computing something like $\sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$, which sums over our m training examples. And so the term batch gradient descent refers to the fact that we're looking at the entire batch of training examples. Again, this is really not a great name, but this is what machine learning people call it.
And it turns out there are sometimes other versions of gradient descent that are not batch versions, and that, instead of looking at the entire training set, look at small subsets of the training set at a time; we'll talk about those versions later in this course as well.
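Just to sketch the idea in code (this is my own rough illustration, not material the course has covered yet), a single step that looks only at a small random subset might look like this, again reusing numpy and the gradients helper from above:

```python
def minibatch_step(theta0, theta1, x, y, alpha=0.01, batch_size=32):
    # Pick a small random subset of the m examples and take one gradient
    # step using only that subset, instead of the full training set.
    idx = np.random.choice(len(y), size=batch_size, replace=False)
    d_theta0, d_theta1 = gradients(theta0, theta1, x[idx], y[idx])
    return theta0 - alpha * d_theta0, theta1 - alpha * d_theta1
```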
But for now, using the algorithm you just learned, which is batch gradient descent, you now know how to implement gradient descent for linear regression. So that's linear regression with gradient descent.
If you've seen advanced linear algebra before, so some of you may have taken a class in advanced linear algebra, you might know that there exists a solution for numerically solving for the minimum of the cost function J without needing to use an iterative algorithm like gradient descent. Later in this course we'll talk about that method as well, which just solves for the minimum of the cost function J without needing multiple steps of gradient descent. That other method is called the normal equations method. In case you have heard of that method, it turns out that gradient descent will scale better to larger data sets than the normal equations method. And now that we know about gradient descent, we'll be able to use it in lots of different contexts, and we'll use it on lots of different machine learning problems as well.
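In case it's useful to see, here is a minimal sketch of what the normal equations method can look like in code; the design matrix with a leading column of ones is my own framing here, and the course will cover the method properly later:

```python
def normal_equation(x, y):
    # Design matrix with a column of ones for the intercept term theta0.
    X = np.column_stack([np.ones_like(x), x])
    # Solve (X^T X) theta = X^T y directly, with no iteration.
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    return theta[0], theta[1]  # theta0, theta1
```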
So, congrats on learning about your first machine learning algorithm. We'll later have exercises in which we'll ask you to implement gradient descent and hopefully see these algorithms work for yourselves. But before that, in the next set of videos, I first want to tell you about a generalization of the gradient descent algorithm that will make it much more powerful, and I guess I'll tell you about that in the next video.