Gradient Descent

What is Gradient Descent?

• Let’s say you are playing a game where the players are at the top of a
mountain and are asked to reach the lowest point of the mountain.
Additionally, they are blindfolded.

• The best strategy is to feel the ground and find where the land descends.
From that position, take a step in the descending direction and repeat this
process until the lowest point is reached.
• Gradient Descent is an optimization algorithm used for minimizing the
cost function. It is mainly used for updating the parameters of the
learning model.

What is a Cost Function?

• It is a function that measures the performance of a model for any given
data. The cost function quantifies the error between predicted values and
expected values and presents it as a single real number.
• After making a hypothesis with initial parameters, we calculate the cost
function. Then, with the goal of reducing the cost function, we modify the
parameters using the gradient descent algorithm over the given data.
Here’s the mathematical representation:
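θ = θ − α · ∇J(θ)

where θ denotes the model parameters, α is the learning rate, and ∇J(θ) is the gradient of the cost function J with respect to θ.

As a minimal sketch of the cost itself (the names X, y, and theta below are illustrative, not from these slides), a mean squared error cost for a linear hypothesis could be written in Python as:

import numpy as np

def mse_cost(X, y, theta):
    # Hypothesis: predictions of a linear model with parameters theta
    predictions = X @ theta
    # Error between predicted and expected values
    errors = predictions - y
    # A single real number summarizing model performance
    return np.mean(errors ** 2)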
Gradient descent is an iterative optimization algorithm for finding a local minimum of a
function.

To find a local minimum of a function using gradient descent, we take steps proportional
to the negative of the gradient of the function at the current point, i.e., we move against the gradient.

If we instead take steps proportional to the positive of the gradient, moving in the direction of the
gradient, we approach a local maximum of the function; that procedure is called Gradient Ascent.
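As a tiny illustration (the quadratics, step count, and learning rate here are my own choices, not from the slides), flipping the sign of the step turns descent into ascent:

# Gradient descent on f(x) = (x - 3)^2 approaches its minimum at x = 3;
# gradient ascent on g(x) = -(x - 3)^2 approaches its maximum, also at x = 3.
x_min, x_max, lr = 0.0, 0.0, 0.1
for _ in range(100):
    x_min -= lr * 2 * (x_min - 3)     # descent: step against f'(x) = 2(x - 3)
    x_max += lr * -2 * (x_max - 3)    # ascent: step along g'(x) = -2(x - 3)
print(x_min, x_max)  # both end up close to 3.0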
• The goal of the gradient descent algorithm is to minimize the
given function (say, a cost function). To achieve this goal, it
performs two steps iteratively (see the sketch below):
1. Compute the gradient (slope), the first-order derivative of
the function at the current point
2. Take a step in the direction opposite to the gradient, i.e.,
move from the current point by alpha times the gradient at
that point
Alpha is called the learning rate, a tuning parameter in the
optimization process. It decides the length of the steps.
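Combining the two steps gives a minimal sketch in Python; it reuses the MSE cost idea from above, and the names are again illustrative rather than from the slides:

import numpy as np

def gradient_descent(X, y, alpha=0.1, n_steps=100):
    """Minimize the MSE cost of a linear model via the two steps above."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Step 1: compute the gradient of the cost at the current point
        gradient = 2 / len(y) * X.T @ (X @ theta - y)
        # Step 2: move opposite to the gradient, scaled by alpha
        theta -= alpha * gradient
    return theta

For example, on X = np.array([[1.0], [2.0], [3.0]]) and y = np.array([2.0, 4.0, 6.0]), the loop recovers a slope close to 2.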
Alpha – The Learning Rate

• We have the direction we want to move in; now we must decide the
size of the step to take.

• The step size must be chosen carefully to end up at a local minimum.

• If the learning rate is too high, we might OVERSHOOT the minimum
and keep bouncing without ever reaching it.

• If the learning rate is too small, training might take too long.
Nature of the Learning Rate
a) The learning rate is optimal: the model converges to the minimum

b) The learning rate is too small: it takes more time but still converges to the
minimum

c) The learning rate is higher than the optimal value: it overshoots but
converges (1/C < η < 2/C, where C is the curvature of the cost function)

d) The learning rate is very large: it overshoots and diverges, moving
away from the minimum, and learning performance degrades

These four regimes are illustrated in the sketch below.
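A rough illustration of the regimes, assuming a simple quadratic f(x) = C/2 · x² whose gradient is C · x (the constants and step counts are my own choices, not from the slides):

# Minimize f(x) = C/2 * x^2 starting from x = 1 with various learning rates.
C = 2.0  # curvature of this quadratic

def run(eta, steps=20):
    x = 1.0
    for _ in range(steps):
        x -= eta * C * x  # each step multiplies x by (1 - eta * C)
    return x

print(run(1.0 / C))   # a) optimal: converges to 0 (here in a single step)
print(run(0.05 / C))  # b) too small: still far from 0 after 20 steps
print(run(1.5 / C))   # c) 1/C < eta < 2/C: overshoots (sign flips) but converges
print(run(2.5 / C))   # d) eta > 2/C: diverges, |x| grows with every step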
Mini-Batch and Stochastic Gradient Descent

1. Batch Gradient Descent: Parameters are updated after computing
the gradient of the error with respect to the entire training set

2. Stochastic Gradient Descent: Parameters are updated after
computing the gradient of the error with respect to a single training
example

3. Mini-Batch Gradient Descent: Parameters are updated after
computing the gradient of the error with respect to a subset of the
training set

The sketch below contrasts the three update schemes.
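A rough Python sketch (the function names, learning rate, and batch size are my own choices) showing that the three variants differ only in how many examples feed each gradient update:

import numpy as np

def grad(X, y, theta):
    """Gradient of the MSE error of a linear model on (X, y)."""
    return 2 / len(y) * X.T @ (X @ theta - y)

def train(X, y, alpha=0.01, epochs=10, mode="mini-batch", batch_size=32):
    m = X.shape[0]
    if mode == "batch":
        batch_size = m   # 1) the entire training set per update
    elif mode == "stochastic":
        batch_size = 1   # 2) a single training example per update
    # otherwise: 3) a subset (mini-batch) of the training set per update
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = np.random.permutation(m)  # shuffle examples each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            theta -= alpha * grad(X[idx], y[idx], theta)
    return theta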
