Gradient Descent
• Imagine a game in which the players start at the top of a mountain
and are asked to reach the lowest point of the mountain.
Additionally, they are blindfolded.
• The best strategy is to feel the ground and find the direction in which
the land descends. From that position, take a step in the descending
direction and repeat this process until the lowest point is reached.
• Gradient Descent is an optimization algorithm used to minimize a
cost function. It does so by iteratively updating the parameters of the
learning model.
To find a local minimum of a function using gradient descent, we take steps proportional
to the negative of the gradient (moving against the gradient) of the function at the current point.
If we instead take steps proportional to the positive of the gradient (moving along the gradient), we
approach a local maximum of the function; that procedure is called Gradient Ascent.
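As a concrete illustration, consider f(x) = x^2, whose gradient is f'(x) = 2x. The sketch below (the function, starting point, and step size are assumptions chosen only for illustration) shows one descent step and one ascent step from the same point:

# One step of gradient descent vs. gradient ascent on f(x) = x**2
def grad(x):            # derivative of f(x) = x**2
    return 2 * x

x = 3.0                 # current point (illustrative assumption)
alpha = 0.1             # step size (illustrative assumption)

x_descent = x - alpha * grad(x)   # step against the gradient -> 2.4, closer to the minimum at 0
x_ascent  = x + alpha * grad(x)   # step along the gradient   -> 3.6, farther from the minimum

print(x_descent, x_ascent)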
• The goal of the gradient descent algorithm is to minimize a
given function (say, a cost function). To achieve this goal, it
performs two steps iteratively:
1. Compute the gradient (slope), the first-order derivative of
the function at the current point
2. Take a step (move) in the direction opposite to the gradient,
i.e. move from the current point by alpha times the negative
of the gradient at that point
Alpha is called the learning rate, a tuning parameter in the
optimization process. It decides the length of the steps (see the sketch below).
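A minimal sketch of these two steps in Python, assuming the simple cost function J(theta) = theta^2 (this function, the starting point, and the value of alpha are illustrative assumptions, not fixed by the algorithm):

# Gradient descent loop: repeat (1) compute gradient, (2) step opposite to it
def gradient(theta):            # first-order derivative of J(theta) = theta**2
    return 2 * theta

theta = 5.0                     # initial parameter value (assumption)
alpha = 0.1                     # learning rate (assumption)

for _ in range(100):
    g = gradient(theta)         # step 1: compute the gradient at the current point
    theta = theta - alpha * g   # step 2: move opposite to the gradient by alpha * gradient

print(theta)                    # ends up very close to 0, the minimizer of J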
Alpha – The Learning Rate
• We have the direction we want to move in; now we must decide the
size of the step we take.
• If the learning rate is too small, training may take too long to
converge; if it is too large, the updates may overshoot the minimum.
Nature of Learning rate
a) Learning rate is optimal: the model converges to the minimum
b) Learning rate is too small: it takes more time but converges to the
minimum
c) Learning rate is higher than the optimal value: it overshoots but
still converges (1/C < η < 2/C)
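These behaviours can be seen by running the same update with different learning rates. The sketch below again assumes J(theta) = theta^2, for which the curvature constant is C = 2, so the overshooting-but-converging range is 1/2 < η < 1; the specific η values and iteration count are illustrative assumptions:

# Effect of the learning rate eta on J(theta) = theta**2 (curvature C = 2)
def gradient(theta):
    return 2 * theta

for eta in (0.05, 0.4, 0.75):      # too small, near-optimal, overshooting but still below 2/C = 1
    theta = 5.0                    # same starting point for each run
    for _ in range(20):
        theta = theta - eta * gradient(theta)
    print(eta, theta)              # the smallest eta is still noticeably far from 0 after 20 steps;
                                   # the larger values are already very close to the minimum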