Gradient Descent
Gradient descent
• Contents
1. What is gradient descent
2. Gradient descent types
3. Challenges of gradient descent
What is Gradient descent
• It is an optimization algorithm used in machine learning and deep learning
to minimize a cost function by repeatedly stepping in the direction of the
negative gradient.
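As a minimal sketch of the idea (the toy cost function, starting point, and learning rate below are illustrative assumptions, not from the slides), gradient descent repeatedly steps against the derivative of the cost:

```python
def cost(x):
    return (x - 3) ** 2          # toy cost function with its minimum at x = 3

def d_cost(x):
    return 2 * (x - 3)           # derivative (gradient) of the cost

x = 0.0                          # initial guess
learning_rate = 0.1              # controls how large each step is

for _ in range(100):
    x = x - learning_rate * d_cost(x)   # step against the gradient

print(x)   # approaches 3, the minimizer of the cost
```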
What is Gradient descent
Gradient descent is like navigating through a foggy landscape to find the
lowest point: you cannot see the whole terrain, so you feel the slope where
you stand and keep taking small steps downhill.
What is Gradient descent
• Gradient descent mathematically (simple linear regression example)
predicted value = intercept + slope * weight
residual = actual – predicted
Keep track of the sum of the squared residuals; this sum is the cost that
gradient descent minimizes.
[Figure: scatter plot of height vs. weight with a fitted regression line]
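To make the cost concrete, here is a minimal sketch that computes the sum of squared residuals for a given intercept and slope (the example weights/heights and the NumPy usage are illustrative assumptions, not from the slides):

```python
import numpy as np

# Illustrative data: weight is the input, height is the value we predict.
weight = np.array([0.5, 2.3, 2.9])
height = np.array([1.4, 1.9, 3.2])

def sum_of_squared_residuals(intercept, slope):
    predicted = intercept + slope * weight   # predicted value = intercept + slope * weight
    residuals = height - predicted           # residual = actual - predicted
    return np.sum(residuals ** 2)            # the cost gradient descent minimizes

print(sum_of_squared_residuals(0.0, 0.64))
```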
What is Gradient descent
• Gradient descent mathematically
With cost J(w, b), the weight w and the bias b are both updated in the
direction of the negative gradient, where alpha is the learning rate:
new w = w – alpha * ∂J(w, b)/∂w
new b = b – alpha * ∂J(w, b)/∂b
Repeat until convergence.
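A minimal sketch of these update rules for the linear model from the previous slide (the data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

weight = np.array([0.5, 2.3, 2.9])   # inputs
height = np.array([1.4, 1.9, 3.2])   # targets

w, b = 0.0, 0.0          # slope (weight parameter) and intercept (bias)
alpha = 0.01             # learning rate
n = len(weight)

for _ in range(10000):
    predicted = b + w * weight
    error = predicted - height
    dJ_dw = (2.0 / n) * np.sum(error * weight)   # ∂J/∂w for the mean squared cost
    dJ_db = (2.0 / n) * np.sum(error)            # ∂J/∂b
    w = w - alpha * dJ_dw                        # new w = w - alpha * ∂J/∂w
    b = b - alpha * dJ_db                        # new b = b - alpha * ∂J/∂b

print(w, b)   # fitted slope and intercept
```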
What is Gradient descent
• Gradient descent mathematically
The cost (the sum of squared residuals) is a function of the intercept, so we
can take the derivative of this function and determine the slope of the cost
curve at any value of the intercept.
step size = slope * learning rate
new intercept = old intercept – step size
Repeat until convergence (or until the step size becomes very small).
[Figure: sum of squared residuals (cost) plotted against the intercept]
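A minimal sketch of this one-parameter version, which optimizes only the intercept while the slope is held fixed (the data, the fixed slope, and the learning rate are illustrative assumptions):

```python
import numpy as np

weight = np.array([0.5, 2.3, 2.9])
height = np.array([1.4, 1.9, 3.2])
slope = 0.64              # held fixed in this illustration
intercept = 0.0
learning_rate = 0.1

for _ in range(100):
    predicted = intercept + slope * weight
    # Derivative of the sum of squared residuals with respect to the intercept.
    d_cost = -2.0 * np.sum(height - predicted)
    step_size = d_cost * learning_rate        # step size = slope of the cost curve * learning rate
    intercept = intercept - step_size         # new intercept = old intercept - step size

print(intercept)
```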
Gradient Descent Types
• Batch Gradient Descent
We use the entire dataset in each iteration to compute the gradient.
• Pros
- the gradient is computed exactly over all the data, so the updates are
stable and the results are more accurate.
• Cons
- very slow on large datasets, because every single update requires a full
pass over the data.
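As a minimal sketch (the NumPy usage and the mean-squared-error gradient are illustrative assumptions), one batch gradient descent step uses every example:

```python
import numpy as np

def batch_gd_step(X, y, w, b, alpha):
    # One batch gradient descent step: the gradient uses *all* n examples.
    n = len(y)
    error = X @ w + b - y                   # predictions minus targets, full dataset
    grad_w = (2.0 / n) * (X.T @ error)      # exact gradient of the mean squared cost
    grad_b = (2.0 / n) * np.sum(error)
    return w - alpha * grad_w, b - alpha * grad_b
```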
Gradient Descent Types
• Stochastic Gradient Descent
We use only one (randomly chosen) training example from the dataset in each
iteration to compute the gradient.
• Pros
- very fast, since each update touches a single example.
• Cons
- the gradient estimate is noisy, so the updates fluctuate and the results are
not very accurate.
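A minimal sketch of one stochastic step, mirroring the batch step above but using a single randomly chosen example (again an illustrative assumption, not code from the slides):

```python
import numpy as np

def sgd_step(X, y, w, b, alpha):
    # One stochastic gradient descent step: the gradient uses a single example.
    i = np.random.randint(len(y))           # pick one training example at random
    error = X[i] @ w + b - y[i]
    grad_w = 2.0 * error * X[i]             # cheap but noisy estimate of the gradient
    grad_b = 2.0 * error
    return w - alpha * grad_w, b - alpha * grad_b
```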
Gradient Descent Types
• Mini-Batch Gradient Descent
We use a small batch of examples from the dataset in each iteration to compute
the gradient.
• Pros
- it gives a compromise between the two types above: it can be faster than
batch gradient descent and it can produce more accurate (less noisy) updates
than SGD.
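A minimal sketch of one mini-batch step (the batch size and sampling scheme are illustrative assumptions); note that batch_size = len(y) recovers batch gradient descent and batch_size = 1 recovers SGD:

```python
import numpy as np

def minibatch_gd_step(X, y, w, b, alpha, batch_size=32):
    # One mini-batch step: the gradient uses a random subset of the examples.
    idx = np.random.choice(len(y), size=batch_size, replace=False)
    error = X[idx] @ w + b - y[idx]
    grad_w = (2.0 / batch_size) * (X[idx].T @ error)
    grad_b = (2.0 / batch_size) * np.sum(error)
    return w - alpha * grad_w, b - alpha * grad_b
```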
Challenges of gradient descent
• Vanishing and exploding gradients
• Vanishing gradients: occur when gradients become too small during
backpropagation. The weights of the network are then barely changed, and the
network is unable to learn the underlying patterns in the data.
• Exploding gradients: occur when gradients become too large during
backpropagation. When this happens, the weights are updated by a very large
amount, which can cause the training to diverge.
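A minimal numerical illustration (the layer count and per-layer derivative values are illustrative assumptions): the gradient reaching the early layers is roughly a product of per-layer derivatives, so factors below 1 shrink it exponentially while factors above 1 blow it up:

```python
layers = 50
vanishing = 1.0
exploding = 1.0

for _ in range(layers):
    vanishing *= 0.5   # each layer contributes a derivative smaller than 1
    exploding *= 1.5   # each layer contributes a derivative larger than 1

print(vanishing)   # ~8.9e-16: the gradient has effectively vanished
print(exploding)   # ~6.4e+08: the gradient has exploded
```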
Challenges of gradient descent
• Time consuming: gradient descent can need many iterations to converge, and
with batch gradient descent each update requires a full pass over the dataset,
which is slow for large datasets.