Chapter 2: Gradient Descent Algorithm
The learning rate is a hyperparameter that determines the size of the step
taken in each weight update. A small learning rate results in slow
convergence, while a large learning rate can overshoot the minimum and
oscillate around it. It is important to choose a learning rate that balances
the speed of convergence against the stability of the optimization.
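As a minimal sketch of the update rule (the quadratic loss, starting point, and learning-rate values below are illustrative assumptions, not taken from the text), the effect of the learning rate on convergence can be seen directly:

```python
# Minimal gradient descent sketch; the loss (w - 3)**2 and the settings
# below are illustrative, not from the text.
def grad(w):
    return 2 * (w - 3)                  # derivative of (w - 3)**2

def gradient_descent(w0, learning_rate, n_steps):
    w = w0
    for _ in range(n_steps):
        w = w - learning_rate * grad(w)  # the weight-update step
    return w

print(gradient_descent(w0=0.0, learning_rate=0.1, n_steps=50))  # converges to ~3.0
print(gradient_descent(w0=0.0, learning_rate=1.1, n_steps=50))  # overshoots; the error grows each step
```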
Variants of Gradient Descent
In batch gradient descent, the gradient of the loss function with respect to
the weights is computed over the entire training dataset, and the weights are
updated once per pass over the data. This provides an accurate estimate of the
gradient, but it can be computationally expensive for large datasets.
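A minimal sketch of batch gradient descent on a linear regression problem with a squared-error loss; the synthetic data and settings are illustrative assumptions, not from the text:

```python
import numpy as np

# Batch gradient descent: one update per pass over the ENTIRE dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 examples, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
learning_rate = 0.1
for epoch in range(200):
    # gradient of the mean squared error over the whole training set
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= learning_rate * grad                  # single update per epoch
print(w)                                       # close to true_w
```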
In SGD, the gradient of the loss function with respect to the weights is
computed using a single training example, and the weights are updated after
each example. SGD has a lower computational cost per iteration than batch
gradient descent, but the updates are noisier and it may oscillate around the
minimum rather than converging to it exactly.
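For comparison, a sketch of SGD on the same kind of synthetic regression problem (again, the data and settings are illustrative assumptions):

```python
import numpy as np

# Stochastic gradient descent: one update per training example.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
learning_rate = 0.01
for epoch in range(20):
    for i in rng.permutation(len(y)):          # visit examples in random order
        xi, yi = X[i], y[i]
        grad = 2 * xi * (xi @ w - yi)          # gradient from a SINGLE example
        w -= learning_rate * grad              # one update per example
print(w)                                       # noisy, but close to true_w
```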
Momentum:
In momentum-based gradient descent, an exponentially decaying average of past
gradients (a velocity term) is added to each update, which damps oscillations
and speeds up progress along directions where the gradient is consistent; see
the sketch below.
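A minimal sketch of the momentum update; the quadratic function, beta = 0.9, and the other settings are illustrative assumptions, not from the text:

```python
# Momentum sketch: a velocity term accumulates an exponentially decaying
# average of past gradients; the function and settings are illustrative.
def grad(w):
    return 2 * (w - 3)                  # derivative of (w - 3)**2

w, velocity = 0.0, 0.0
learning_rate, beta = 0.1, 0.9          # beta controls how much history is kept
for _ in range(200):
    velocity = beta * velocity + grad(w)
    w -= learning_rate * velocity       # step along the smoothed direction
print(w)                                # converges to ~3.0
```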
If the learning rate is too big, the algorithm may not converge to the optimal
point (it jumps around it) or may even diverge completely.
For this function, with a learning rate of 0.1 and a starting point of x = 9,
we can easily calculate each step by hand. Let's do it for the first 3 steps
(see the sketch below):
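The function referenced above is not reproduced in this excerpt; as an illustration, assuming f(x) = x² (so f'(x) = 2x), the first three steps from x = 9 with a learning rate of 0.1 work out as follows:

```python
# Hand-checkable sketch assuming f(x) = x**2, so f'(x) = 2x; the actual
# function used in the original example is not shown in this excerpt.
x, learning_rate = 9.0, 0.1
for step in range(1, 4):
    x = x - learning_rate * 2 * x       # x_{n+1} = x_n - 0.1 * f'(x_n)
    print(step, x)                      # step 1: 7.2, step 2: 5.76, step 3: 4.608
```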
The animation below shows the steps taken by the GD algorithm for learning
rates of 0.1 and 0.8. As you can see, for the smaller learning rate the steps
get gradually smaller as the algorithm approaches the minimum. For the bigger
learning rate, it jumps from one side of the minimum to the other before
converging.
First 10 steps taken by GD for small and big learning rates; Image by author
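The animation itself is not included here; a small sketch that reproduces the same comparison numerically, again assuming f(x) = x² as the illustrative function (an assumption, since the original function is not shown):

```python
# Numerical version of the comparison above, assuming f(x) = x**2.
for learning_rate in (0.1, 0.8):
    x = 9.0
    trajectory = [x]
    for _ in range(10):
        x = x - learning_rate * 2 * x   # gradient step on f(x) = x**2
        trajectory.append(x)
    print(learning_rate, [round(v, 3) for v in trajectory])
# lr = 0.1: steps shrink smoothly toward 0
# lr = 0.8: x flips sign each step (9, -5.4, 3.24, ...) before settling near 0
```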
https://www.geeksforgeeks.org/gradient-descent-algorithm-and-its-variants/