Lecture 1.1 Gradient Descent Algorithm

The document discusses closed-form equations and various types of gradient descent (Batch, Stochastic, Mini-batch) used in optimization for machine learning. It explains the definitions, properties, advantages, and disadvantages of each gradient descent type, along with mathematical formulations and numerical examples. The content is presented by Dr. Mainak Biswas and emphasizes the importance of these concepts in minimizing loss functions.


Lecture 1.1

Closed-form Equation, Types of Gradient Descent (Batch, Stochastic, Mini-batch): Definitions and Properties

Dr. Mainak Biswas


Closed-form Equation
• A closed-form equation is a mathematical expression that provides a direct way to compute a value without requiring iterative procedures or infinite series
  – Example:
    • Sum of an Arithmetic Series: the sum of the first $n$ terms of an arithmetic series with first term $a$ and common difference $d$ is
      $S_n = \frac{n}{2}\bigl(2a + (n - 1)d\bigr)$
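A quick Python illustration (the function names and the sample values a = 2, d = 3, n = 10 are my own, not from the lecture) contrasting the closed-form sum with the equivalent iterative computation:

def arithmetic_sum_closed_form(a, d, n):
    # S_n = (n/2) * (2a + (n - 1)d), computed directly
    return n * (2 * a + (n - 1) * d) / 2

def arithmetic_sum_iterative(a, d, n):
    # the same sum accumulated term by term
    return sum(a + i * d for i in range(n))

print(arithmetic_sum_closed_form(2, 3, 10))  # 155.0
print(arithmetic_sum_iterative(2, 3, 10))    # 155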

Gradient Descent
• Gradient Descent is an optimization algorithm used in
machine learning and deep learning to minimize the
loss function by updating the model's parameters in
the direction of the steepest descent
• The type of gradient descent depends on how much
data is used to compute the gradient at each iteration
• Gradient descent is also known as the steepest-descent (steepest downward slope) algorithm
• It is very important in machine learning, where it is
used to minimize a cost function

Loss function
$E(w) = \frac{1}{2N}\sum_{i=1}^{N}\bigl(f(x_i) - y_i\bigr)^2$

• Where $f(x_i) = w^T x_i$, then

$\frac{\partial E}{\partial w} = \frac{1}{N}\sum_{i=1}^{N}\bigl(f(x_i) - y_i\bigr)\,x_i$
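The loss and its gradient can be written as two short NumPy functions. This is a minimal sketch under the linear-model assumption $f(x_i) = w^T x_i$; the array shapes (X of shape N x d, y of shape N) are my own convention:

import numpy as np

def loss(w, X, y):
    # E(w) = 1/(2N) * sum_i (w^T x_i - y_i)^2
    residuals = X @ w - y
    return (residuals ** 2).sum() / (2 * len(y))

def gradient(w, X, y):
    # dE/dw = 1/N * sum_i (w^T x_i - y_i) x_i
    residuals = X @ w - y
    return X.T @ residuals / len(y)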

Mathematical Formulation of Gradient Descent

$w = w - \eta\,\nabla E(w)$

• $w$: Model parameters (weights)
• $\eta$: Learning rate
• $\nabla E(w)$: Gradient of the loss function $E(w)$ with respect to $w$
• It can also be written as:

$w = w - \eta \cdot \frac{1}{N}\sum_{i=1}^{N}\bigl(f(x_i) - y_i\bigr)\,x_i$
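A sketch of the update rule as code (the quadratic example E(w) = ||w||^2 and the learning rate are illustrative assumptions, not part of the slide):

import numpy as np

def gd_step(w, grad_E, eta):
    # one gradient-descent update: w <- w - eta * grad E(w)
    return w - eta * grad_E(w)

grad_E = lambda w: 2 * w          # gradient of E(w) = ||w||^2
w = np.array([4.0, -2.0])
for _ in range(3):
    w = gd_step(w, grad_E, eta=0.1)
print(w)                          # moves toward the minimiser at the origin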

Numerical Problem
• Let $E(w) = (w - 3)^2 + 2$, $\eta = 0.1$, $w = 0$; find $w$ and $E(w)$ for five iterations
• Since $x_i = 1$, the iterations can be solved in terms of $w$ only
• $\frac{\partial E}{\partial w} = 2(w - 3)$
• $w_{new} = w_{old} - \eta\,\nabla E(w_{old}) = w_{old} - 0.2(w_{old} - 3) = 0.8\,w_{old} + 0.6$

Sl    w        E(w)
1     0        11
2     0.6      7.76
3     1.0800   5.6864
4     1.4640   4.3593
5     1.7712   3.5099
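The table can be regenerated with a few lines of Python (an illustrative reproduction, not part of the original slides):

eta, w = 0.1, 0.0
for i in range(1, 6):
    E = (w - 3) ** 2 + 2
    print(f"{i}   w = {w:.4f}   E(w) = {E:.4f}")
    w = w - eta * 2 * (w - 3)     # equivalently w = 0.8*w + 0.6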

Batch Gradient Descent
• Batch Gradient Descent is an optimization algorithm
used to minimize a loss function by iteratively updating
the model's parameters using the entire dataset to
calculate the gradient
• Advantages:
– Computes the gradient with high precision using the entire
dataset
– Converges steadily towards the minimum
– Suitable for smooth and convex loss functions
• Disadvantages:
– Memory-intensive when the dataset is large
– Requires processing the entire dataset for each iteration
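A minimal NumPy sketch of Batch Gradient Descent for the linear model $f(x_i) = w^T x_i$; the learning rate, iteration count, and zero initialisation are my own assumptions:

import numpy as np

def batch_gd(X, y, eta=0.1, n_iters=200):
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / len(y)   # gradient computed on the ENTIRE dataset
        w -= eta * grad
    return w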

Stochastic Gradient Descent
• Stochastic Gradient Descent (SGD) is a variant of gradient
descent where the model parameters are updated using
only a single training example at a time, rather than the
entire dataset
• This leads to faster updates and can help the algorithm
escape local minima, making it suitable for large datasets
• Advantages
– Faster Updates
– Escaping Local Minima
– Scalability
• Disadvantages
– Noisy Convergence
– Requires More Iterations
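A minimal NumPy sketch of Stochastic Gradient Descent, updating on one randomly chosen example at a time; the shuffling scheme, learning rate, and epoch count are my own assumptions:

import numpy as np

def sgd(X, y, eta=0.01, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(y)):       # visit examples in random order
            grad_i = (X[i] @ w - y[i]) * X[i]   # gradient from a single example
            w -= eta * grad_i
    return w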

Mini-Batch Gradient Descent
• Mini-batch Gradient Descent is a hybrid approach between Batch
Gradient Descent and Stochastic Gradient Descent. It aims to combine the
advantages of both by updating the model parameters using a subset
(mini-batch) of the training data rather than the entire dataset (batch) or
just one data point (stochastic)
– Mini-batch: The dataset is divided into small batches, each containing a fixed number of training examples; the size of each mini-batch, denoted $b$, is a hyper-parameter
– Gradient Calculation: For each mini-batch, the gradient is calculated based on
the average of the training examples in that batch
– Weight Update: The model parameters are updated using the computed
gradient for the mini-batch
– Repeat for all mini-batches until convergence
• Advantages: Faster than Batch GD, Less Noisy than SGD
• Disadvantages: Choosing the Right Batch Size, Memory Considerations
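A minimal NumPy sketch of Mini-batch Gradient Descent; the batch size b, learning rate, epoch count, and shuffling are my own assumptions:

import numpy as np

def minibatch_gd(X, y, b=32, eta=0.05, n_epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), b):
            idx = order[start:start + b]                # one mini-batch of up to b examples
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx)      # gradient averaged over the batch
            w -= eta * grad
    return w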
