5 Optimizers


ML Model Building Process Flow

Optimization Algorithms

• In machine learning, optimization algorithms are computational methods used to find the best set of parameters for a model that minimizes (or maximizes) a certain objective function.
• These algorithms play a crucial role in training machine learning models.
• The primary goal of optimization algorithms is to iteratively adjust model
parameters to optimize a specified objective, typically a loss or cost function, to
make the model perform well on a given task.
• The choice of optimization algorithm depends on factors such as the nature of the
problem, the characteristics of the data, the size of the model, and the
computational resources available.
What is a Gradient?

• In the context of machine learning and optimization, the gradient is a vector representing a function's slope, or rate of change, with respect to its input variables.
• The gradient tells us how the function's output changes as we make small changes to each input variable (see the sketch below).
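
To make this concrete, here is a minimal Python sketch (illustrative, not taken from the slides) that estimates the gradient of a simple two-variable function by finite differences; the function f and the step size eps are assumed examples.

# Minimal sketch: the gradient of f(x, y) = x**2 + y**2 is (2x, 2y).
# A finite-difference estimate recovers the same slopes by nudging
# each input variable slightly and observing how the output changes.

def f(x, y):
    return x**2 + y**2

def numerical_gradient(x, y, eps=1e-6):
    df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return df_dx, df_dy

print(numerical_gradient(3.0, 4.0))  # approximately (6.0, 8.0), i.e. (2*3, 2*4)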
Gradient Descent Algorithm

• Gradient descent is an optimization algorithm used to find the minimum (or maximum) of a function iteratively.
• In the context of machine learning, gradient descent is commonly employed to
update the model's parameters (weights and biases) during the training process to
minimize the loss function and make the model more accurate in making
predictions.
• In gradient descent, the gradient of the loss function with respect to the model's
parameters (weights and biases) is computed.
• This gradient tells us about the direction in which the parameters should be
adjusted to minimize the loss function, i.e., move towards the minimum.
• During training, these gradients are recomputed at every step and used by the optimization algorithm to update the model's parameters.
Gradient Descent Algorithm

• The steps of the gradient descent algorithm are as follows (a minimal code sketch follows the list):
1. Initialize Parameters: Start by initializing the model's parameters randomly or with some
predefined values.
2. Compute Loss: Evaluate the loss function using the current set of parameters and the training
data. The loss function measures the difference between the predicted outputs and the actual
target values.
3. Compute Gradients: Calculate the gradients of the loss function with respect to each
parameter. These gradients tell us how much the loss will change for a small change in each
parameter.
4. Update Parameters: Update the model's parameters by subtracting a small fraction of the
gradients from the current parameter values. This fraction is called the learning rate (often
denoted as α). The learning rate determines the step size of the updates and should be
carefully chosen to avoid overshooting the minimum or getting stuck in local minima.
5. Repeat: Repeat steps 2 to 4 for a certain number of iterations or until the loss converges to a
satisfactory value.
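
The five steps above can be sketched in a few lines of Python. The example below is illustrative, not the slides' own code: it fits a one-feature linear model with a mean-squared-error loss, and the synthetic data, learning rate, and number of steps are assumed values chosen for the demo.

import numpy as np

# Toy data: y is roughly 2*x + 1 with a little noise (illustrative values only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)
y = 2 * X + 1 + rng.normal(scale=0.1, size=100)

# Step 1: initialize parameters.
w, b = 0.0, 0.0
learning_rate = 0.1  # alpha

for step in range(200):
    # Step 2: compute the loss (mean squared error) with the current parameters.
    y_pred = w * X + b
    loss = np.mean((y_pred - y) ** 2)

    # Step 3: compute the gradients of the loss w.r.t. w and b.
    grad_w = np.mean(2 * (y_pred - y) * X)
    grad_b = np.mean(2 * (y_pred - y))

    # Step 4: update the parameters by stepping against the gradients.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Step 5: after enough repetitions, w and b should end up close to 2 and 1.
print(round(w, 2), round(b, 2), round(loss, 4))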
Epochs & Iteration

• An epoch refers to a single pass of the entire dataset through the model during the
training phase. In other words, during one epoch, the algorithm processes the entire
dataset exactly once.
• An iteration refers to processing a single batch of data within an epoch. The number of iterations required to complete an epoch depends on the batch size chosen.
• E.g., if you have a dataset of 1000 rows and you choose a batch size of 100, then each iteration processes 100 rows, and it takes 10 iterations to complete an epoch (see the sketch below).
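
A quick sketch of that arithmetic in Python (the numbers mirror the example above):

import math

dataset_size = 1000
batch_size = 100

# Number of iterations (batches) needed to complete one epoch.
iterations_per_epoch = math.ceil(dataset_size / batch_size)
print(iterations_per_epoch)  # 10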
Gradient Descent Algorithm

• There are different variants of gradient descent, such as:
• Batch Gradient Descent:
• In batch gradient descent, the entire training dataset is used for each parameter update.
• Because the whole dataset is processed at once, it requires more memory and is computationally very expensive.
• Convergence happens quickly in this case.
• One epoch has exactly 1 iteration, as the batch contains the complete training set (sketched below).
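
A minimal sketch of this structure, assuming a one-parameter linear model and synthetic, noise-free data (all values are illustrative): each epoch performs exactly one gradient computation over all rows.

import numpy as np

# Illustrative data following y = 3*x exactly.
X = np.linspace(-1, 1, 1000)
y = 3 * X

w, lr = 0.0, 0.5
for epoch in range(50):
    # One epoch = one iteration: the gradient is averaged over every row at once.
    grad_w = np.mean(2 * (w * X - y) * X)
    w -= lr * grad_w

print(round(w, 3))  # approaches 3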
Gradient Descent Algorithm

• There are different variants of gradient descent, such as:
• Stochastic Gradient Descent (SGD):
• In SGD, a single data sample is randomly chosen for each iteration to compute the gradients and
update the parameters.
• It is computationally more efficient than batch gradient descent but can have more noise in the
updates.
• Convergence does not happen as quickly as it does in batch gradient descent.
• One epoch has a number of iterations equal to the number of records in the training dataset (sketched below).
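
A corresponding sketch for SGD under the same assumed one-parameter model and synthetic data: every row triggers its own parameter update, so one epoch runs as many iterations as there are rows.

import numpy as np

# Illustrative data following y = 3*x exactly.
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 1000)
y = 3 * X

w, lr = 0.0, 0.1
for epoch in range(5):
    # One epoch = 1000 iterations: one randomly chosen row per update.
    for i in rng.permutation(len(X)):
        grad_w = 2 * (w * X[i] - y[i]) * X[i]
        w -= lr * grad_w

print(round(w, 3))  # approaches 3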
Gradient Descent Algorithm

• There are different variants of gradient descent, such as:
• Mini-batch Gradient Descent:
• This is a compromise between batch gradient descent and SGD. It uses a small batch of data samples
to compute the gradients and update the parameters.
• Weight updates take more time relative to batch gradient descent, so convergence does not happen as quickly as it does in batch gradient descent.
• One epoch has a number of iterations equal to the total number of rows in the training set divided by the batch size (sketched below).
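
A sketch of the mini-batch variant under the same assumptions: with 1000 rows and a batch size of 100, each epoch performs 10 parameter updates.

import numpy as np

# Illustrative data following y = 3*x exactly.
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 1000)
y = 3 * X

w, lr, batch_size = 0.0, 0.2, 100
for epoch in range(10):
    order = rng.permutation(len(X))
    # One epoch = 1000 / 100 = 10 iterations, one small batch per update.
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        grad_w = np.mean(2 * (w * X[idx] - y[idx]) * X[idx])
        w -= lr * grad_w

print(round(w, 3))  # approaches 3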
Gradient Descent Algorithm

• There are different variants of gradient descent, such as:
• Mini-batch Gradient Descent with Momentum:
• It is an extension of Mini-batch Gradient Descent incorporating a momentum term.
• Momentum introduces a "velocity" term to the updates, helping to accelerate convergence and
navigate areas with low gradients.
• It is often more computationally efficient than mini-batch gradient descent and can lead to faster convergence and better performance (a sketch follows below).
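
A sketch of the momentum variant under the same assumptions; the momentum coefficient beta is an assumed, commonly used value, and the velocity term is what distinguishes it from plain mini-batch gradient descent.

import numpy as np

# Illustrative data following y = 3*x exactly.
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 1000)
y = 3 * X

w, lr, batch_size = 0.0, 0.05, 100
velocity, beta = 0.0, 0.9  # beta controls how much past gradient "velocity" is kept

for epoch in range(20):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        grad_w = np.mean(2 * (w * X[idx] - y[idx]) * X[idx])
        velocity = beta * velocity + grad_w  # accumulate a running velocity of gradients
        w -= lr * velocity                   # step along the velocity, not the raw gradient

print(round(w, 3))  # approaches 3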
