Gradient Descent
Learning Efficiency
1. Gradient Descent
Cost/Loss Functions
Commonly Used Loss Functions:
Mean Squared Error (MSE) / Mean Absolute Error (MAE): Primarily used in
regression tasks to measure the average squared or absolute differences between the
predicted and actual values.
Cross-Entropy Loss (Log Loss): Applied in binary and multi-class classification
tasks to evaluate the accuracy of predicted probabilities.
Hinge Loss: Frequently used for binary classification, especially with Support Vector
Machines (SVMs), to penalize predictions that fall on the wrong side of, or inside, the decision margin.
Huber Loss: A robust alternative to MSE, used in regression tasks to reduce
sensitivity to outliers by combining characteristics of MSE and MAE.
Softmax Cross-Entropy Loss: An extension of Cross-Entropy Loss for multi-class
classification tasks, commonly applied to models that output probability distributions
across multiple classes.
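As an illustrative sketch (not taken from the slides), the NumPy functions below show how MSE, MAE, binary cross-entropy, and hinge loss could be computed on toy data; the function names, the toy values, and the clipping constant for numerical stability are assumptions made here.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average squared difference (regression)
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean Absolute Error: average absolute difference (regression)
    return np.mean(np.abs(y_true - y_pred))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Cross-Entropy (log loss) for binary classification;
    # p_pred holds predicted probabilities of the positive class.
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def hinge_loss(y_true, scores):
    # Hinge loss for labels in {-1, +1}, as used with SVMs.
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

# Toy usage
y = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.2, 0.7])
print(mse(y, p), mae(y, p), binary_cross_entropy(y, p))
print(hinge_loss(np.array([1, -1, 1]), np.array([0.8, -0.5, 0.3])))
```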
Gradient Descent
Learning Rate: Typical Range
The learning rate typically falls within the range of 0.00001 to 0.1. Hyperparameter
tuning is essential to identify the optimal learning rate for specific training tasks.
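To make the role of the learning rate concrete, here is a minimal sketch, assuming a one-dimensional quadratic loss (the function and values are illustrative, not from the slides): the learning rate scales every parameter update.

```python
def grad(w):
    # Gradient of the illustrative loss L(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1   # typically chosen between 1e-5 and 1e-1 and tuned
for step in range(50):
    w -= learning_rate * grad(w)  # step size scaled by the learning rate
print(w)  # converges toward the minimum at w = 3
```

A rate that is too large makes the iterates overshoot or diverge, while one that is too small makes convergence needlessly slow, which is why this value is usually tuned.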
Gradient Descent
Definition
Stochastic Gradient Descent (SGD) is an optimization algorithm used to minimize
the objective function, typically in machine learning models, by iteratively adjusting
the model parameters (like weights) to reduce the error. It is a variant of the more
general Gradient Descent method but differs in how it updates the parameters.
In standard Gradient Descent, the algorithm calculates the gradient of the error for
the entire dataset at each iteration, which can be computationally expensive for large
datasets. SGD, on the other hand, updates the model parameters based on the
gradient of the error for one training sample (or a small batch) at a time. This makes
it faster and more efficient for large datasets but can introduce more noise into the
optimization process.
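The contrast can be sketched for linear regression as follows; the toy data, learning rate, and step counts are assumptions for illustration, not the lecture's example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))              # toy dataset: 1000 samples, 3 features
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=1000)

lr = 0.05

# Batch Gradient Descent: one update per pass, using the gradient over ALL samples
w_batch = np.zeros(3)
for epoch in range(200):
    grad = 2.0 * X.T @ (X @ w_batch - y) / len(y)
    w_batch -= lr * grad

# Stochastic Gradient Descent: one (noisier) update per individual sample
w_sgd = np.zeros(3)
for epoch in range(5):
    for i in rng.permutation(len(y)):
        xi, yi = X[i], y[i]
        grad = 2.0 * xi * (xi @ w_sgd - yi)
        w_sgd -= lr * grad

print(w_batch, w_sgd)   # both approach w_true; SGD gets there with noisier steps
```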
SGD Cont.
Advantages of SGD:
Faster computation for large datasets.
Allows the model to start learning immediately rather than waiting for the entire
dataset to be processed.
More frequent updates.
Disadvantages of SGD:
Due to the randomness of the updates, the learning process can fluctuate and may
not converge as smoothly as batch Gradient Descent.
Requires careful tuning of learning rates and often needs more iterations to stabilize.
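One common way to tame SGD's noisy updates is to decay the learning rate as training progresses; the 1/t-style schedule below is a generic illustration, and its base rate and decay constant are assumptions rather than values prescribed by the slides.

```python
def decayed_learning_rate(step, base_lr=0.1, decay=0.01):
    # Simple 1/t-style decay: large steps early on, smaller steps as training stabilizes
    return base_lr / (1.0 + decay * step)

for step in (0, 100, 1000, 10000):
    print(step, decayed_learning_rate(step))
```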
Hyperparameters
Types of Hyperparameters
1. Model-Related Hyperparameters:
These define the architecture of the model or control regularization techniques:
Number of layers in a neural network.
Number of neurons per layer.
Kernel size in a convolutional neural network.
Dropout rate to prevent overfitting.
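As a hedged illustration (this small PyTorch network is an assumption for demonstration, not a model from the lecture), each of these architectural choices appears as an explicit argument when the model is defined.

```python
import torch.nn as nn

# Illustrative CNN for 1x28x28 inputs; the architectural hyperparameters are explicit.
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),  # kernel size
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # adding layers deepens the stack
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 28 * 28, 128),                  # 128 neurons in this hidden layer
    nn.Dropout(p=0.5),                             # dropout rate to reduce overfitting
    nn.Linear(128, 10),
)
```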
2. Regularization Hyperparameters:
These constrain the model during training to reduce overfitting:
L2 regularization (weight decay): Penalizes large weights in the model.
Dropout rate: Randomly drops neurons during training to prevent overfitting.
3. Optimization-Related Hyperparameters:
These influence the optimization process and how the model is trained:
Learning rate: Controls the step size in each iteration during optimization.
Batch size: Number of samples processed before the model is updated.
Number of epochs: The number of times the entire dataset passes through the model.
Momentum: Accumulates past gradients so updates keep moving along consistent directions, accelerating convergence.
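A hedged sketch of where these optimization hyperparameters (and the weight-decay regularizer) appear in a typical PyTorch training setup; the placeholder data, model, and specific values are assumptions, not the lecture's.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model; real code would use an actual dataset and architecture.
X = torch.randn(512, 20)
y = torch.randint(0, 2, (512,))
dataset = TensorDataset(X, y)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

learning_rate = 0.01   # step size per update
batch_size = 32        # samples processed per parameter update
num_epochs = 10        # full passes over the dataset
momentum = 0.9         # accelerates updates along consistent gradient directions

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            momentum=momentum, weight_decay=1e-4)  # weight decay = L2 regularization
loss_fn = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```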