
TRAINING NEURAL NETWORKS
Key Points
Cost function:
A cost function measures the performance of a model on given data. It quantifies the error between predicted values and expected values and expresses it as a single real number. After forming a hypothesis with initial parameters, we compute the cost function and then, with the goal of reducing it, modify the parameters using the gradient descent algorithm over the given data.
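As a concrete sketch, the snippet below computes such a cost, assuming a simple linear hypothesis y-pred = w · x and mean squared error as the cost (the same choices used in the example later in these slides); the function names are illustrative, not prescribed here.

```python
import numpy as np

def predict(w, x):
    # Simple linear hypothesis: y_pred = w * x
    return w * x

def mse_cost(w, x, y):
    # Mean squared error between predictions and expected values
    y_pred = predict(w, x)
    return np.mean((y - y_pred) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(mse_cost(0.5, x, y))  # cost for an assumed initial guess w = 0.5
```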
Steps of Gradient Descent
● The algorithm starts with an initial set of parameters and updates them in small steps to minimize the cost
function.
● In each iteration of the algorithm, the gradient of the cost function with respect to each parameter is computed.
● The gradient points in the direction of steepest ascent, so by moving in the opposite direction we take a step of steepest descent.
● The size of the step is controlled by the learning rate, which determines how quickly the algorithm moves
towards the minimum.
● The process is repeated until the cost function converges to a minimum, indicating that the model has reached the optimal set of parameters (see the sketch after these steps).
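In symbols, each step applies w ← w − η · dJ/dw, where η is the learning rate and J is the cost. Below is a minimal batch gradient descent sketch for the linear model y-pred = w · x with an MSE cost; the initial weight, learning rate, and iteration count are assumptions chosen for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0    # assumed initial parameter
lr = 0.05  # assumed learning rate
for step in range(50):
    y_pred = w * x
    # Gradient of MSE = mean((y - w*x)^2) with respect to w
    grad = np.mean(-2.0 * x * (y - y_pred))
    w -= lr * grad  # move against the gradient
print(w)  # converges toward w = 2 for this data
```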
Example

Fit the linear model y-pred = w · x to the data below. For each candidate weight w, the worked table records the prediction y-pred, the squared error (y − y-pred)², the resulting MSE, and the derivative of the MSE with respect to w (one pass is computed in the sketch below).

x    y
1    2
2    4
3    6
4    8
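The sketch below computes one pass of this table for an assumed current weight w = 1 (purely illustrative), producing each prediction, the squared error per sample, the MSE over the four samples, and the derivative of the MSE with respect to w.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
w = 1.0  # assumed current weight for this pass

y_pred = w * x                              # predictions for each x
sq_err = (y - y_pred) ** 2                  # per-sample (y - y_pred)^2 column
mse = np.mean(sq_err)                       # MSE over the four samples
dmse_dw = np.mean(-2.0 * x * (y - y_pred))  # derivative of MSE w.r.t. w

for xi, yi, pi, ei in zip(x, y, y_pred, sq_err):
    print(f"x={xi:.0f}  y={yi:.0f}  y_pred={pi:.1f}  (y - y_pred)^2={ei:.1f}")
print(f"MSE={mse:.2f}, dMSE/dw={dmse_dw:.2f}")  # here: MSE=7.50, dMSE/dw=-15.00
```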
Online Learning
The entire sample is not given to the network; instead, instances arrive one by one, and we would like the network to update its parameters after each instance, adapting itself slowly over time (a minimal sketch of this follows the advantages below).

Advantages

1. It saves us the cost of storing the training sample in an external memory and storing the
intermediate results during optimization.

2. The problem may be changing in time, which means that the sample distribution is not fixed,
and a training set cannot be chosen a priori. For example, we may be implementing a speech
recognition system that adapts itself to its user.

3. There may be physical changes in the system. For example, in a robotic system, the
components of the system may wear out, or sensors may degrade.
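A minimal sketch of this per-instance updating, reusing the linear model and squared-error loss from the earlier example; the simulated stream and the learning rate are illustrative assumptions.

```python
def online_update(w, x_i, y_i, lr=0.01):
    # Update the weight from a single incoming instance (x_i, y_i)
    # using the gradient of the squared error on that instance alone.
    y_pred = w * x_i
    grad = -2.0 * x_i * (y_i - y_pred)
    return w - lr * grad

w = 0.0
stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # simulated data stream
for x_i, y_i in stream:
    w = online_update(w, x_i, y_i)  # adapt after every instance, nothing is stored
print(w)
```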
Stochastic Gradient Descent
Ways of Updating Weights

For each input, the actual output is compared with the predicted output, and an update to the weights is computed from that single example.
Key points
● The process of incrementally updating the weights is also called “stochastic” gradient descent
since it approximates the minimization of the cost function.
● Although stochastic gradient descent might sound inferior to gradient descent due to its “stochastic” nature and the “approximated” direction (gradient), it can have certain advantages in practice.
● Often, stochastic gradient descent converges much faster than gradient descent since the updates are applied immediately after each training sample; stochastic gradient descent is also computationally more efficient, especially for very large datasets (see the sketch after these points).
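A sketch of this per-sample updating over several passes through the toy data; the shuffling, learning rate, and epoch count are illustrative assumptions rather than anything prescribed here.

```python
import random

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w, lr = 0.0, 0.05

for epoch in range(20):
    random.shuffle(data)  # visit samples in a random order each pass
    for x_i, y_i in data:
        grad = -2.0 * x_i * (y_i - w * x_i)  # gradient on this one sample
        w -= lr * grad                       # update immediately, before seeing the rest
print(w)  # approaches w = 2
```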
Advantages:
Speed: SGD is faster than other variants of Gradient Descent such as Batch Gradient Descent and Mini-Batch
Gradient Descent since it uses only one example to update the parameters.
Memory Efficiency: Since SGD updates the parameters for each training example one at a time, it is
memory-efficient and can handle large datasets that cannot fit into memory.
Avoidance of Local Minima: Because the updates in SGD are noisy, it can escape shallow local minima, which may help it reach a better (possibly global) minimum.
Disadvantages:
Noisy updates: The updates in SGD are noisy and have a high variance, which can make the optimization process
less stable and lead to oscillations around the minimum.
Slow Convergence: SGD may require more iterations to converge to the minimum since it updates the
parameters for each training example one at a time.
Sensitivity to Learning Rate: The choice of learning rate is critical in SGD: a high learning rate can cause the algorithm to overshoot the minimum, while a low learning rate makes convergence slow (both effects are illustrated in the sketch below).
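To make the learning-rate sensitivity concrete, the sketch below runs the same single-sample update with two assumed learning rates on the toy data; the specific values are chosen only to exhibit overshooting versus slow progress.

```python
def run_sgd(lr, epochs=5):
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    w = 0.0
    for _ in range(epochs):
        for x_i, y_i in data:
            w -= lr * (-2.0 * x_i * (y_i - w * x_i))  # per-sample update
    return w

print(run_sgd(lr=0.2))    # too high: the weight overshoots and grows far past 2
print(run_sgd(lr=0.001))  # too low: after 5 epochs w is still well below 2
```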
