Lecture 8.5
Neural Networks
Instructor: Dr. Liang Zhang
TAs: Jiacheng Zhang, Ruoyao Wang
College of Information Science
University of Arizona
Quiz at Tophat (Join: 436056)
Use the web app or mobile app to answer:
A SGD
B AdaGrad
C RMSProp
D Adam
With your group, come to consensus on the correct answer, and discuss
AdaGrad
Properties:
+ Not very sensitive to initial learning rate
+ Parameters with large partial derivatives get a large learning rate decrease
+ Parameters with small partial derivatives get a small learning rate decrease
- Aggressive, monotonically decreasing learning rate
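For reference, a sketch of the standard AdaGrad per-step update behind these properties, in assumed notation: g is the minibatch gradient, ε the global learning rate, δ a small stabilizing constant, and r the running sum of squared gradients.
\[
r \leftarrow r + g \odot g, \qquad
\theta \leftarrow \theta - \frac{\epsilon}{\delta + \sqrt{r}} \odot g
\]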
RMSProp
For each training step:
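A sketch of the standard RMSProp update, in assumed notation: g is the minibatch gradient, ε the learning rate, ρ the decay rate, δ a small stabilizing constant, and r the decaying average of squared gradients.
\[
r \leftarrow \rho\, r + (1 - \rho)\, g \odot g, \qquad
\theta \leftarrow \theta - \frac{\epsilon}{\sqrt{\delta + r}} \odot g
\]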
Properties:
+ Discards history from the extreme past
+ Less aggressive than AdaGrad
- Has 1 extra hyperparameter, 𝜌
Adam
For each training step:
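A sketch of the standard Adam update, in assumed notation: g is the minibatch gradient, ε the learning rate, ρ1 and ρ2 the moment decay rates, s and r the first- and second-moment estimates, t the step count, and δ a small stabilizing constant.
\[
\begin{aligned}
s &\leftarrow \rho_1 s + (1 - \rho_1)\, g, & r &\leftarrow \rho_2 r + (1 - \rho_2)\, g \odot g,\\
\hat{s} &= \frac{s}{1 - \rho_1^{\,t}}, & \hat{r} &= \frac{r}{1 - \rho_2^{\,t}},\\
\theta &\leftarrow \theta - \epsilon\, \frac{\hat{s}}{\sqrt{\hat{r}} + \delta} & &
\end{aligned}
\]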
Properties:
+ Incorporates momentum
+ Bias-corrected moment estimates (less biased than RMSProp early in training)
- Has 2 extra hyperparameters, 𝜌1 and 𝜌2
Group Activity
What would the paths of regular SGD, AdaGrad, RMSProp, and Adam look like on
the surface below? Redder is higher, bluer is lower, circle is the start, and star is
the minimum.
Group Activity
● SGD: Noisy, zigzagging, potential overshooting, slower convergence.
● AdaGrad: Large early steps, very cautious later, slower near the end.
● RMSProp: Smooth path, adjusts well to the surface, relatively stable.
● Adam: Fast, adaptive, smooth, and efficient convergence.
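These qualitative differences can be reproduced with a small simulation. The sketch below is illustrative only and makes assumptions not on the slide: an elongated quadratic bowl f(x, y) = x^2 + 10y^2 stands in for the plotted surface, Gaussian noise mimics minibatch gradients, and all four methods share one hand-picked learning rate.

import numpy as np

def grad(theta):
    # Gradient of the stand-in surface f(x, y) = x**2 + 10*y**2 (minimum at the origin).
    x, y = theta
    return np.array([2.0 * x, 20.0 * y])

def run(method, theta0, steps=200, eps=0.1, rho=0.9, rho1=0.9, rho2=0.999, delta=1e-8):
    # Follow one optimizer from theta0 and return its trajectory.
    rng = np.random.default_rng(0)
    theta = np.array(theta0, dtype=float)
    r = np.zeros_like(theta)   # accumulated / decayed squared gradients
    s = np.zeros_like(theta)   # first-moment estimate (Adam only)
    path = [theta.copy()]
    for t in range(1, steps + 1):
        g = grad(theta) + rng.normal(scale=0.5, size=2)  # noise mimics minibatch gradients
        if method == "SGD":
            theta -= eps * g
        elif method == "AdaGrad":
            r += g * g
            theta -= eps * g / (delta + np.sqrt(r))
        elif method == "RMSProp":
            r = rho * r + (1 - rho) * g * g
            theta -= eps * g / np.sqrt(delta + r)
        elif method == "Adam":
            s = rho1 * s + (1 - rho1) * g
            r = rho2 * r + (1 - rho2) * g * g
            s_hat = s / (1 - rho1 ** t)   # bias-corrected first moment
            r_hat = r / (1 - rho2 ** t)   # bias-corrected second moment
            theta -= eps * s_hat / (np.sqrt(r_hat) + delta)
        path.append(theta.copy())
    return np.array(path)

for m in ["SGD", "AdaGrad", "RMSProp", "Adam"]:
    path = run(m, theta0=(-4.0, 2.0))
    print(f"{m:>8}: final point {path[-1].round(2)}, distance to minimum {np.linalg.norm(path[-1]):.2f}")

Plotting each returned path over the contours of f reproduces the qualitative picture in the bullets above.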
Selecting a learning algorithm
Good luck! Try a few and see what happens.
Tom Schaul, Ioannis Antonoglou, and David Silver. Unit tests for stochastic
optimization. ICLR 2014.
● Tries algorithms on simple cost function shapes
● Adaptive learning rates robust, but no clear winner
Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, and Benjamin
Recht. The marginal value of adaptive gradient methods in machine learning.
NeurIPS 2017.
● Tries algorithms on image and language tasks
● AdaGrad, RMSProp, and Adam all have worse generalization error than SGD