CS 230 - Deep Learning Tips and Tricks Cheatsheet

The document is a cheatsheet of deep learning techniques covering data processing, training neural networks, parameter tuning, regularization, and good practices. It discusses key concepts such as data augmentation, batch normalization, backpropagation, optimization methods like Adam, and regularization techniques like dropout. It serves as a quick reference for deep learning practitioners.

Uploaded by

Martin Kafula


Deep Learning Tips and Tricks cheatsheet

By Afshine Amidi (https://twitter.com/afshinea) and Shervine Amidi (https://twitter.com/shervinea)

Data processing
❐ Data augmentation ― Deep learning models usually need a lot of data to be properly
trained. It is often useful to derive more data from the existing data using data augmentation
techniques. The main ones are summed up below. More precisely, given an input image,
here are the techniques that we can apply:
Original
• Image without any modification
Flip
• Flipped with respect to an axis for which the meaning of the image is preserved
Rotation
• Rotation with a slight angle
• Simulates incorrect horizon calibration
Random crop
• Random focus on one part of the image
• Several random crops can be done in a row
Color shift
• Nuances of RGB are slightly changed
• Captures noise that can occur with light exposure
Noise addition
• Addition of noise
• More tolerance to quality variation of inputs
Information loss
• Parts of image ignored
• Mimics potential loss of parts of image
Contrast change
• Luminosity changes
• Controls difference in exposure due to time of day
Remark: data is usually augmented on the fly during training.
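
As a rough illustration of two of the techniques listed above (flip and random crop), here is a minimal sketch that treats an image as a nested list of pixel values. The function names `horizontal_flip` and `random_crop` are hypothetical helpers chosen for this example, not part of any library.

```python
import random

def horizontal_flip(image):
    """Mirror each row of the image (flip with respect to the vertical axis)."""
    return [row[::-1] for row in image]

def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w window at a random position inside the image."""
    top = rng.randint(0, len(image) - crop_h)      # randint is inclusive on both ends
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

# Toy 4x4 single-channel "image".
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

rng = random.Random(0)  # fixed seed so the example is reproducible
flipped = horizontal_flip(image)
crop = random_crop(image, 2, 2, rng)
```

In a training pipeline, transformations like these would typically be applied to each sample as it is loaded, which matches the remark about augmenting on the fly.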

❐ Batch normalization ― It is a step with learnable parameters $\gamma, \beta$ that normalizes the batch $\{x_i\}$.
By noting $\mu_B, \sigma_B^2$ the mean and variance of the batch that we want to correct, it is done as
follows:

$$x_i \longleftarrow \gamma \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$

It is usually done after a fully connected/convolutional layer and before a non-linearity layer and
aims at allowing higher learning rates and reducing the strong dependence on initialization.
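
The normalization step can be sketched for a batch of scalar activations as follows. This is a minimal illustration of the training-time formula only; the default values of `gamma`, `beta` and `eps` are assumptions for the example, and the running statistics that frameworks keep for inference are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize scalar activations x_i, then scale by gamma and shift by beta."""
    n = len(batch)
    mu = sum(batch) / n                           # batch mean mu_B
    var = sum((x - mu) ** 2 for x in batch) / n   # batch variance sigma_B^2
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

With `gamma = 1` and `beta = 0`, the output has approximately zero mean and unit variance.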
Training a neural network
Definitions
❐ Epoch ― In the context of training a model, an epoch is one full pass in which the model
sees the whole training set to update its weights.
❐ Mini-batch gradient descent ― During the training phase, weights are usually updated
neither on the whole training set at once, which is computationally expensive, nor on a single
data point, which gives noisy updates. Instead, the update step is done on mini-batches, where
the number of data points in a batch is a hyperparameter that we can tune.
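
A minimal sketch of how one epoch can be split into mini-batches is shown below. The helper name `iterate_minibatches` is hypothetical, and shuffling the data before each epoch is a common practice assumed here rather than something stated above.

```python
import random

def iterate_minibatches(data, batch_size, seed=0):
    """Yield mini-batches that together cover the whole dataset once (one epoch)."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)          # assumed: shuffle before each epoch
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

# 10 data points with batch_size=4 give batches of sizes 4, 4 and 2.
batches = list(iterate_minibatches(list(range(10)), batch_size=4))
```

The last batch is smaller when the dataset size is not a multiple of `batch_size`; whether to keep or drop it is a design choice.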
❐ Loss function ― In order to quantify how a given model performs, the loss function L is
usually used to evaluate to what extent the actual outputs y are correctly predicted by the
model outputs z.

❐ Cross-entropy loss ― In the context of binary classification in neural networks, the cross-
entropy loss L(z, y) is commonly used and is defined as follows:

$$L(z, y) = -\Big[y \log(z) + (1 - y) \log(1 - z)\Big]$$
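
The formula translates directly into code. In this sketch, clipping `z` away from 0 and 1 with a small `eps` is an added safeguard against `log(0)` and is not part of the definition above.

```python
import math

def binary_cross_entropy(z, y, eps=1e-12):
    """Cross-entropy loss for a predicted probability z and a label y in {0, 1}."""
    z = min(max(z, eps), 1.0 - eps)  # assumed safeguard: keep z strictly inside (0, 1)
    return -(y * math.log(z) + (1 - y) * math.log(1 - z))

confident_right = binary_cross_entropy(0.9, 1)  # small loss
confident_wrong = binary_cross_entropy(0.1, 1)  # large loss
```

A prediction of 0.5 gives a loss of log(2) regardless of the label, and the loss grows as the prediction moves away from the true label.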

Finding optimal weights

❐ Backpropagation ― Backpropagation is a method to update the weights in the neural
network by taking into account the actual output and the desired output. The derivative with
respect to each weight w is computed using the chain rule.
Using this method, each weight is updated with the rule:

$$w \longleftarrow w - \alpha \frac{\partial L(z, y)}{\partial w}$$

❐ Updating weights ― In a neural network, weights are updated as follows:

• Step 1: Take a batch of training data and perform forward propagation to compute the loss.
• Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight.
• Step 3: Use the gradients to update the weights of the network.
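
The three steps can be sketched on the smallest possible network: one weight `w`, one bias `b`, a sigmoid output and the cross-entropy loss. The toy data, the learning rate and the function names are assumptions made for the example. For this particular model the chain rule gives the gradient `(z - y) * x` with respect to `w`.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_step(w, b, batch, alpha):
    """One update of a single-neuron model z = sigmoid(w * x + b)."""
    n = len(batch)
    # Step 1: forward propagation to compute the loss.
    outputs = [sigmoid(w * x + b) for x, _ in batch]
    loss = -sum(y * math.log(z) + (1 - y) * math.log(1 - z)
                for z, (_, y) in zip(outputs, batch)) / n
    # Step 2: backpropagation; for sigmoid + cross-entropy, dL/dw = (z - y) * x.
    grad_w = sum((z - y) * x for z, (x, y) in zip(outputs, batch)) / n
    grad_b = sum((z - y) for z, (_, y) in zip(outputs, batch)) / n
    # Step 3: gradient descent update w <- w - alpha * dL/dw.
    return w - alpha * grad_w, b - alpha * grad_b, loss

# Toy batch of (x, y) pairs: negative inputs are class 0, positive inputs are class 1.
batch = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = 0.0, 0.0
losses = []
for _ in range(50):
    w, b, loss = train_step(w, b, batch, alpha=0.5)
    losses.append(loss)
```

The recorded loss starts at log(2) and decreases as `w` grows positive.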

Parameter tuning
Weights initialization
❐ Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier
initialization sets initial weights that take into account characteristics that are unique to the
architecture, such as the number of input and output units of each layer.
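
The text above does not give a formula. One common variant, often called Glorot uniform, draws each weight uniformly from an interval whose width depends on the number of input and output units of the layer. A minimal sketch of that variant, with a hypothetical helper name:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, seed=0):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit),
    with limit = sqrt(6 / (fan_in + fan_out)) (Glorot uniform variant)."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

# For a 256 -> 128 layer the limit is sqrt(6 / 384) = 0.125.
weights = xavier_uniform(fan_in=256, fan_out=128)
```

Wider layers get a smaller range, which helps keep the scale of activations roughly stable from layer to layer.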

❐ Transfer learning ― Training a deep learning model requires a lot of data and, more
importantly, a lot of time. It is often useful to take weights pre-trained on huge datasets,
which took days or weeks to train, and leverage them for our use case. Depending on
how much data we have at hand, here are the different ways to leverage this:
Training size and explanation:
• Small: freezes all layers, trains weights on softmax
• Medium: freezes most layers, trains weights on last layers and softmax
• Large: trains weights on layers and softmax by initializing weights on pre-trained ones

Optimizing convergence
❐ Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace
the weights get updated. It can be fixed or adaptively changed. The current most popular
method is called Adam, which is a method that adapts the learning rate.
❐ Adaptive learning rates ― Letting the learning rate vary when training a model can reduce
the training time and improve the quality of the solution found. While the Adam optimizer is the
most commonly used technique, others can also be useful. They are summed up below:
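Momentum
• Dampens oscillations
• Improvement to SGD
• 2 parameters to tune
Update of w: $w \longleftarrow w - \alpha v_{dw}$
Update of b: $b \longleftarrow b - \alpha v_{db}$

RMSprop
• Root Mean Square propagation
• Speeds up learning algorithm by controlling oscillations
Update of w: $w \longleftarrow w - \alpha \frac{dw}{\sqrt{s_{dw}}}$
Update of b: $b \longleftarrow b - \alpha \frac{db}{\sqrt{s_{db}}}$

Adam
• Adaptive Moment estimation
• Most popular method
• 4 parameters to tune
Update of w: $w \longleftarrow w - \alpha \frac{v_{dw}}{\sqrt{s_{dw}} + \epsilon}$
Update of b: $b \longleftarrow b - \alpha \frac{v_{db}}{\sqrt{s_{db}} + \epsilon}$
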

Remark: other methods include Adadelta, Adagrad and SGD.
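
The update rules for w can be sketched for a single scalar weight as below. The summary above does not define the accumulators, so this sketch assumes the usual exponential moving averages: `v` tracks the gradient and `s` tracks the squared gradient. The default hyperparameter values are illustrative, Adam's bias correction is omitted, and a small `eps` is added to the RMSprop denominator to avoid division by zero.

```python
import math

def momentum_step(w, dw, v, alpha=0.1, beta=0.9):
    v = beta * v + (1 - beta) * dw            # assumed: moving average of gradients
    return w - alpha * v, v

def rmsprop_step(w, dw, s, alpha=0.1, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * dw ** 2       # assumed: moving average of squared gradients
    return w - alpha * dw / (math.sqrt(s) + eps), s   # eps added for numerical safety

def adam_step(w, dw, v, s, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * dw
    s = beta2 * s + (1 - beta2) * dw ** 2
    return w - alpha * v / (math.sqrt(s) + eps), v, s  # no bias correction in this sketch

# Minimize f(w) = w^2, whose gradient is dw = 2w, starting from w = 5.
w_m, v_m = 5.0, 0.0
w_r, s_r = 5.0, 0.0
w_a, v_a, s_a = 5.0, 0.0, 0.0
for _ in range(200):
    w_m, v_m = momentum_step(w_m, 2 * w_m, v_m)
    w_r, s_r = rmsprop_step(w_r, 2 * w_r, s_r)
    w_a, v_a, s_a = adam_step(w_a, 2 * w_a, v_a, s_a)
```

All three variants move the weight from 5 toward the minimum at 0 on this toy problem. The bias b is updated in the same way using `db`.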

Regularization
❐ Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training
data by dropping out neurons with probability p > 0. It forces the model to avoid relying too
much on particular sets of features.

Remark: depending on the deep learning framework, dropout is parametrized either through the
drop probability p or through the 'keep' probability 1 − p.
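
A minimal sketch of dropout on a list of activations is shown below. It uses the "inverted dropout" convention, where the kept activations are divided by the keep probability so that their expected value is unchanged; that rescaling is an assumption of this sketch and is not described above.

```python
import random

def dropout(activations, p, training=True, seed=None):
    """Zero each activation with probability p during training."""
    if not training or p == 0.0:
        return list(activations)          # dropout is disabled at inference time
    rng = random.Random(seed)
    keep = 1.0 - p                        # the 'keep' probability
    # Assumed convention (inverted dropout): rescale kept values by 1 / keep.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

dropped = dropout([1.0] * 1000, p=0.5, seed=0)
unchanged = dropout([1.0, 2.0], p=0.5, training=False)
```

With `p = 0.5`, roughly half of the outputs are 0 and the rest are scaled to 2.0.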

❐ Weight regularization ― In order to make sure that the weights are not too large and that
the model is not overfitting the training set, regularization techniques are usually performed on
the model weights. The main ones are summed up in the table below:
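LASSO
• Shrinks coefficients to 0
• Good for variable selection
Penalty: $... + \lambda ||\theta||_1$, with $\lambda \in \mathbb{R}$

Ridge
• Makes coefficients smaller
Penalty: $... + \lambda ||\theta||_2^2$, with $\lambda \in \mathbb{R}$

Elastic Net
• Tradeoff between variable selection and small coefficients
Penalty: $... + \lambda \Big[(1 - \alpha) ||\theta||_1 + \alpha ||\theta||_2^2\Big]$, with $\lambda \in \mathbb{R}, \alpha \in [0, 1]$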
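
The three penalty terms can be computed as follows for a flat list of weights. This sketch covers only the term added to the loss (the "..." part stands for the original loss), and the function names are hypothetical.

```python
def l1_penalty(theta, lam):
    """LASSO term: lambda * ||theta||_1."""
    return lam * sum(abs(t) for t in theta)

def l2_penalty(theta, lam):
    """Ridge term: lambda * ||theta||_2^2."""
    return lam * sum(t * t for t in theta)

def elastic_net_penalty(theta, lam, alpha):
    """Elastic Net term: lambda * [(1 - alpha) * ||theta||_1 + alpha * ||theta||_2^2]."""
    l1 = sum(abs(t) for t in theta)
    l2 = sum(t * t for t in theta)
    return lam * ((1 - alpha) * l1 + alpha * l2)

theta = [0.5, -1.0, 2.0]   # ||theta||_1 = 3.5, ||theta||_2^2 = 5.25
lasso = l1_penalty(theta, lam=0.1)
ridge = l2_penalty(theta, lam=0.1)
mixed = elastic_net_penalty(theta, lam=0.1, alpha=0.5)
```

Setting `alpha = 0` recovers the LASSO term and `alpha = 1` recovers the Ridge term.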

❐ Early stopping ― This regularization technique stops the training process as soon as the
validation loss reaches a plateau or starts to increase.
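
One way to turn this rule into code is to stop once the validation loss has failed to improve for a fixed number of consecutive epochs. The `patience` parameter and the function name are assumptions of this sketch; the description above only says that training stops when the loss plateaus or starts to increase.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss           # new best validation loss: reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1       # no improvement this epoch
            if bad_epochs >= patience:
                return epoch
    return len(val_losses) - 1    # never triggered: training ran to the end

# The loss stops improving after epoch 2, so training stops at epoch 4.
stop = early_stopping_epoch([0.9, 0.7, 0.6, 0.61, 0.62, 0.5], patience=2)
```

In practice the weights from the best epoch are usually the ones kept, which requires saving a checkpoint whenever `best` improves.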

Good practices
❐ Overfitting small batch ― When debugging a model, it is often useful to make quick tests
to see if there is any major issue with the architecture of the model itself. In particular, in order
to make sure that the model can be properly trained, a mini-batch is passed inside the network
to see if it can overfit on it. If it cannot, it means that the model is either too complex or not
complex enough to even overfit on a small batch, let alone a normal-sized training set.
❐ Gradient checking ― Gradient checking is a method used during the implementation of the
backward pass of a neural network. It compares the value of the analytical gradient to the
numerical gradient at given points and plays the role of a sanity-check for correctness.
Numerical gradient
Formula: $\frac{df}{dx}(x) \approx \frac{f(x + h) - f(x - h)}{2h}$
Comments:
• Expensive; loss has to be computed two times per dimension
• Used to verify correctness of analytical implementation
• Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)

Analytical gradient
Formula: $\frac{df}{dx}(x) = f'(x)$
Comments:
• 'Exact' result
• Direct computation
• Used in the final implementation
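
A gradient check on a one-dimensional function can be sketched as follows. The test function `f(x) = x^3`, the step `h = 1e-5` and the relative-error measure are choices made for this example.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central difference: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x ** 3

def analytical_gradient(x):
    return 3 * x ** 2             # hand-derived f'(x), the implementation under test

x0 = 2.0
num = numerical_gradient(f, x0)
ana = analytical_gradient(x0)
# A relative error makes the check independent of the gradient's scale.
rel_error = abs(num - ana) / max(abs(num), abs(ana))
```

A very small relative error suggests the analytical gradient is implemented correctly, while a large one points to a bug in the backward pass.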

