4 - DNN Tip
Recipe of Deep Learning
Step 1: define a set of functions (a neural network)
Step 2: goodness of function
Step 3: pick the best function
Good results on training data? NO → go back and modify the three steps. YES ↓
Good results on testing data? NO → overfitting! YES → done.
Do not always blame overfitting.
Bad results on testing data do not necessarily mean overfitting; the network may simply be not well trained. Always check the results on the training data first.
Different approaches for different problems.
Recipe of Deep Learning
Good results on training data? → new activation function, adaptive learning rate
Good results on testing data? → early stopping, regularization, dropout
First topic: new activation function.
Vanishing Gradient Problem
In a deep network (inputs x1 … xN, outputs y1 … yM), the gradients are small for weights near the input layer and large for weights near the output layer.
Intuition: apply a large change $+\Delta w$ to a weight near the input; by the time its effect has passed through many layers, the change $+\Delta l$ it causes in the loss at the output is small, so $\partial l / \partial w$ is small for that weight.
ReLU
$a = z$ for $z > 0$; $a = 0$ for $z \le 0$.
Neurons whose output is 0 can be removed from the network; what remains is a thinner, linear network, which does not have smaller gradients (no vanishing gradient).
ReLU - variants
Leaky ReLU: $a = 0.01z$ for $z < 0$
Parametric ReLU: $a = \alpha z$ for $z < 0$, where $\alpha$ is also learned by gradient descent
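A minimal NumPy sketch of these activations (the function and parameter names are illustrative, not from the slides):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, slope=0.01):
    # negative inputs are scaled by a small fixed slope instead of being zeroed
    return np.where(z > 0, z, slope * z)

def parametric_relu(z, alpha):
    # alpha is a learnable parameter, updated by gradient descent with the weights
    return np.where(z > 0, z, alpha * z)
```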
Maxout — ReLU is a special case of Maxout
Group the linear outputs of a layer and output the max of each group. Example with inputs x1, x2 and groups of two elements: the first layer produces {5, 7} → max 7 and {−1, 1} → max 1; the next layer produces {1, 2} → max 2 and {4, 3} → max 4.
ReLU is a special case of Maxout:
• A ReLU neuron: $z = wx + b$, $a = \max(0, z)$.
• An equivalent Maxout group: $z_1 = wx + b$, $z_2 = 0$, $a = \max(z_1, z_2)$ — the same input-output function.
Maxout — more than ReLU
A Maxout group with $z_1 = wx + b$ and $z_2 = w'x + b'$, $a = \max(z_1, z_2)$, gives a learnable activation function: its shape depends on the learned parameters $w, b, w', b'$.
Maxout
• Learnable activation function [Ian J. Goodfellow, ICML'13]
• The activation function of a maxout network can be any piecewise linear convex function
• The number of pieces depends on how many elements are in a group
• For a given example, only the maximum element of each group is active, so we effectively train a thin and linear network; different examples activate different thin and linear networks, so all the parameters still get trained. A sketch of a maxout layer follows.
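A minimal NumPy sketch of a maxout layer under assumed shapes (the names W, b, group_size are illustrative):

```python
import numpy as np

def maxout(x, W, b, group_size=2):
    # x: (batch, d_in), W: (d_in, d_out * group_size), b: (d_out * group_size,)
    z = x @ W + b                              # all linear pieces
    z = z.reshape(x.shape[0], -1, group_size)  # (batch, d_out, group_size)
    return z.max(axis=-1)                      # keep only the max of each group

# usage: 3 maxout units, each taking the max of 2 linear pieces
x = np.random.randn(4, 5)
W = np.random.randn(5, 6)
b = np.zeros(6)
a = maxout(x, W, b)   # shape (4, 3)
```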
Recipe of Deep Learning — good results on training data? New activation function done; next: adaptive learning rate.
Adaptive Learning Rate
Different parameters need different learning rates: a larger learning rate along the flat direction of the error surface and a smaller one along the steep direction (the $w_1$–$w_2$ contour plot).
Adagrad: $w^{t+1} \leftarrow w^{t} - \dfrac{\eta}{\sqrt{\sum_{i=0}^{t}(g^{i})^{2}}}\, g^{t}$
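A one-step Adagrad update in NumPy (a sketch; the eps term is added here for numerical stability and is not on the slide):

```python
import numpy as np

def adagrad_update(w, g, g_sq_sum, eta=0.01, eps=1e-8):
    # accumulate the sum of all past squared gradients
    g_sq_sum = g_sq_sum + g**2
    # divide the learning rate by the root of that sum
    w = w - eta / (np.sqrt(g_sq_sum) + eps) * g
    return w, g_sq_sum
```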
The error surface of a neural network can be very complex: even for the same parameter, the learning rate that is needed can change during training — sometimes larger, sometimes smaller.

RMSProp:
$w^{1} \leftarrow w^{0} - \dfrac{\eta}{\sigma^{0}} g^{0}, \qquad \sigma^{0} = g^{0}$
$w^{2} \leftarrow w^{1} - \dfrac{\eta}{\sigma^{1}} g^{1}, \qquad \sigma^{1} = \sqrt{\alpha(\sigma^{0})^{2} + (1-\alpha)(g^{1})^{2}}$
$w^{3} \leftarrow w^{2} - \dfrac{\eta}{\sigma^{2}} g^{2}, \qquad \sigma^{2} = \sqrt{\alpha(\sigma^{1})^{2} + (1-\alpha)(g^{2})^{2}}$
……
$w^{t+1} \leftarrow w^{t} - \dfrac{\eta}{\sigma^{t}} g^{t}, \qquad \sigma^{t} = \sqrt{\alpha(\sigma^{t-1})^{2} + (1-\alpha)(g^{t})^{2}}$

$\sigma^{t}$ is the root mean square of the gradients, with previous gradients being decayed.
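A one-step RMSProp update in NumPy following the formulas above (eps is an extra numerical guard not on the slide):

```python
import numpy as np

def rmsprop_update(w, g, sigma, eta=0.001, alpha=0.9, eps=1e-8):
    # sigma: running root mean square of past gradients, older ones decayed by alpha
    sigma = np.sqrt(alpha * sigma**2 + (1 - alpha) * g**2)
    w = w - eta / (sigma + eps) * g
    return w, sigma
```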
Hard to find the optimal network parameters: on the total loss surface, learning can be very slow at a plateau ($\partial L / \partial w \approx 0$), stuck at a saddle point ($\partial L / \partial w = 0$), or stuck at a local minimum ($\partial L / \partial w = 0$).
(Vanilla) Gradient Descent
Start at position $\theta^{0}$.
Compute gradient $\nabla L(\theta^{0})$; move to $\theta^{1} = \theta^{0} - \eta \nabla L(\theta^{0})$.
Compute gradient $\nabla L(\theta^{1})$; move to $\theta^{2} = \theta^{1} - \eta \nabla L(\theta^{1})$.
……
Stop when $\nabla L(\theta^{t}) \approx 0$.
Momentum
Movement = movement of the last step minus gradient at present.
Start at point $\theta^{0}$ with movement $v^{0} = 0$.
Compute gradient at $\theta^{0}$; movement $v^{1} = \lambda v^{0} - \eta \nabla L(\theta^{0})$; move to $\theta^{1} = \theta^{0} + v^{1}$.
Compute gradient at $\theta^{1}$; movement $v^{2} = \lambda v^{1} - \eta \nabla L(\theta^{1})$; move to $\theta^{2} = \theta^{1} + v^{2}$.
……
The movement is not based on the current gradient only, but also on the previous movement.
$v^{i}$ is actually a weighted sum of all the previous gradients $\nabla L(\theta^{0}), \nabla L(\theta^{1}), \dots, \nabla L(\theta^{i-1})$:
$v^{0} = 0$
$v^{1} = -\eta \nabla L(\theta^{0})$
$v^{2} = -\lambda\eta \nabla L(\theta^{0}) - \eta \nabla L(\theta^{1})$
……
Because of this accumulated movement, the update can keep going even where $\partial L / \partial w = 0$ (plateaus, saddle points, local minima).
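A one-step momentum update in NumPy following the formulas above (the names are illustrative):

```python
import numpy as np

def momentum_update(theta, grad, v, eta=0.01, lam=0.9):
    # movement = previous movement scaled by lam, minus eta times the current gradient
    v = lam * v - eta * grad
    theta = theta + v
    return theta, v
```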
Adam: RMSProp + Momentum — it maintains a decaying average of the gradients (for momentum) and a decaying average of the squared gradients (for RMSProp).
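A one-step Adam update in NumPy combining the two ideas, following the standard Adam algorithm (bias correction included; the hyperparameter defaults are the usual ones, not taken from the slides):

```python
import numpy as np

def adam_update(w, g, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g        # momentum: decayed average of gradients
    v = beta2 * v + (1 - beta2) * g**2     # RMSProp: decayed average of squared gradients
    m_hat = m / (1 - beta1**t)             # bias correction (t starts at 1)
    v_hat = v / (1 - beta2**t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```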
Recipe of Deep Learning — good results on testing data? Next: early stopping.
Early Stopping
As the number of epochs grows, the loss on the training set keeps decreasing, but the loss on the validation/testing set eventually stops improving; stop training at that point.
Keras: http://keras.io/getting-started/faq/#how-can-i-interrupt-training-when-the-validation-loss-isnt-decreasing-anymore
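Following the Keras FAQ linked above, early stopping is done with the EarlyStopping callback (model, x_train, and y_train are assumed to exist already):

```python
from keras.callbacks import EarlyStopping

# stop when the validation loss has not improved for 2 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=2)

model.fit(x_train, y_train,
          validation_split=0.1,   # hold out part of the training set for validation
          epochs=100,
          callbacks=[early_stopping])
```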
Recipe of Deep Learning — good results on testing data? Next: regularization.
Regularization
Define a new loss function to be minimized: the original loss plus a regularization term,
$L'(\theta) = L(\theta) + \lambda \tfrac{1}{2}\|\theta\|_{2}^{2}$
where $L(\theta)$ is the original loss (e.g. minimize square error, cross entropy, …) and the L2 regularization term is $\|\theta\|_{2}^{2} = (w_1)^2 + (w_2)^2 + \cdots$ (the biases are usually not regularized).
Gradient: $\dfrac{\partial L'}{\partial w} = \dfrac{\partial L}{\partial w} + \lambda w$
Update:
$w^{t+1} \leftarrow w^{t} - \eta \dfrac{\partial L'}{\partial w} = w^{t} - \eta\left(\dfrac{\partial L}{\partial w} + \lambda w^{t}\right) = (1 - \eta\lambda)\, w^{t} - \eta \dfrac{\partial L}{\partial w}$
Since $1 - \eta\lambda$ is slightly smaller than 1, every update first pulls the weight a little closer to zero — this is called weight decay.
L1 regularization: $\|\theta\|_{1} = |w_1| + |w_2| + \cdots$
Update:
$w^{t+1} \leftarrow w^{t} - \eta\left(\dfrac{\partial L}{\partial w} + \lambda\,\mathrm{sgn}(w^{t})\right) = w^{t} - \eta \dfrac{\partial L}{\partial w} - \eta\lambda\,\mathrm{sgn}(w^{t})$
L1 always deletes a fixed amount $\eta\lambda$ from the magnitude of $w^{t}$, whereas L2 shrinks $w^{t}$ by a fixed proportion. See the update-rule sketch below.
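A NumPy sketch of the two update rules above, for a single parameter tensor (the names are illustrative):

```python
import numpy as np

def l2_update(w, grad, eta=0.01, lam=1e-4):
    # weight decay: shrink w by a factor (1 - eta*lam), then take a gradient step
    return (1 - eta * lam) * w - eta * grad

def l1_update(w, grad, eta=0.01, lam=1e-4):
    # L1: subtract a fixed amount eta*lam in the direction of sign(w)
    return w - eta * grad - eta * lam * np.sign(w)
```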
Regularization - Weight Decay
• Our brain also prunes out the useless links between neurons.
Recipe of Deep Learning — good results on testing data? Next: dropout.
Dropout
Training: before each update, every neuron has a p% chance of being dropped, so each update trains a thinner network.
Testing: no dropout. If the dropout rate at training is p%, multiply all the weights by (1 − p%). For example, with a dropout rate of 50%, a weight trained to w = 1 is set to w = 0.5 for testing.
Dropout - Intuitive Reason
Training with dropout is like practicing with weights tied to your legs; at testing time the weights are removed, so you perform much better.
Dropout - Intuitive Reason
Each neuron is trained knowing its partners may be dropped: "my partner may slack off, so I have to do the job well myself."
Dropout is a kind of ensemble.
Training: each minibatch trains a different network sampled by dropout; with M neurons there are $2^{M}$ possible networks, and they share parameters.
Testing: ideally we would average the outputs $y^{1}, y^{2}, y^{3}, \dots$ of all these networks, but that is impractical. Instead we use the full network with all the weights multiplied by (1 − p%), and its output is approximately equal to that average.
Why does multiplying the weights by (1 − p%) work? Consider a single linear neuron $z = w_1 x_1 + w_2 x_2$ with dropout rate 50%. The four possible dropout patterns give $z = w_1 x_1 + w_2 x_2$, $z = w_2 x_2$, $z = w_1 x_1$, and $z = 0$; their average is $z = \tfrac{1}{2} w_1 x_1 + \tfrac{1}{2} w_2 x_2$, exactly the output of the full network with the weights halved.
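A minimal NumPy sketch of this dropout scheme (mask neurons at training time, scale the weights at testing time; the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(a, p=0.5):
    # training: each neuron's activation is dropped (set to 0) with probability p
    mask = rng.random(a.shape) >= p
    return a * mask

def dropout_test_weights(w, p=0.5):
    # testing: no dropout, but every weight is multiplied by (1 - p),
    # e.g. with p = 0.5 a trained weight of 1 becomes 0.5
    return w * (1 - p)
```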
Recipe of Deep Learning (recap)
Step 1: define a set of functions (a neural network); Step 2: goodness of function; Step 3: pick the best function. If the results on the training data are bad, go back and improve training; if the results on the testing data are bad, it is overfitting — use different approaches for the two problems.
Try another task: document classification.
The machine reads a news article and decides whether it is about politics (政治), economics (經濟), sports (體育), or finance (財經), e.g. from words such as "stock" or "president" appearing in the document.
http://top-breaking-news.com/
Live Demo