Regularization Slides (2)
Uploaded by kylorensw407

Regularization and
Hyperparameters in Deep Learning
Improve performance on unseen data by reducing overfitting
1. Preliminary understanding
2. Regularization - What & Why?
3. Regularization Techniques
a. L1, L2 regularization
b. Early stopping
c. Ensemble methods - Dropout, DropConnect
d. Dataset augmentation
e. Adding noise to the inputs / outputs
4. Hyperparameters in DL
1. Preliminary understanding
➔ Model Fitting (Train-Test)
➔ Gold Standard Practice - A case study
➔ Performance of model
➔ Model Generalization (Validate)

Model Fitting -
How well does a model perform on the training & evaluation datasets?
Three situations:
➔ Underfit
➔ Overfit
➔ Optimal fit / Good fit
Regression model fitting…
➔ Under-fitting
Shows poor performance in training
Dataset or model capacity is poor
➔ Optimal / good / robust fitting
Balanced - model at the sweet spot
between underfitting and overfitting
➔ Overfitting
Model capacity increased beyond what the data supports
Gold standard practice - Limit overfit
➔ Use a resampling technique to estimate model accuracy - k-fold cross validation
➔ Hold back a validation dataset: a subset of the data (e.g., 60% train, with the remaining 40% split as 20% validation + 20% test)
➔ A case study: compare 2 fits of the same model
K-fold Cross validation
Hold Back a validation dataset
Fitting vs Bias-Variance
➔ Bias
Represents the extent to which the average prediction over all training datasets differs from the true values
➔ Variance
Represents the extent to which the model is sensitive to the particular choice of dataset (test data)
➔ Relationship: model fitting and bias-variance
Model Performance
➔ Error (generalization)
Generalization error = prediction error against train data + prediction error against test data
➔ Error due to: underfit + overfit
➔ Robust model = min {Error}
Reasons and countermeasures
Reason for overfit:
Deep neural networks are highly complex models - many parameters, many non-linearities. It is easy for them to overfit and drive training error to 0.
Countermeasures: Underfit
Increase the capacity (number of layers); incorporate more data
Countermeasures: Overfit
Model capacity is so big that it adapts too well to training samples and is unable to generalize well to new, unseen samples
Solution - Regularization
A modification we make to a learning algorithm that is intended to reduce its generalization error, but not its training error.
Strategy: more data, or reducing the network's capacity. Optimal?
2. Regularization
➔ Reduces risk of overfitting
◆ Makes a learning model perform well on train data and new input data
➔ Encourages a preference towards simple models
➔ Fewer parameters reduce the computational power needed
➔ Best / balanced performing model
Regularization in DeepNet?
How does regularization work on a deep neural network model?
Simple neural network model
Weight update to minimize the loss
3. Regularization Techniques
➔ L2, L1, group regularization (weight)
➔ Dataset augmentation
➔ Early stopping
➔ Ensemble method: Dropout / DropConnect

Zero out input: Ridge (L2), Lasso (L1)

L2 adds the "squared magnitude" of the coefficients as a penalty term to the loss function:
Loss = Error + λ Σ wᵢ²
L1 adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss:
Loss = Error + λ Σ |wᵢ|
Weight penalties → smaller weights → simpler model → less overfit.
math(W) represents the actual regularization operation.
λ (lambda) determines how strongly the regularization will influence the network's training (0 to n).
L1 (Lasso) Regularization
Zeros out certain inputs:
Adds a penalty for having weights of large absolute value.
Encourages the model to make as many weights zero as possible.
Zero out inputs (L1) & maximum punishment (L2)
Example: the weights corresponding to "Variable x (Blood pressure)" and "Variable y (Body weight)" are not useful in predicting future diagnosis of diseases.

L1 regularization: a weight of 0.5 gets a penalty of 0.5
L2 regularization: a weight of 0.5 gets a penalty of 0.25
L1 gives a push to squish even small weights towards zero, more so than L2 regularization.

L1 regularization: a weight of -9 gets a penalty of 9, but
L2 regularization: a weight of -9 gets a penalty of 81
Thus, bigger-magnitude weights are punished much more severely in L2 regularization.
L1 & L2 regularization at the same time (Elastic Net)
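The penalty arithmetic above can be sketched in a few lines of Python (a minimal, framework-free illustration):

```python
def l1_penalty(weights, lam=1.0):
    """L1 (Lasso) penalty: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=1.0):
    """L2 (Ridge) penalty: lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

# A weight of 0.5: L1 penalty 0.5, L2 penalty 0.25
print(l1_penalty([0.5]), l2_penalty([0.5]))  # 0.5 0.25
# A weight of -9: L1 penalty 9, L2 penalty 81
print(l1_penalty([-9]), l2_penalty([-9]))    # 9 81
```

Adding both penalties to the loss at once gives the combined (Elastic Net style) regularization mentioned above.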
Early stopping
There is a point during training a large neural net when the model will stop generalizing; thereafter it focuses only on learning the statistical noise in the training dataset.
Solution:
• Stop whenever generalization error increases

Track the validation error.
Have a patience parameter "p".
If you are at step "k" and there was no improvement in validation error in the previous "p" steps,
then stop training and return the model stored at step k − p.
Tip: the Keras implementation has an option to save the best weights.
https://keras.io/callbacks/
Callback during training
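The patience rule above can be sketched without any framework (the validation errors are simulated here; in a real Keras run you would instead pass an EarlyStopping callback with restore_best_weights=True):

```python
def early_stopping(val_errors, patience):
    """Return the index of the model checkpoint to keep, following the rule:
    stop at step k if validation error has not improved in the previous
    `patience` steps, and return the model stored at step k - patience."""
    best = float("inf")
    steps_since_improvement = 0
    for k, err in enumerate(val_errors):
        if err < best:
            best = err
            steps_since_improvement = 0
        else:
            steps_since_improvement += 1
        if steps_since_improvement >= patience:
            return k - patience  # checkpoint saved before errors started rising
    return len(val_errors) - 1  # never triggered: keep the final model

# Validation error improves, then rises for 3 consecutive steps:
print(early_stopping([0.9, 0.7, 0.5, 0.6, 0.65, 0.7], patience=3))  # 2
```

The returned index 2 is the last step at which validation error improved, i.e., the model stored at k − p.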
Zero out the nodes or connections: Dropout, DropConnect
Dropout drops out some nodes / links - an efficient, approximate way of combining exponentially many different neural networks.
Dropping out can be seen as temporarily deactivating or ignoring neurons of the network.
Dropping nodes / links lets the network concentrate on other features.

Dropout
Randomly select a subset of the units and clamp their output to zero, regardless of the input;
this effectively removes those units from the model.
A different subset of units is randomly selected every time.

DropConnect
Disable individual weights (i.e., set them to zero) instead of nodes, so a node can remain partially active.
DropConnect is a generalization of Dropout because it produces even more possible models.
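The "clamp a random subset of outputs to zero" step can be sketched as inverted dropout in plain Python (a minimal illustration; rescaling survivors by 1/(1−p) is a common practical convention assumed here, not stated on the slide):

```python
import random

def dropout(activations, p=0.5, training=True, seed=None):
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1/(1-p); at test time, return activations unchanged."""
    if not training or p == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

out = dropout([1.0] * 8, p=0.5, seed=0)
print(out)  # a random mix of 0.0 (dropped) and 2.0 (kept, rescaled)
```

A different mask is drawn on every call, which is exactly the "different subset every time" behaviour described above; DropConnect would instead apply such a mask to the weight matrix.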
Dataset Augmentation
Typically, more data = better learning.
Works well for NLP, image classification, object recognition, speech processing.
For some tasks it may not be clear how to generate such data.
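For image tasks, one classic augmentation is a horizontal flip - sketched here on a nested-list "image" (illustrative only; real pipelines would use a library such as torchvision or tf.image):

```python
def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left-to-right - a label-preserving
    transform that effectively doubles the dataset for many vision tasks."""
    return [list(reversed(row)) for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
print(horizontal_flip(img))  # [[3, 2, 1], [6, 5, 4]]
```

Flipping twice recovers the original image, which is why such transforms are safe to apply randomly during training.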
4. Hyperparameters in Deepnet
➔ What?
➔ HP related to network structure
➔ HP related to training methods
➔ Methods used to find hyperparameters
Hyperparameters
Variables used to control the learning process.
They determine the network structure and how the network is trained.
By contrast, the values of other parameters are derived via training - e.g., weights, biases.
HP related to network structure
➔ Number of hidden layers: just keep adding layers until the test error does not improve anymore.
➔ Dropout: used on a larger network, giving the model more of an opportunity to learn.
➔ Network weight initialization: chosen according to the activation function.
➔ Activation function: generally, the rectifier (ReLU) activation function.
HP related to training methods
● Learning rate defines how quickly a network updates its parameters.
● A low learning rate slows down the learning process but converges smoothly.
● A larger learning rate speeds up the learning but may not converge.
● A decaying learning rate is preferred.
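The "decaying learning rate" idea can be sketched with a simple exponential schedule (one common choice among several - step decay, cosine, etc.; the decay rate 0.5 below is purely illustrative):

```python
def exponential_decay(initial_lr, decay_rate, step):
    """Exponentially decay the learning rate: lr = lr0 * decay_rate ** step."""
    return initial_lr * (decay_rate ** step)

for step in range(4):
    print(step, exponential_decay(0.1, 0.5, step))
# 0 0.1 / 1 0.05 / 2 0.025 / 3 0.0125
```

Early steps take large, fast updates; later steps shrink so the optimizer can settle smoothly, matching the trade-off in the bullets above.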
HP related to training methods
● Momentum: helps to know the direction of the next step from the knowledge of the previous steps.
○ A typical choice is between 0.5 and 0.9.
● Number of epochs is the number of times the whole training data is shown to the network while training.
○ Too many epochs can lead to overfitting.
● Batch size: the number of sub-samples given to the network, after which a parameter update happens.
○ A good default for batch size might be 32.
○ Also try 32, 64, 128, 256, and so on.
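The momentum bullet can be made concrete with the classical (heavy-ball) SGD-with-momentum update - a standard formulation assumed here, since the slide gives no equations:

```python
def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """One classical momentum update: the velocity v accumulates an
    exponentially decaying average of past gradients."""
    v = momentum * v - lr * grad
    w = w + v
    return w, v

# Two steps with the same gradient: the second step moves further,
# because the velocity remembers the first step's direction.
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, v, grad=1.0)  # v = -0.1, w = 0.9
w, v = sgd_momentum_step(w, v, grad=1.0)  # v ≈ -0.19, w ≈ 0.71
print(w, v)
```

With momentum 0.0 every step would be plain SGD; values near 0.9 make consistent gradient directions compound, which is why 0.5-0.9 is the typical range quoted above.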
Find out HP?
Methods used to find out hyperparameters:
● Manual search
● Grid search
● Random search
● Bayesian optimization
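Grid search, the simplest automated method above, exhaustively evaluates every combination of hyperparameter values. A minimal sketch (the score function is a stand-in; in practice it would train and validate a model):

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination of values in param_grid and return the
    best-scoring parameter dictionary together with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in score: pretend lr=0.01 with batch_size=64 validates best.
score = lambda p: -abs(p["lr"] - 0.01) - abs(p["batch_size"] - 64)
grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}
best, _ = grid_search(grid, score)
print(best)  # {'lr': 0.01, 'batch_size': 64}
```

Random search samples the same space instead of enumerating it, and Bayesian optimization chooses each trial based on the results of previous ones - both scale better when the grid is large.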
