Regularization Slides (2)
Hyperparameters in Deep Learning
Improve performance on unseen data by reducing overfitting
1. Preliminary understanding
2. Regularization - What & Why?
3. Regularization Techniques
a. L2, L1 regularization
b. Early stopping
c. Ensemble methods - Dropout, DropConnect
d. Dataset augmentation
e. Adding Noise to the inputs / outputs
4. Hyperparameters in DL
1. Preliminary understanding
➔ Model fitting (train-test)
➔ How does the model perform on unseen datasets?
(Figure: fitting a regression model; the optimal fit / good fit)
➔ Under-fitting
   The model shows poor performance even on the training data; the dataset or the model capacity is poor.
➔ Over-fitting
   The model capacity is increased too far, so it fits the training data too closely. The gold standard is the optimal fit; in practice, the aim is to limit over-fitting.
➔ Variance
   Represents the extent to which the model is sensitive to the particular choice of dataset (test data).
➔ Error is due to both under-fitting and over-fitting.
➔ Robust model = min{error}
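A clarifying note beyond what is on the slide: the usual decomposition behind "error = under-fit + over-fit" is the bias-variance decomposition, sketched below.

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2}_{\text{under-fitting}}
  + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{over-fitting}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```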
Reasons and counter measures
Counter measures for under-fit:
➔ Increase the capacity (number of layers)
➔ Incorporate more data
Reason for over-fit:
➔ Deep neural networks are highly complex models.
Counter measures for over-fit:
➔ Dataset augmentation
➔ Early stopping
L1 regularization: a weight of 0.5 gets a penalty of |0.5| = 0.5.
L2 regularization: a weight of 0.5 gets a penalty of 0.5² = 0.25.
So L1 pushes even small weights towards zero, more so than L2 regularization.

L1 regularization: a weight of -9 gets a penalty of |-9| = 9, but
L2 regularization: a weight of -9 gets a penalty of (-9)² = 81.
Thus, bigger-magnitude weights are punished much more severely by L2 regularization.

L1 & L2 regularization can also be applied at the same time.
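A minimal sketch (Python with NumPy; the function name and lambda arguments are illustrative, not from the slides) of how these penalties are computed and added to a training loss:

```python
import numpy as np

def regularized_loss(data_loss, weights, l1_lambda=0.0, l2_lambda=0.0):
    """Add L1 and/or L2 penalty terms to a base (data) loss.

    L1 penalty: l1_lambda * sum(|w|)   -> a weight of 0.5 contributes 0.5
    L2 penalty: l2_lambda * sum(w**2)  -> a weight of 0.5 contributes 0.25
    """
    l1_penalty = l1_lambda * np.sum(np.abs(weights))
    l2_penalty = l2_lambda * np.sum(weights ** 2)
    return data_loss + l1_penalty + l2_penalty

# The per-weight penalties quoted on the slide:
w = np.array([0.5, -9.0])
print(np.abs(w))   # L1 penalties: 0.5 and 9.0
print(w ** 2)      # L2 penalties: 0.25 and 81.0
```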
Early stopping
➔ There is a point during training a large neural net when the model stops generalizing; thereafter it focuses only on learning the statistical noise in the training dataset.
➔ Solution: track the validation error and stop whenever the generalization (validation) error increases.
➔ With a patience of p steps: if the validation error has not improved for p checks, stop at step k and return the model from step k - p.
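A minimal sketch of this early-stopping loop; train_one_epoch, validation_error, and the model snapshotting are hypothetical placeholders, not a specific library API:

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=100, patience=5):
    """Stop training once the validation error has not improved for `patience` epochs."""
    best_error = float("inf")
    best_epoch = 0
    best_state = copy.deepcopy(model)            # snapshot of the best model so far
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)                   # one pass over the training data
        err = validation_error(model)            # track the validation error
        if err < best_error:
            best_error, best_epoch = err, epoch
            best_state = copy.deepcopy(model)
        elif epoch - best_epoch >= patience:     # p epochs with no improvement
            break                                # stop: generalization error keeps rising
    return best_state                            # return the model from epoch k - p
```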
Dropout
➔ Randomly drops units during training, effectively removing those units from the model.
➔ A different subset of units is randomly selected every time.

DropConnect
➔ Disables individual weights (i.e., sets them to zero) instead of nodes, so a node can remain partially active.
➔ DropConnect is a generalization of Dropout because it produces even more possible models.
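A minimal NumPy sketch contrasting the two masks; the layer shapes and drop probability are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, W, p_drop=0.5):
    """Dropout: zero out whole units of the layer output at random."""
    h = x @ W
    unit_mask = rng.random(h.shape) > p_drop      # a different subset every call
    return h * unit_mask / (1.0 - p_drop)         # inverted-dropout rescaling

def dropconnect_forward(x, W, p_drop=0.5):
    """DropConnect: zero out individual weights, so a unit can stay partially active."""
    weight_mask = rng.random(W.shape) > p_drop
    return x @ (W * weight_mask) / (1.0 - p_drop)

x = rng.standard_normal((4, 8))   # batch of 4 inputs with 8 features
W = rng.standard_normal((8, 3))   # weights of a layer with 3 units
print(dropout_forward(x, W).shape, dropconnect_forward(x, W).shape)  # (4, 3) (4, 3)
```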
Dataset Augmentation
➔ Typically, more data = better learning.
➔ Works well for NLP, image classification, object recognition, and speech processing.
➔ For some tasks it may not be clear how to generate such data.
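A minimal sketch of image-style augmentation (random flip plus a small shift); the image shape and shift range are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_image(img):
    """Produce a slightly altered copy of an (H, W, C) image array."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                    # random horizontal flip
    dy, dx = rng.integers(-2, 3, size=2)         # small random spatial shift
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

img = rng.random((32, 32, 3))                        # a dummy 32x32 RGB image
augmented = [augment_image(img) for _ in range(4)]   # "more data" from one sample
```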
4. Hyperparameters in deep networks
➔ What are hyperparameters, and how do we find them?
➔ Methods used to find hyperparameters:
   Grid search
   Random search
   Bayesian optimization
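A minimal sketch of random search over a small hyperparameter space; train_and_validate and the candidate values are hypothetical placeholders, not from the slides:

```python
import random

search_space = {
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
    "l2_lambda": [0.0, 1e-4, 1e-2],
    "dropout_rate": [0.0, 0.3, 0.5],
}

def random_search(train_and_validate, n_trials=20, seed=0):
    """Sample random hyperparameter settings and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_validate(**config)     # e.g. validation accuracy
        if score > best_score:
            best_score, best_config = score, config
    return best_config, best_score
```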