Unit-2 L3 (3)

Regularization

Early Stopping
• Increase in validation set error:
• When training large models with sufficient representational
capacity to overfit the task, training error decreases steadily
over time, but validation set error begins to rise again.
• An example of this behavior is shown next
Early Stopping
• Increase in validation set error:
• The learning curves show how the negative log-likelihood loss
changes over training time (measured in passes through the data
set, or epochs). In this example, a maxout network (maxout
generalizes ReLU) is trained on MNIST. The training objective
decreases consistently over time, but the average validation
set loss eventually begins to increase again, forming an
asymmetric U-shaped curve
Early Stopping
• Saving parameters
• We can thus obtain a model with better validation set error
(and therefore better test error) by returning to the parameter
setting at the point in time with the lowest validation set error.
• Every time the error on the validation set improves, we store a
copy of the model parameters.
• When the training algorithm terminates, we return to these
parameters, rather than the latest set.
Early Stopping
• Early stopping meta algorithm:
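The meta-algorithm referenced above (shown as a figure in the original slides) can be sketched as follows. This is a minimal sketch: the model representation and the `train_step` and `validation_error` callables are hypothetical placeholders, and `patience` is an assumed hyperparameter for how many non-improving evaluations to tolerate before halting.

```python
import copy

def early_stopping_train(model, train_step, validation_error, patience=5):
    """Sketch of the early stopping meta-algorithm: train while
    periodically evaluating on the validation set, keep a copy of the
    parameters with the lowest validation error seen so far, and halt
    after `patience` evaluations in a row with no improvement."""
    best_params = copy.deepcopy(model)
    best_error = validation_error(model)
    best_step = 0
    evals_without_improvement = 0
    step = 0
    while evals_without_improvement < patience:
        train_step(model)                      # one optimization step
        step += 1
        err = validation_error(model)
        if err < best_error:                   # improved: store a copy
            best_params = copy.deepcopy(model)
            best_error, best_step = err, step
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
    return best_params, best_step, best_error  # best, not latest, params
```

Returning `best_step` matters because the retraining strategies described later reuse it as the optimal number of training steps.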
Early Stopping
• Strategy of Early Stopping:
• The above strategy is known as Early Stopping
• It is the most common form of regularization in deep learning
• Its popularity is due to its effectiveness and its simplicity
• We can think of early stopping as a very efficient hyperparameter
selection algorithm
• In this view, the number of training steps is just another hyperparameter
• This hyperparameter has a U-shaped validation set performance curve
• Most hyperparameters that control model capacity have such a
U-shaped validation set performance curve
• In the case of early stopping, we are controlling the effective capacity
of the model by determining how many steps it can take to fit the
training set
Early Stopping
• Early Stopping as Regularization:
• Early stopping is an unobtrusive form of regularization
• It requires almost no change to the underlying training
procedure, the objective function, or the set of allowable
parameter values
• So it is easy to use early stopping without damaging the
learning dynamics
• This is in contrast to weight decay, where we must be careful
not to use too much
• Otherwise, we trap the network in a bad local minimum
corresponding to pathologically small weights
Early Stopping
• Use of a second training step:
• Early stopping requires a validation set
• Thus some training data is not fed to the model
• To best exploit this extra data, one can perform extra
training after the initial training with early stopping has
completed
• In the second extra training step, all the training data is
included
• There are two basic strategies for the second training
procedure
Early Stopping
• First Strategy for Retraining
• One strategy is to initialize the model again and retrain
on all the data
• In the second training pass, we train for the same number of
steps that the early stopping procedure determined was
optimal in the first pass
• Should we retrain for the same number of parameter
updates or the same number of passes through the data set?
• On the second round, each pass through the data set requires
more parameter updates, because the data set is bigger
Early Stopping
• First meta-algorithm for retraining
• A meta-algorithm for using early stopping to
determine how long to train, then retraining on all
the data
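This first retraining strategy can be sketched as below; the `make_model` and `train_step_full` callables are hypothetical placeholders for fresh model construction and one optimization step on the full data set, respectively.

```python
def retrain_on_all_data(make_model, train_step_full, best_step):
    """First retraining strategy (sketch): re-initialize the model and
    train on all the data for the number of steps that early stopping
    on the training subset determined was optimal."""
    model = make_model()           # start again from a fresh initialization
    for _ in range(best_step):     # same number of steps as the first pass
        train_step_full(model)     # but each step now sees all the data
    return model
```

Whether `best_step` should count parameter updates or passes through the data set is exactly the ambiguity raised in the slide above; this sketch counts parameter updates.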
Early Stopping
• Second strategy for retraining
• Keep all the parameters obtained from the first round of
training and then continue training but now using all the
data
• We no longer have a guide for when to stop in terms of
the number of steps
• Instead, we monitor the average loss function on the
validation set and continue training until it falls below
the value of the training set objective at which early
stopping halted
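A sketch of this second strategy, with hypothetical `train_step_full` and `validation_loss` callables; the `max_steps` cap is an added safeguard, since the validation loss is not guaranteed to ever reach the target value.

```python
def continue_training(model, train_step_full, validation_loss,
                      target_objective, max_steps=1000):
    """Second retraining strategy (sketch): keep the parameters from
    the first round of training and continue training on all the data
    until the average validation loss falls below the training set
    objective value at which early stopping halted."""
    for step in range(max_steps):
        if validation_loss(model) <= target_objective:
            return model, step       # target reached
        train_step_full(model)
    return model, max_steps          # give up: target never reached
```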
Early Stopping
• Second meta-algorithm for retraining
• Meta-algorithm using early stopping to determine at
what objective value we start to overfit, then continue
training until that value is reached
Early Stopping
• Early stopping as a regularizer:
• So far we have stated that early stopping is a regularization
strategy
• But supported the claim only by showing learning curves
where the validation set error has a U shaped curve
• What is the actual mechanism by which early stopping
regularizes the model?
• Early stopping has the effect of restricting the optimization
procedure to a relatively small volume of parameter space in
the neighborhood of the initial parameters θ0
Early Stopping
• Early Stopping vs L2 Regularization:
• (Figure: two weights; solid contour lines are contours of the negative
log-likelihood)
• Left: dashed lines indicate the trajectory of SGD. Rather than stopping at the
point w* that minimizes the cost, early stopping results in an earlier point on
the trajectory
• Right: dashed circles indicate contours of the L2 penalty, which causes the
minimum of the total cost to lie nearer the origin than the minimum of
the unregularized cost
Regularization
Parameter Tying and Parameter Sharing
Parameter Tying
• L2 regularization (or weight decay) penalizes model
parameters for deviating from the fixed value of zero
• Sometimes we need other ways to express prior knowledge of
parameters
• We may know from domain and model architecture that there
should be some dependencies between model parameters
• We want to express that certain parameters should be close to one
another
Parameter Tying
• A scenario of parameter tying:
• Two models performing the same classification task (with same set of
classes) but with somewhat different input distributions
• Model A with parameters w(A)
• Model B with parameters w(B)
• The two models map the input to two different but related outputs
Parameter Tying
• L2 penalty for parameter tying:
• If the tasks are similar enough (perhaps with similar input and
output distributions), then we believe that the model
parameters should be close to each other:
for all i, w_i^(A) should be close to w_i^(B)
• We can leverage this information via regularization, using a
parameter norm penalty:
Ω(w^(A), w^(B)) = ||w^(A) − w^(B)||_2^2
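The penalty above can be sketched in a few lines of NumPy; the coefficient `lam` is a hypothetical regularization strength, not from the original slides.

```python
import numpy as np

def tying_penalty(w_a, w_b, lam=1.0):
    """Parameter-tying penalty (sketch): the squared L2 distance
    between the parameter vectors of models A and B. Added to the
    training objective, it pulls the two sets of parameters toward
    each other without forcing them to be exactly equal."""
    return lam * np.sum((w_a - w_b) ** 2)

def tying_penalty_grad_wa(w_a, w_b, lam=1.0):
    """Gradient of the penalty with respect to w_a (the gradient
    with respect to w_b is the negative of this)."""
    return 2.0 * lam * (w_a - w_b)
```

Note the contrast with weight decay: the penalty is minimized when the two parameter vectors match each other, not when they are zero.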
Parameter Tying
• Use of Parameter Tying
• Approach was used for regularizing the parameters of one
model, trained as a supervised classifier, to be close to the
parameters of another model, trained in an unsupervised
paradigm (to capture the distribution of the input data)
• Example from unsupervised learning: k-means clustering. The input x is
mapped to a one-hot vector h: if x belongs to cluster i, then h_i = 1
and all other entries are zero
• Such a representation could be trained using an autoencoder with k
hidden units
Parameter Sharing
• Parameter sharing is where we:
• Force sets of parameters to be equal
• Because we interpret various models or model components as sharing a unique set of
parameters
• Only a subset of the parameters needs to be stored in memory
• In a CNN, this yields a significant reduction in the memory footprint of the model
Parameter Sharing
• CNN parameters
Parameter Sharing
• Use of parameter sharing in CNNs
• Most extensive use of parameter sharing is in convolutional neural
networks (CNNs)
• Natural images have many statistical properties that are invariant to
translation
• Ex: photo of a cat remains a photo of a cat if it is translated one pixel
to the right
• CNNs take this property into account by sharing parameters across
multiple image locations
• Thus we can find a cat with the same cat detector whether the cat
appears at column i or column i+1 in the image
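Parameter sharing in a convolution can be illustrated with a tiny 1-D example: the same two kernel weights are reused at every input position, so a "step detector" fires wherever the step occurs. This is an illustrative sketch, not code from the slides.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """1-D convolution (cross-correlation) sketch: the *same* kernel
    weights are applied at every position of the input, so one small
    set of shared parameters covers all locations."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])

# A step-up detector: responds with +1 wherever the input rises.
detector = np.array([-1.0, 1.0])
```

Shifting the input by one position shifts the detector's response by one position, with no change to the two shared weights — the translation-invariance property described above.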
Parameter Sharing
• Simple description of CNN
