Deep Learning_Lecture 3_Regularization in Neural Networks

The lecture discusses concepts of high bias and high variance in deep learning, highlighting overfitting and underfitting issues. It covers various regularization techniques such as L1 and L2 regularization, dropout, and data augmentation to improve model performance and prevent overfitting. Additionally, early stopping is introduced as a strategy to halt training when validation performance declines.


Lecture 3

Deep Learning
Presented by: Dr. Hanaa Bayomi
[email protected]
High Bias and High Variance

• Unnecessary explanatory variables might lead to overfitting. Overfitting means that the algorithm works well on the training set but fails to generalize to the test set. This is also known as the problem of high variance.
• When the algorithm performs so poorly that it cannot fit even the training set well, it is said to underfit the data. This is also known as the problem of high bias.
• In the following diagram, fitting a linear regression in the first case would underfit the data, i.e. it would lead to large errors even on the training set. Using a suitable polynomial fit in the second case is balanced, i.e. such a fit works well on both the training and test sets, while in the third case the fit leads to low errors on the training set but does not work well on the test set.
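As a rough illustration (not part of the original slides), the following NumPy sketch fits polynomials of increasing degree to noisy data; the lowest degree underfits (high bias) and the highest degree overfits (high variance), which shows up as a gap between training and test error. The degrees, sample sizes, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # noisy samples of a sinusoidal ground-truth function
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(20)

for degree in (1, 3, 12):                              # underfit, balanced, overfit
    coeffs = np.polyfit(x_train, y_train, degree)      # least-squares polynomial fit
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE {mse(x_train, y_train):.3f}, "
          f"test MSE {mse(x_test, y_test):.3f}")
```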
Different Regularization Techniques in Deep Learning

• Regularization Techniques:
o L1 and L2 Regularization: Add penalty terms to the loss function to encourage smaller weight values, reducing overfitting.
o Dropout: Randomly deactivate a fraction of neurons during training to prevent co-adaptation of neurons and encourage robustness.
• Early Stopping: Monitor the model's performance on a validation dataset during training. Stop training when performance on the validation data starts to degrade. This prevents the model from overfitting the training data.
• Data Augmentation: Apply data augmentation techniques to artificially increase the size of the training dataset. Techniques may include rotating, flipping, or adding noise to the data.
L2 & L1 regularization

L1 and L2 are the most common types of regularization. These update the
general cost function by adding another term known as the regularization
term.

• Cost function = Loss (say, binary cross-entropy) + Regularization term

• L2 Regularization, also called ridge regression, adds the "squared magnitude" of the coefficients as the penalty term to the loss function.
• L1 Regularization, also called lasso regression, adds the "absolute value of magnitude" of the coefficients as the penalty term to the loss function.
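As a rough illustration (not taken from the slides), the NumPy sketch below adds an L1 or L2 penalty to a binary cross-entropy loss; the regularization strength lam and the toy weights are assumed values:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def regularized_cost(y_true, y_pred, weights, lam=0.01, kind="l2"):
    loss = binary_cross_entropy(y_true, y_pred)
    if kind == "l2":                      # ridge: lambda * sum of squared weights
        penalty = lam * np.sum(weights ** 2)
    else:                                 # lasso: lambda * sum of absolute weights
        penalty = lam * np.sum(np.abs(weights))
    return loss + penalty

# toy usage with made-up weights, labels, and predictions
w = np.array([0.5, -1.2, 0.03])
y = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.7])
print(regularized_cost(y, p, w, kind="l2"), regularized_cost(y, p, w, kind="l1"))
```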
Key differences between Lasso and Ridge regression include:

Sparsity vs. Shrinkage: Lasso tends to produce sparse solutions with many coefficients set to zero, while Ridge regression produces solutions with small non-zero coefficients.

Feature Selection: Lasso implicitly performs feature selection by driving irrelevant coefficients to zero, making it useful for selecting relevant features. Ridge regression does not perform feature selection but rather reduces the impact of correlated features.

Solution Stability: Ridge regression tends to be more stable than Lasso when dealing with multicollinearity because it does not arbitrarily exclude variables.

Computational Efficiency: Lasso tends to be more computationally expensive than Ridge regression, especially for large datasets, because it lacks a closed-form solution and must be solved iteratively.
Dropout
▪ Dropout is a mechanism where at each training iteration (batch) we randomly remove a
subset of neurons
▪ This prevents the neural network from relying too much on individual pathways, making it
more “robust”
During Training:
• For each training example, dropout randomly sets a fraction
(typically between 0.2 and 0.5) of the units in a layer to zero.
This means that the outputs of these units are ignored during
forward propagation.
• The specific units to be dropped out are randomly chosen for
each training example and can change from one training
iteration to another.
• The remaining units in the layer that are not dropped out are
scaled by a factor of (1 / keep_prob), where keep_prob is the
probability of keeping a unit (1 - dropout rate).
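A minimal NumPy sketch of the inverted-dropout scheme described above; the layer shape and keep_prob value are illustrative assumptions:

```python
import numpy as np

def dropout_forward(activations, keep_prob=0.8, training=True):
    if not training:
        return activations                                   # no dropout at test time
    rng = np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob         # keep each unit with prob keep_prob
    return activations * mask / keep_prob                    # scale survivors by 1/keep_prob

a = np.ones((2, 5))                                          # toy activations
print(dropout_forward(a, keep_prob=0.8))
```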
Dropout—Visualization

If a neuron was kept with probability p during training, at test time we scale its outbound weights by a factor of p. (Equivalently, with the inverted-dropout scaling by 1/keep_prob applied during training, no rescaling is needed at test time.)
Data Augmentation
The simplest way to reduce overfitting is to increase the size of the training data. In practice, however, collecting more labeled data is often too costly.

Image

There are a few ways of increasing the size of the training data: rotating the image, flipping, scaling, shifting, etc. In the image below, such transformations have been applied to a handwritten-digits dataset.
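As a hedged illustration (not from the slides), the following NumPy sketch generates a few augmented variants of a single image by flipping, rotating, shifting, and adding noise; the image size and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    variants = [image]
    variants.append(np.fliplr(image))                        # horizontal flip
    variants.append(np.rot90(image))                         # rotate by 90 degrees
    variants.append(np.roll(image, shift=2, axis=1))         # shift right by 2 pixels
    noisy = image + rng.normal(0, 0.05, image.shape)         # additive Gaussian noise
    variants.append(np.clip(noisy, 0, 1))
    return variants

digit = rng.random((28, 28))              # stand-in for a handwritten-digit image
augmented = augment(digit)
print(len(augmented), "variants generated")
```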
Data Augmentation Text
Text translation: Translating text from one language to
another can be used as a data augmentation technique.
By translating the text, you can generate additional
examples while preserving the overall meaning and
context. This method can help improve the model's ability
to handle different sentence structures and expressions.
Synonym replacement: Replacing words or phrases with
their synonyms is a straightforward augmentation
technique. It involves replacing selected words with other
words having similar meanings. This method can
introduce variations in the text while maintaining the
overall semantics.
Word insertion: Inserting additional words into the text can increase its complexity and
diversity. This can be done by randomly selecting words or phrases and inserting them at
different positions within the sentence. Word insertion can help the model learn to handle
longer sentences and different word combinations.

Word deletion: Removing words from the text is another augmentation method. Randomly
deleting words can force the model to rely on the remaining context and improve its ability to
understand the meaning of the text even with missing information.

Text paraphrasing: Paraphrasing involves rewriting sentences or phrases while preserving their
original meaning. This technique can be applied by using external tools or algorithms that can
generate paraphrased versions of the input text. It helps create additional training examples
with different phrasing and sentence structures.

Text rotation: Text rotation involves changing the order of sentences within a document or
paragraphs within a text. By reordering the text, the model is exposed to different sentence or
paragraph arrangements, helping it learn to handle variations in document structure.
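A small, self-contained sketch (assumed, not from the slides) of two of these text-augmentation operations, synonym replacement using a toy synonym dictionary and random word deletion:

```python
import random

# toy synonym lexicon used only for illustration
SYNONYMS = {"quick": ["fast", "swift"], "happy": ["glad", "joyful"]}

def synonym_replace(sentence):
    # swap each word for a random synonym when one is available
    words = sentence.split()
    return " ".join(random.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words)

def random_delete(sentence, p=0.2):
    # drop each word with probability p, keeping the original if everything is deleted
    words = [w for w in sentence.split() if random.random() > p]
    return " ".join(words) if words else sentence

s = "the quick brown fox is happy"
print(synonym_replace(s))
print(random_delete(s))
```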
Early stopping

Early stopping is a kind of cross-validation strategy where we keep one part of the training set as a validation set. When we see that the performance on the validation set is getting worse, we immediately stop training the model. This is known as early stopping.
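A minimal sketch (assumed, not from the lecture) of early stopping on a toy linear-regression problem. The "patience" counter, which waits a few epochs before stopping rather than halting on the first increase in validation loss, is a common refinement added here as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)

# keep one part of the training set as a validation set
X_tr, y_tr = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

w = np.zeros(5)
best_val, best_w = np.inf, w.copy()
patience, wait = 5, 0                                     # assumed patience of 5 epochs
for epoch in range(500):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)     # gradient of training MSE
    w -= 0.05 * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)          # monitor validation MSE
    if val_loss < best_val:                               # validation improved: keep these weights
        best_val, best_w, wait = val_loss, w.copy(), 0
    else:                                                 # validation got worse: count towards patience
        wait += 1
        if wait >= patience:
            print(f"stopping early at epoch {epoch}, best validation MSE {best_val:.4f}")
            break

w = best_w                                                # restore the best weights seen on validation
```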
