Deep Learning_Lecture 3_Regularization in Neural Networks

The lecture discusses concepts of high bias and high variance in deep learning, highlighting overfitting and underfitting issues. It covers various regularization techniques such as L1 and L2 regularization, dropout, and data augmentation to improve model performance and prevent overfitting. Additionally, early stopping is introduced as a strategy to halt training when validation performance declines.


Lecture 3

Deep Learning
Presented by: Dr. Hanaa Bayomi
[email protected]
High Bias and High Variance

• Unnecessary explanatory variables might lead to overfitting. Overfitting means that the algorithm works well on the training set but fails to generalize to the test set. This is also known as the problem of high variance.
• When the algorithm performs so poorly that it cannot fit even the training set well, it is said to underfit the data. This is also known as the problem of high bias.
• In the following diagram, fitting a linear regression in the first case would underfit the data, i.e. it would lead to large errors even on the training set. Using a suitable polynomial fit in the second case is balanced, i.e. such a fit works well on both the training and test sets, while in the third case the fit leads to low errors on the training set but does not work well on the test set.
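As a rough illustration (not part of the original slides), the following NumPy sketch fits polynomials of increasing degree to noisy data; the lowest degree underfits (high bias) and the highest degree overfits (high variance), which shows up as a gap between training and test error. The degrees, sample sizes, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # noisy samples of a sinusoidal ground-truth function
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(20)

for degree in (1, 3, 12):                              # underfit, balanced, overfit
    coeffs = np.polyfit(x_train, y_train, degree)      # least-squares polynomial fit
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree:2d}: train MSE {mse(x_train, y_train):.3f}, "
          f"test MSE {mse(x_test, y_test):.3f}")
```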
Different Regularization Techniques in Deep Learning

• Regularization Techniques:
o L1 and L2 Regularization: Add penalty terms to the loss function to encourage smaller weight values, reducing overfitting.
o Dropout: Randomly deactivate a fraction of neurons during training to prevent co-adaptation of neurons and encourage robustness.
• Early Stopping: Monitor the model's performance on a validation dataset during training. Stop training when performance on the validation data starts to degrade. This prevents the model from overfitting the training data.
• Data Augmentation: Apply data augmentation techniques to artificially increase the size of the training dataset. Techniques may include rotating, flipping, or adding noise to the data.
L2 & L1 regularization

L1 and L2 are the most common types of regularization. These update the
general cost function by adding another term known as the regularization
term.

• Cost function = Loss (say, binary cross-entropy) + Regularization term

• L2 Regularization, also called ridge regression, adds the "squared magnitude" of the coefficients as the penalty term to the loss function.
• L1 Regularization, also called lasso regression, adds the "absolute value of magnitude" of the coefficients as the penalty term to the loss function.
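As a rough illustration (not taken from the slides), the NumPy sketch below adds an L1 or L2 penalty to a binary cross-entropy loss; the regularization strength lam and the toy weights are assumed values:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # clip predictions to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def regularized_cost(y_true, y_pred, weights, lam=0.01, kind="l2"):
    loss = binary_cross_entropy(y_true, y_pred)
    if kind == "l2":                      # ridge: lambda * sum of squared weights
        penalty = lam * np.sum(weights ** 2)
    else:                                 # lasso: lambda * sum of absolute weights
        penalty = lam * np.sum(np.abs(weights))
    return loss + penalty

# toy usage with made-up weights, labels, and predictions
w = np.array([0.5, -1.2, 0.03])
y = np.array([1, 0, 1])
p = np.array([0.9, 0.2, 0.7])
print(regularized_cost(y, p, w, kind="l2"), regularized_cost(y, p, w, kind="l1"))
```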
Key differences between Lasso and Ridge regression include:

Sparsity vs. Shrinkage: Lasso tends to produce sparse solutions with many coefficients set to zero, while Ridge regression produces solutions with small non-zero coefficients.

Feature Selection: Lasso implicitly performs feature selection by driving irrelevant coefficients to zero, making it useful for selecting relevant features. Ridge regression does not perform feature selection but rather reduces the impact of correlated features.

Solution Stability: Ridge regression tends to be more stable than Lasso when dealing with multicollinearity because it does not arbitrarily exclude variables.

Computational Efficiency: Lasso tends to be more computationally expensive than Ridge regression, especially for large datasets, because it lacks a closed-form solution and must be solved iteratively.
Dropout
▪ Dropout is a mechanism where at each training iteration (batch) we randomly remove a
subset of neurons
▪ This prevents the neural network from relying too much on individual pathways, making it
more “robust”
During Training:
• For each training example, dropout randomly sets a fraction
(typically between 0.2 and 0.5) of the units in a layer to zero.
This means that the outputs of these units are ignored during
forward propagation.
• The specific units to be dropped out are randomly chosen for
each training example and can change from one training
iteration to another.
• The remaining units in the layer that are not dropped out are
scaled by a factor of (1 / keep_prob), where keep_prob is the
probability of keeping a unit (1 - dropout rate).
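A minimal NumPy sketch of the inverted-dropout scheme described above; the layer shape and keep_prob value are illustrative assumptions:

```python
import numpy as np

def dropout_forward(activations, keep_prob=0.8, training=True):
    if not training:
        return activations                                   # no dropout at test time
    rng = np.random.default_rng()
    mask = rng.random(activations.shape) < keep_prob         # keep each unit with prob keep_prob
    return activations * mask / keep_prob                    # scale survivors by 1/keep_prob

a = np.ones((2, 5))                                          # toy activations
print(dropout_forward(a, keep_prob=0.8))
```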
Dropout—Visualization

If a neuron was kept with probability p during training, at test time we scale its outbound weights by a factor of p. (Equivalently, with the inverted-dropout scaling by 1/keep_prob applied during training, no rescaling is needed at test time.)
Data Augmentation
The simplest way to reduce overfitting is to increase the size of the training data. In practice, however, collecting more labeled data is often too costly.

Image

There are a few ways of increasing the size of the training data: rotating the image, flipping, scaling, shifting, etc. In the image below, such transformations have been applied to a handwritten-digits dataset.
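As a hedged illustration (not from the slides), the following NumPy sketch generates a few augmented variants of a single image by flipping, rotating, shifting, and adding noise; the image size and noise level are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    variants = [image]
    variants.append(np.fliplr(image))                        # horizontal flip
    variants.append(np.rot90(image))                         # rotate by 90 degrees
    variants.append(np.roll(image, shift=2, axis=1))         # shift right by 2 pixels
    noisy = image + rng.normal(0, 0.05, image.shape)         # additive Gaussian noise
    variants.append(np.clip(noisy, 0, 1))
    return variants

digit = rng.random((28, 28))              # stand-in for a handwritten-digit image
augmented = augment(digit)
print(len(augmented), "variants generated")
```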
Data Augmentation Text
Text translation: Translating text from one language to
another can be used as a data augmentation technique.
By translating the text, you can generate additional
examples while preserving the overall meaning and
context. This method can help improve the model's ability
to handle different sentence structures and expressions.
Synonym replacement: Replacing words or phrases with
their synonyms is a straightforward augmentation
technique. It involves replacing selected words with other
words having similar meanings. This method can
introduce variations in the text while maintaining the
overall semantics.
Word insertion: Inserting additional words into the text can increase its complexity and
diversity. This can be done by randomly selecting words or phrases and inserting them at
different positions within the sentence. Word insertion can help the model learn to handle
longer sentences and different word combinations.

Word deletion: Removing words from the text is another augmentation method. Randomly
deleting words can force the model to rely on the remaining context and improve its ability to
understand the meaning of the text even with missing information.

Text paraphrasing: Paraphrasing involves rewriting sentences or phrases while preserving their
original meaning. This technique can be applied by using external tools or algorithms that can
generate paraphrased versions of the input text. It helps create additional training examples
with different phrasing and sentence structures.

Text rotation: Text rotation involves changing the order of sentences within a document or
paragraphs within a text. By reordering the text, the model is exposed to different sentence or
paragraph arrangements, helping it learn to handle variations in document structure.
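A small, self-contained sketch (assumed, not from the slides) of two of these text-augmentation operations, synonym replacement using a toy synonym dictionary and random word deletion:

```python
import random

# toy synonym lexicon used only for illustration
SYNONYMS = {"quick": ["fast", "swift"], "happy": ["glad", "joyful"]}

def synonym_replace(sentence):
    # swap each word for a random synonym when one is available
    words = sentence.split()
    return " ".join(random.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words)

def random_delete(sentence, p=0.2):
    # drop each word with probability p, keeping the original if everything is deleted
    words = [w for w in sentence.split() if random.random() > p]
    return " ".join(words) if words else sentence

s = "the quick brown fox is happy"
print(synonym_replace(s))
print(random_delete(s))
```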
Early stopping

Early stopping is a kind of cross-validation strategy where we keep one part of the training set as a validation set. When we see that the performance on the validation set is getting worse, we immediately stop training the model. This is known as early stopping.
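A minimal sketch (assumed, not from the lecture) of early stopping on a toy linear-regression problem. The "patience" counter, which waits a few epochs before stopping rather than halting on the first increase in validation loss, is a common refinement added here as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)

# keep one part of the training set as a validation set
X_tr, y_tr = X[:160], y[:160]
X_val, y_val = X[160:], y[160:]

w = np.zeros(5)
best_val, best_w = np.inf, w.copy()
patience, wait = 5, 0                                     # assumed patience of 5 epochs
for epoch in range(500):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)     # gradient of training MSE
    w -= 0.05 * grad
    val_loss = np.mean((X_val @ w - y_val) ** 2)          # monitor validation MSE
    if val_loss < best_val:                               # validation improved: keep these weights
        best_val, best_w, wait = val_loss, w.copy(), 0
    else:                                                 # validation got worse: count towards patience
        wait += 1
        if wait >= patience:
            print(f"stopping early at epoch {epoch}, best validation MSE {best_val:.4f}")
            break

w = best_w                                                # restore the best weights seen on validation
```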
