Deep Learning_Lecture 3_Regularization in Neural Networks
Deep Learning
Presented by: Dr. Hanaa Bayomi
[email protected]
High Bias vs. High Variance
• Unnecessary explanatory variables can lead to overfitting. Overfitting means that the
algorithm works well on the training set but fails to perform well on the test set. This is
also known as the problem of high variance.
• When the algorithm performs so poorly that it cannot fit even the training set well, it is
said to underfit the data. This is also known as the problem of high bias.
• In the following diagram, fitting a linear regression in the first case would underfit the
data, i.e. it would lead to large errors even on the training set. A low-degree polynomial
fit in the second case is balanced, i.e. such a fit works well on both the training and test
sets, while in the third case the fit leads to low errors on the training set but does not
work well on the test set.
Different Regularization Techniques in Deep Learning
• Regularization Techniques:
o L1 and L2 Regularization: Add penalty terms to the loss function to encourage smaller
weight values, reducing overfitting.
o Dropout: Randomly deactivate a fraction of neurons during training to prevent
co-adaptation of neurons and encourage robustness (a code sketch follows this list).
• Early Stopping: Monitor the model's performance on a validation dataset
during training. Stop training when performance on the validation data starts to
degrade. This prevents the model from overfitting the training data.
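As a rough illustration of the dropout idea above, here is a minimal NumPy sketch of "inverted dropout" applied to a layer's activations; the layer shape and keep probability are made up for the example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob=0.5, training=True):
    """Inverted dropout: zero out a random fraction of units during
    training and rescale the survivors so the expected activation
    magnitude is unchanged at inference time."""
    if not training or drop_prob == 0.0:
        return activations  # no-op at inference time
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Hypothetical hidden-layer activations for a batch of 4 examples.
h = rng.standard_normal((4, 8))
h_train = dropout(h, drop_prob=0.5, training=True)   # some units zeroed
h_test = dropout(h, drop_prob=0.5, training=False)   # unchanged
```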
L1 and L2 are the most common types of regularization. They update the
general cost function by adding an extra term known as the regularization
term.
• L2 Regularization, also called ridge regression, adds the "squared magnitude" of the
coefficients as the penalty term to the loss function.
• L1 Regularization, also called lasso regression, adds the "absolute value of the magnitude"
of the coefficients as the penalty term to the loss function.
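Written out, the two penalized cost functions look as follows (λ is the regularization strength; the formulas are the standard ones, supplied here because the slides state them only in words):

\[
\text{Cost}_{L2} = \text{Loss} + \lambda \sum_{i} w_i^{2},
\qquad
\text{Cost}_{L1} = \text{Loss} + \lambda \sum_{i} \lvert w_i \rvert
\]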
L2 & L1 regularization
Key differences between Lasso and Ridge regression include:
Sparsity vs. Shrinkage: Lasso tends to produce sparse solutions with many coefficients
set to zero, while Ridge regression produces solutions with small non-zero coefficients.
Solution Stability: Ridge regression tends to be more stable than Lasso when dealing
with multicollinearity because it does not arbitrarily exclude variables.
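The sparsity difference above can be seen directly with scikit-learn. This is a small illustrative sketch on synthetic data, not part of the original slides; the data and alpha value are made up.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, only 3 of which actually matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
true_w = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0, 0, 0])
y = X @ true_w + 0.1 * rng.standard_normal(200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso drives irrelevant coefficients exactly to zero (sparsity);
# Ridge only shrinks them toward small non-zero values.
print("Lasso:", np.round(lasso.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))
```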
Word deletion: Removing words from the text is another augmentation method. Randomly
deleting words can force the model to rely on the remaining context and improve its ability to
understand the meaning of the text even with missing information.
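A minimal sketch of random word deletion, assuming whitespace tokenization and a made-up deletion probability:

```python
import random

def random_word_deletion(sentence, p=0.1, seed=None):
    """Drop each word independently with probability p, keeping at
    least one word so the example never becomes empty."""
    rng = random.Random(seed)
    words = sentence.split()
    kept = [w for w in words if rng.random() > p]
    if not kept:                       # guard against deleting everything
        kept = [rng.choice(words)]
    return " ".join(kept)

print(random_word_deletion("the model must rely on the remaining context", p=0.2, seed=1))
```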
Text paraphrasing: Paraphrasing involves rewriting sentences or phrases while preserving their
original meaning. This technique can be applied by using external tools or algorithms that can
generate paraphrased versions of the input text. It helps create additional training examples
with different phrasing and sentence structures.
Text rotation: Text rotation involves changing the order of sentences within a document or
paragraphs within a text. By reordering the text, the model is exposed to different sentence or
paragraph arrangements, helping it learn to handle variations in document structure.
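Text rotation can be sketched in a few lines; here the "document" is simply a list of sentences, and the cyclic-shift scheme is an assumption, since the slides do not specify how the reordering is done.

```python
def rotate_sentences(sentences, k=1):
    """Cyclically rotate the sentence order by k positions."""
    k %= len(sentences)
    return sentences[k:] + sentences[:k]

doc = ["Intro sentence.", "Middle sentence.", "Closing sentence."]
print(rotate_sentences(doc, k=1))
# ['Middle sentence.', 'Closing sentence.', 'Intro sentence.']
```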
Early stopping
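The early-stopping procedure described earlier can be sketched as a generic training loop. Here train_one_epoch and validation_loss are hypothetical placeholders for whatever framework is in use; the patience scheme is one common choice, not the only one.

```python
def fit_with_early_stopping(model, train_one_epoch, validation_loss,
                            max_epochs=100, patience=5):
    """Stop training once validation loss has not improved for
    `patience` consecutive epochs, and report the best epoch."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)              # one pass over the training data
        val_loss = validation_loss(model)   # monitor held-out data
        if val_loss < best_loss:
            best_loss = val_loss
            best_epoch = epoch
            epochs_without_improvement = 0  # reset the patience counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                       # validation loss is degrading
    return best_epoch, best_loss
```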