
UNIT - 3

▪ Overfitting Problem

▪ Regularization (Ridge, Lasso, Elastic)

▪ Dropout and Early Stopping

Dr. Tarandeep Singh — 21 February 2025


Overfitting Problem

Why does overfitting occur?


➢ The training data size is too small and does not contain enough data samples to accurately
represent all possible input data values.
➢ The training data contains large amounts of irrelevant information, called noisy data.
➢ The model trains for too long on a single sample set of data.
➢ The model complexity is high, so it learns the noise within the training data.



Overfitting Problem

Overfitting Examples
Consider a use case where a machine learning model has to analyze photos and identify the ones that contain dogs. If the model was trained on a data set in which most photos showed dogs outside in parks, it may learn to use grass as a feature for classification and may fail to recognize a dog inside a room.



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:

Early stopping
Early stopping pauses the training phase before the machine learning model learns the noise in the data. However, getting the timing right is important; otherwise, the model will still not give accurate results.
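As a rough illustration (not from the slides), the sketch below implements the patience idea with scikit-learn's SGDRegressor standing in for the model being trained; the data and thresholds are made up for the example.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy regression data; in practice, use your own train/validation split.
rng = np.random.RandomState(0)
X = rng.randn(500, 10)
y = X @ rng.randn(10) + 0.1 * rng.randn(500)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = SGDRegressor(random_state=0)
best_val, patience, wait = np.inf, 5, 0

for epoch in range(200):
    model.partial_fit(X_train, y_train)            # one more pass over the training data
    val_loss = mean_squared_error(y_val, model.predict(X_val))
    if val_loss < best_val - 1e-6:                 # validation loss still improving
        best_val, wait = val_loss, 0
    else:                                          # no improvement this epoch
        wait += 1
        if wait >= patience:                       # stop before the model starts fitting noise
            print(f"stopping at epoch {epoch}, best val MSE = {best_val:.4f}")
            break
```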

Pruning
You might identify several features or parameters that impact the final prediction when you build a model. Feature selection—or pruning—identifies the most important features within the training set and eliminates irrelevant ones. For example, to predict whether an image shows an animal or a human, you can look at various input parameters such as face shape, ear position, and body structure; you might prioritize face shape and ignore the shape of the eyes.
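One way to automate this kind of feature selection (an illustrative sketch, not part of the slides) is scikit-learn's SelectKBest, which keeps only the features with the strongest univariate relationship to the target:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 20 input features, only 5 of which are informative.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

selector = SelectKBest(score_func=f_classif, k=5)   # keep the 5 best-scoring features
X_pruned = selector.fit_transform(X, y)

print(X_pruned.shape)                        # (300, 5)
print(selector.get_support(indices=True))    # indices of the retained features
```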



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:

Regularization (Ridge/Lasso/Elastic/Dropout)
Regularization is a collection of training/optimization techniques that seek to reduce overfitting. These methods try to eliminate the influence of factors that do not impact the prediction outcome by grading features based on importance; for example, a penalty value is applied to the weights of features with minimal impact.

Ensembling
Ensembling combines predictions from several separate machine learning algorithms. Some models are called weak
learners because their results are often inaccurate. Ensemble methods combine all the weak learners to get more
accurate results. The two main ensemble methods are bagging and boosting. Boosting trains different machine learning
models one after another to get the final result, while bagging trains them in parallel.
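As a small, hedged sketch of the two ideas using scikit-learn (the dataset and settings are arbitrary): bagging trains its base learners independently on bootstrap samples, while boosting (here AdaBoost) trains them one after another.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: weak learners (decision trees by default) trained in parallel on bootstrap samples.
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: weak learners (decision stumps by default) trained sequentially,
# each one focusing on the examples the previous ones got wrong.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, round(scores.mean(), 3))
```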



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:
Lasso Regularization / L1 Regularization
➢ LASSO stands for Least Absolute Shrinkage and Selection Operator.
➢ Its penalty term contains only the absolute values of the weights.
➢ Since it takes absolute values, it can shrink a slope all the way to 0, whereas Ridge regression can only shrink it close to 0.

So, the L1 regularization technique assigns a zero weight to a less significant feature if it does not have a significant effect on the prediction of the target column.
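A minimal scikit-learn sketch of this behaviour (illustrative only; note that scikit-learn calls the λ of the slides `alpha`):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 20 features, but only 4 of them actually influence the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=4,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)       # alpha plays the role of lambda in the slides
lasso.fit(X, y)

# L1 shrinks the weights of uninformative features all the way to zero.
print("non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
```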



Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:
Ridge Regularization / L2 Regularization
➢ A Ridge regressor is basically a regularized version of a linear regressor: to the original cost function of the linear regressor, we add a regularization term that forces the learning algorithm to fit the data while keeping the weights as small as possible.
➢ It adds the “squared magnitude” of the coefficients (the square of the weights) as a penalty term to the loss function.

❖ If lambda is zero, we get back OLS (ordinary least squares) regression.

❖ If lambda is very large, the penalty carries too much weight and the model will under-fit.
“It’s important how lambda is chosen. This technique works very well to avoid the over-fitting issue.”
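A short, hedged scikit-learn comparison of OLS against Ridge (again, scikit-learn's `alpha` corresponds to the slides' λ; the data is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=100, n_features=20, noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)      # equivalent to lambda = 0
ridge = Ridge(alpha=10.0).fit(X, y)     # larger alpha -> stronger shrinkage of the weights

print("largest |coefficient|, OLS  :", round(abs(ols.coef_).max(), 2))
print("largest |coefficient|, Ridge:", round(abs(ridge.coef_).max(), 2))
```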



L1 Regularisation



L2 Regularisation



Out of the Box Question

L1 and L2 Regularization. Why are they named so?


➢ The names "L1" and "L2" come from the norm used to calculate the regularization term.
➢ In L1 regularization, the norm used is the L1 norm, which is the sum of the absolute values of the elements.
➢ In L2 regularization, the norm used is the L2 norm, which is the square root of the sum of the squared values of the elements.
➢ These norms are commonly used in mathematics, and they determine the type of regularization applied to the model.
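For example, the two norms can be computed directly with NumPy (a small illustration, not from the slides):

```python
import numpy as np

w = np.array([3.0, -4.0, 0.0, 1.5])          # example weight vector

l1_norm = np.sum(np.abs(w))                   # |3| + |-4| + |0| + |1.5| = 8.5
l2_norm = np.sqrt(np.sum(w ** 2))             # sqrt(9 + 16 + 0 + 2.25) ≈ 5.22

print(l1_norm, l2_norm)
# np.linalg.norm gives the same values:
print(np.linalg.norm(w, ord=1), np.linalg.norm(w, ord=2))
```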


Overfitting Problem

How to prevent overfitting?


You can prevent overfitting by diversifying and scaling your training data set or using some strategies like:
Elastic Regularization (Combination of both Ridge and Lasso)
➢ Elastic regularization is a combination of both the L1 (Lasso) and L2 (Ridge) regularization techniques.
➢ Elastic Net adds both L1 and L2 penalty terms, weighted by hyperparameters such as α and λ, to the loss function during training, allowing for a more flexible and balanced approach to regularization.
➢ It addresses some of the limitations of each individual regularization method.

Elastic Net is particularly useful when dealing with high-dimensional datasets or datasets where many features may be correlated.
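As an illustrative scikit-learn sketch (scikit-learn parameterizes Elastic Net with an overall strength `alpha` and a mixing ratio `l1_ratio` rather than two separate penalties; the data here is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# Correlated, high-dimensional data is where Elastic Net tends to help.
X, y = make_regression(n_samples=100, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

# alpha sets the overall penalty strength; l1_ratio balances L1 vs. L2
# (l1_ratio=1.0 is pure Lasso, l1_ratio=0.0 is pure Ridge).
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
print("non-zero coefficients:", int((enet.coef_ != 0).sum()))
```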



How the penalty is calculated in any cost function

➢ In L1 and L2 regularization, the penalty term (often denoted α or λ, depending on notation) is a hyperparameter that must be specified before training the model. It controls the strength of regularization.

➢ The larger the value of α or λ, the stronger the regularization effect.

We can typically choose and calculate the penalty term for L1 and L2 regularization as follows:

L1 Regularization (Lasso):

L1 regularization aims to minimize the following cost function:


Cost(L1) = Cost(Original) + λ * Σ|θ_i|

➢ λ is the regularization parameter for L1 regularization.


➢ You need to choose an appropriate value for λ through techniques like cross-validation or grid search.
➢ Common values to try for λ include a range of positive values, often in a logarithmic scale (e.g., 0.001, 0.01,
0.1, 1, 10).
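A hedged sketch of that search with scikit-learn's GridSearchCV (the grid and dataset are arbitrary; `alpha` again stands in for λ):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=0)

# Logarithmic grid of candidate penalty strengths.
param_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10]}

search = GridSearchCV(Lasso(max_iter=10_000), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("best lambda:", search.best_params_["alpha"])
```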



How the penalty is calculated in any cost function
L2 Regularization (Ridge):
L2 regularization aims to minimize the following cost function:

Cost(L2) = Cost(Original) + λ * Σ(θ_i^2)

➢ λ is the regularization parameter for L2 regularization.


➢ We need to choose an appropriate value for λ through techniques like cross-validation or grid search.
➢ Common values for λ in L2 regularization also include a range of positive values, often in a logarithmic
scale.
Selection of λ:
Keep in mind that the appropriate value of λ can vary from one problem to another, so it's essential to experiment with different values to find the one that balances model complexity (the number and magnitude of coefficients) against model accuracy.

➢ The choice of λ depends on your specific problem and dataset.


➢ You typically start with a broad search over a range of λ values, and you can use techniques like k-fold
cross-validation to evaluate how well your model performs for each value of λ.
➢ The value of λ that results in the best model performance (e.g., lowest validation error) is typically
chosen.
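For Ridge specifically, scikit-learn's RidgeCV bundles this cross-validated search into one estimator (an illustrative sketch with arbitrary values):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=30, noise=10.0, random_state=0)

# Each candidate lambda (alpha) is evaluated with 5-fold cross-validation,
# and the best-performing one is kept.
ridge = RidgeCV(alphas=[0.001, 0.01, 0.1, 1, 10, 100], cv=5).fit(X, y)
print("chosen lambda:", ridge.alpha_)
```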





NOTES : Regularisation Techniques

Regularization is a set of techniques used in machine learning and statistical modeling to prevent overfitting and improve the
generalization performance of a model.
Regularization methods introduce a penalty term into the model's objective function, encouraging it to have simpler and more
stable patterns that generalize better to new data.

There are several common types of regularization techniques used in machine learning, including:

L1 Regularization (Lasso): L1 regularization adds a penalty term to the model's objective function that is proportional to the
absolute values of the model's coefficients. It encourages the model to have sparse feature weights, effectively selecting a subset
of the most important features while setting others to zero. L1 regularization is useful for feature selection.

L2 Regularization (Ridge): L2 regularization adds a penalty term to the objective function that is proportional to the squared
values of the model's coefficients. It encourages the model to have small, evenly distributed feature weights. L2 regularization
helps prevent large coefficients that might lead to overfitting.

Elastic Net Regularization: Elastic Net is a combination of L1 and L2 regularization. It adds both L1 and L2 penalty terms to
the objective function, allowing for feature selection and coefficient shrinkage. Elastic Net is useful when there are many
features, and some of them are highly correlated.



NOTES : Regularisation Techniques
Dropout (for Neural Networks): Dropout is a regularization technique specifically used in neural networks. During training, dropout randomly deactivates a fraction of neurons (units) in a neural network, effectively preventing the network from relying too heavily on any one neuron or feature. This encourages the network to learn more robust and generalizable representations.

Early Stopping: Early stopping is a regularization technique that stops the training process when the model's performance on a validation dataset starts to degrade. It prevents the model from continuing to learn the noise in the training data.

Data Augmentation: Data augmentation is a regularization technique used primarily in computer vision. It involves generating new training examples by applying various transformations (e.g., rotation, translation, cropping) to the existing data. This helps the model generalize better by exposing it to more diverse variations of the data.

Pruning (for Decision Trees): Pruning is a regularization technique for decision trees. It involves removing branches or nodes from a decision tree that do not significantly improve the model's performance. Pruning prevents the tree from becoming too deep and complex.

❑ The choice of regularization technique and the strength of regularization (controlled by hyperparameters like regularization strength or dropout rate) depend on the specific problem and the characteristics of the data.
❑ Regularization is a crucial tool for achieving better model performance and preventing overfitting in various machine learning algorithms.
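A minimal dropout sketch, assuming TensorFlow/Keras as the framework (the architecture and data are made up for illustration); it also combines dropout with the early-stopping idea described above:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary-classification data.
X = np.random.randn(1000, 20).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                 # randomly deactivate 50% of these units each training step
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping: halt once validation loss stops improving.
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                     restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=50, callbacks=[stop], verbose=0)
```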



